Run Summary
Type Checker Comparison
- Average Execution Time (s): mean time to type check each package (lower is better)
- Average Peak Memory (MB): mean peak RSS during type checking (lower is better)
- P90 Execution Time (s): 90th percentile type checking time across packages
- P95 Execution Time (s): 95th percentile type checking time across packages
- Top 10 Slowest Packages (s): packages with the highest average execution time across checkers
- Top 10 Highest Memory Packages (MB): packages with the highest average peak memory across checkers
Detailed Results
| Package | Type Checker | Time (s) | Memory (MB) | Status |
|---|---|---|---|---|
Methodology
What We Measure
We benchmark the full type checking process for each type checker running against real-world Python packages. This measures wall-clock execution time and peak RSS (resident set size) memory.
Test Process
- Shallow-clone each package from GitHub
- Install package dependencies per install_envs.json
- Run each type checker with 1 warmup run (discarded) + 5 measured runs, with a 5-minute timeout per run
- Record wall-clock time and peak memory usage (mean of 5 measured runs)
Timeout: Each type checker has a 5-minute timeout. Timeouts and OOM kills are recorded as failures.
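The measurement loop described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual harness: the helper name `time_command` and its parameters are hypothetical, and only the 1 warmup + 5 measured runs, the 5-minute timeout, and the mean come from the description.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, warmup=1, runs=5, timeout_s=300):
    """Return the mean wall-clock time of `cmd` in seconds, or None on timeout.

    Hypothetical helper. The exit code is ignored on purpose, because type
    checkers usually exit non-zero when they report type errors.
    """
    samples = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return None  # a timed-out run is recorded as a failure
        elapsed = time.perf_counter() - start
        if i >= warmup:  # warmup runs are discarded
            samples.append(elapsed)
    return statistics.mean(samples)


if __name__ == "__main__":
    # Stand-in command; a real run would invoke a type checker instead.
    print(time_command([sys.executable, "-c", "pass"], runs=2))
```

Detecting OOM kills is left out of this sketch, since how the harness identifies them is not described here.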
Memory: On Linux, peak memory is tracked via /proc/{pid}/status (VmHWM). On macOS, getrusage is used.
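As a rough sketch of the two memory sources named above, the snippet below parses the VmHWM field from the text of a /proc status file and reads peak RSS for child processes through `getrusage`. The function names are hypothetical, and how the real harness samples the status file is not specified here.

```python
import re
import resource
import sys


def parse_vm_hwm_mb(status_text):
    """Extract VmHWM (peak RSS) from /proc/<pid>/status text, in MB."""
    match = re.search(r"^VmHWM:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        return None  # field missing, e.g. the process has already exited
    return int(match.group(1)) / 1024  # the kernel reports this field in kB


def children_peak_rss_mb():
    """Peak RSS of waited-for child processes via getrusage, in MB."""
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return maxrss / divisor
```

The unit difference in `ru_maxrss` between platforms is the main pitfall when comparing numbers collected on Linux and macOS.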
Dependencies & Check Paths
Each package's environment is configured in install_envs.json:
- install: Whether to pip install -e . the package itself
- deps: Additional pip packages to install
- install_env: Environment variables for installation
- check_paths: Subdirectories to type check (e.g. ["src"]). When omitted, the entire package root is checked.
Only packages with install: true or a non-empty deps list are benchmarked.
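To make the selection rule concrete, here is a small sketch. The file's real layout is not shown in this document, so the top-level shape (package name mapped to settings), the package names, and the `"."` fallback for the package root are all assumptions. Only the four keys come from the description above.

```python
import json

# Hypothetical contents of install_envs.json. The top-level layout
# (package name -> settings) and the package names are assumptions.
EXAMPLE = """
{
  "pkg-a": {"install": true, "check_paths": ["src"]},
  "pkg-b": {"install": false, "deps": ["numpy"], "install_env": {"SOME_FLAG": "1"}},
  "pkg-c": {"install": false, "deps": []}
}
"""


def benchmarked_packages(envs):
    """Keep packages with install: true or a non-empty deps list."""
    return {
        name: cfg
        for name, cfg in envs.items()
        if cfg.get("install", False) or cfg.get("deps")
    }


def paths_to_check(cfg):
    """Return check_paths, falling back to the package root when omitted."""
    return cfg.get("check_paths") or ["."]  # "." as the root is an assumption


selected = benchmarked_packages(json.loads(EXAMPLE))
```

In this example `pkg-c` is skipped because it is neither installed nor given any dependencies.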
Configuration Overrides
Packages often ship their own type checker configs that can skew
benchmark comparisons. We do our best to run each checker in its
default setting with as neutral a configuration as possible. To
do this, we generate a minimal config file for each checker that
embeds the check_paths in its native format and pass
it via CLI flags to override any package-level config:
- Pyright: pyrightconfig.json with "include": [paths] written in-place
- Mypy: [mypy] with files = paths and check_untyped_defs = True via --config-file
- ty: [src] include = [paths] in ty.benchmark.toml via --config-file
- Pyrefly: project_includes = [paths] in pyrefly.benchmark.toml via --config
- Zuban: [mypy] with files = paths via --config-file
This means every checker sees exactly the same target paths and a neutral configuration, regardless of what the package ships.
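The generated files might look roughly like the output of the sketch below. The function names are hypothetical and the real generator may emit additional settings. Only the keys and sections listed above are taken from this page, and the string quoting reuses JSON escaping, which is a simplification that holds for ordinary path names.

```python
import json


def mypy_config(paths, check_untyped_defs=True):
    """INI text for mypy; pass check_untyped_defs=False for the Zuban variant."""
    lines = ["[mypy]", "files = " + ", ".join(paths)]
    if check_untyped_defs:
        lines.append("check_untyped_defs = True")
    return "\n".join(lines) + "\n"


def pyright_config(paths):
    """JSON text for pyrightconfig.json."""
    return json.dumps({"include": list(paths)}, indent=2) + "\n"


def toml_list(paths):
    """Render paths as a TOML array of basic strings (JSON-style quoting)."""
    return "[" + ", ".join(json.dumps(p) for p in paths) + "]"


def ty_config(paths):
    """TOML text for ty.benchmark.toml."""
    return "[src]\ninclude = " + toml_list(paths) + "\n"


def pyrefly_config(paths):
    """TOML text for pyrefly.benchmark.toml."""
    return "project_includes = " + toml_list(paths) + "\n"
```

Keeping one small function per checker makes it easy to confirm that every tool receives the same list of paths.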
Mypy note: By default, mypy skips the bodies of
functions that lack type annotations. The other four checkers all
analyze unannotated code. We enable check_untyped_defs = True
so that mypy checks the same amount of code as the other tools,
making the comparison fair.
Metrics Explained
- Time (s): Wall-clock execution time in seconds
- Memory (MB): Peak resident set size in megabytes
- P90/P95: 90th/95th percentile of execution time across packages
- Status: Whether the checker completed successfully
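For readers unfamiliar with percentile metrics, the sketch below uses the nearest-rank definition. Which percentile method the benchmark actually uses is not stated, so this choice and the helper name `percentile` are assumptions.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the
    data at or below it. Assumed method; other definitions interpolate."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up per-package times in seconds, for illustration only.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p95 = percentile(times, 95)
```

With only a handful of packages, high percentiles collapse onto the slowest one or two results, so they are best read alongside the mean.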