Type Checker Timing Benchmark | Python Type Coverage

Run Summary

Packages Tested

Type Checkers

Type Checker Comparison

Average Execution Time (s)

Lower is better — mean time to type check each package

Average Peak Memory (MB)

Lower is better — mean peak RSS during type checking

P90 Execution Time (s)

90th percentile type checking time across packages

P95 Execution Time (s)

95th percentile type checking time across packages

Top 10 Slowest Packages (s)

Packages with the highest average execution time across checkers

Top 10 Highest Memory Packages (MB)

Packages with the highest average peak memory across checkers

Detailed Results

Package	Type Checker	Time (s)	Memory (MB)	Status

Methodology

What We Measure

We benchmark a full type checking run of each type checker against real-world Python packages, recording wall-clock execution time and peak resident set size (RSS) memory.

Test Process

Shallow-clone each package from GitHub
Install package dependencies per install_envs.json
Run each type checker with 1 warmup run (discarded) + 5 measured runs, with a 5-minute timeout per run
Record wall-clock time and peak memory usage (mean of 5 measured runs)

Timeout: Each run of a type checker has a 5-minute timeout. Timeouts and OOM kills are recorded as failures.
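The warmup, measurement, and timeout steps above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual harness: the function name time_command and the shape of the returned dictionary are hypothetical, and the sketch assumes that a non-zero exit code (which type checkers return when they report errors) does not count as a failure.

```python
import subprocess
import sys
import time
from statistics import mean

WARMUP_RUNS = 1        # discarded
MEASURED_RUNS = 5      # averaged
TIMEOUT_S = 5 * 60     # per-run timeout in seconds


def time_command(cmd, warmup=WARMUP_RUNS, runs=MEASURED_RUNS, timeout=TIMEOUT_S):
    """Run cmd warmup + runs times and return the mean wall-clock time.

    Hypothetical helper: a timed-out run marks the whole measurement as failed.
    """
    samples = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        try:
            # No check=True: type checkers exit non-zero when they find errors.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "mean_time_s": None, "samples": samples}
        elapsed = time.perf_counter() - start
        if i >= warmup:  # skip the warmup run(s)
            samples.append(elapsed)
    return {"status": "ok", "mean_time_s": mean(samples), "samples": samples}


if __name__ == "__main__":
    # Stand-in command; a real run would invoke a type checker instead.
    print(time_command([sys.executable, "-c", "pass"], runs=2))
```

Discarding the first run keeps one-off costs such as cold file-system caches out of the reported mean.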

Memory: On Linux, peak memory is tracked via /proc/{pid}/status (VmHWM). On macOS, getrusage is used.
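As a rough illustration of the two memory sources named above, the sketch below parses the VmHWM field from the text of /proc/{pid}/status and reads ru_maxrss via getrusage. The function names are hypothetical, and how and when the real harness reads these values is not stated here, so treat this as one possible approach.

```python
import re
import sys


def parse_vm_hwm_mb(status_text):
    """Return VmHWM (peak RSS) in MB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmHWM:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)) / 1024  # the kernel reports the value in kB


def read_vm_hwm_mb(pid):
    """Read peak RSS for a live process on Linux; None if it is unavailable."""
    try:
        with open(f"/proc/{pid}/status") as fh:
            return parse_vm_hwm_mb(fh.read())
    except OSError:
        return None


def children_peak_rss_mb():
    """Peak RSS of waited-for child processes via getrusage (Unix only)."""
    import resource  # not available on Windows

    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor
```

VmHWM is a high-water mark, so a single read shortly before the process exits captures the peak without continuous polling.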

Dependencies & Check Paths

Each package's environment is configured in install_envs.json:

install: Whether to pip install -e . the package itself
deps: Additional pip packages to install
install_env: Environment variables for installation
check_paths: Subdirectories to type check (e.g. ["src"]). When omitted, the entire package root is checked.

Only packages with install: true or a non-empty deps list are benchmarked.
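To make the selection rule concrete, here is a small sketch. The field names come from the list above, but the overall file layout (a mapping from package name to settings) and the package names are assumptions made for illustration; the real install_envs.json may be organized differently.

```python
import json

# Hypothetical file contents; only the field names are taken from the text.
EXAMPLE = json.loads("""
{
  "example-pkg-a": {"install": true, "deps": [], "check_paths": ["src"]},
  "example-pkg-b": {"install": false, "deps": ["numpy"]},
  "example-pkg-c": {"install": false, "deps": []}
}
""")


def is_benchmarked(entry):
    """A package is benchmarked if install is true or deps is non-empty."""
    return bool(entry.get("install")) or bool(entry.get("deps"))


def check_targets(entry):
    """Paths to type check; fall back to the package root when omitted.

    Assumption: an empty check_paths list is treated like an omitted one.
    """
    return entry.get("check_paths") or ["."]


selected = [name for name, entry in EXAMPLE.items() if is_benchmarked(entry)]
print(selected)
```

In this example the third package is skipped because it is neither installed nor given any dependencies.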

Configuration Overrides

Packages often ship their own type checker configs that can skew benchmark comparisons. We do our best to run each checker in its default setting with as neutral a configuration as possible. To do this, we generate a minimal config file for each checker that embeds the check_paths in its native format and pass it via CLI flags to override any package-level config:

Pyright: pyrightconfig.json with "include": [paths] written in-place
Mypy: [mypy] with files = paths and check_untyped_defs = True via --config-file
ty: [src] include = [paths] in ty.benchmark.toml via --config-file
Pyrefly: project_includes = [paths] in pyrefly.benchmark.toml via --config
Zuban: [mypy] with files = paths via --config-file

This means every checker sees exactly the same target paths and a neutral configuration, regardless of what the package ships.

Mypy note: By default, mypy skips the bodies of functions that lack type annotations. The other four checkers all analyze unannotated code. We enable check_untyped_defs = True so that mypy checks the same amount of code as the other tools, making the comparison fair.
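The generated configs described above might look roughly like the output of the sketch below. The keys and section names are the ones listed in this section; the helper function names are hypothetical, and the exact formatting the benchmark writes is an assumption.

```python
import json


def pyright_config(paths):
    """Contents of pyrightconfig.json with the include list."""
    return json.dumps({"include": list(paths)}, indent=2)


def mypy_config(paths, check_untyped_defs=True):
    """Contents of a mypy-style INI config passed via --config-file."""
    lines = ["[mypy]", "files = " + ", ".join(paths)]
    if check_untyped_defs:
        lines.append("check_untyped_defs = True")
    return "\n".join(lines) + "\n"


def zuban_config(paths):
    """Zuban reads the same [mypy] section, here with only files set."""
    return mypy_config(paths, check_untyped_defs=False)


def _toml_list(paths):
    # A JSON array of plain strings is also a valid TOML array.
    return json.dumps(list(paths))


def ty_config(paths):
    """Contents of ty.benchmark.toml."""
    return "[src]\ninclude = " + _toml_list(paths) + "\n"


def pyrefly_config(paths):
    """Contents of pyrefly.benchmark.toml."""
    return "project_includes = " + _toml_list(paths) + "\n"


if __name__ == "__main__":
    print(mypy_config(["src", "tests"]))
```

Deriving every config from the same paths list is what keeps the checked targets identical across checkers.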

Metrics Explained

Time (s): Wall-clock execution time in seconds
Memory (MB): Peak resident set size in megabytes
P90/P95: 90th/95th percentile across packages
Status: Whether the checker completed successfully
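For readers unfamiliar with percentile metrics, the sketch below computes a percentile of per-package times using linear interpolation between the closest ranks. The interpolation method is an assumption, since the page does not say which one the charts use, and the sample times are made up for illustration.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() requires at least one value")
    rank = (len(ordered) - 1) * p / 100   # fractional index into the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Made-up per-package times in seconds for one checker.
times = [0.8, 1.2, 1.5, 2.0, 2.4, 3.1, 4.0, 5.5, 9.0, 30.0, 120.0]
p90 = percentile(times, 90)
p95 = percentile(times, 95)
print(p90, p95)
```

High percentiles expose slow outlier packages that a mean can hide or exaggerate, which is why they are charted alongside the averages.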

Type Checkers

Pyright Type checker for Python by Microsoft

Pyrefly Type checker for Python by Meta

ty Type checker for Python by Astral

Mypy Static type checker for Python

Zuban Type checker for Python by the creator of Jedi LSP
