Type Checker Timing Benchmarks

Daily performance analysis of Python type checkers

Measuring wall-clock execution time and peak memory usage when type checking popular open-source Python packages


Run Summary

Packages Tested
Type Checkers

Type Checker Comparison

Average Execution Time (s)

Lower is better — mean time to type check each package

Average Peak Memory (MB)

Lower is better — mean peak RSS during type checking

P90 Execution Time (s)

90th percentile type checking time across packages

P95 Execution Time (s)

95th percentile type checking time across packages

Top 10 Slowest Packages (s)

Packages with the highest average execution time across checkers

Top 10 Highest Memory Packages (MB)

Packages with the highest average peak memory across checkers

Detailed Results

Package | Type Checker | Time (s) | Memory (MB) | Status

Methodology

What We Measure

We benchmark a full type-checking run of each checker against real-world Python packages, recording wall-clock execution time and peak resident set size (RSS) memory.

Test Process

  1. Shallow-clone each package from GitHub
  2. Install package dependencies per install_envs.json
  3. Run each type checker with 1 warmup run (discarded) + 5 measured runs, with a 5-minute timeout per run
  4. Record wall-clock time and peak memory usage (mean of 5 measured runs)

Timeout: Each run is limited to 5 minutes. Runs that time out or are killed for running out of memory (OOM) are recorded as failures.
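The warmup, measurement, and timeout steps above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual harness: the function names, the result dictionary, and the choice to ignore the checker's exit code (type checkers usually exit non-zero when they report errors) are all assumptions.

```python
import statistics
import subprocess
import sys
import time

WARMUP_RUNS = 1           # discarded
MEASURED_RUNS = 5         # averaged
TIMEOUT_SECONDS = 5 * 60  # per run


def time_once(command, timeout=TIMEOUT_SECONDS):
    """Return wall-clock seconds for one run, or None if it timed out."""
    start = time.perf_counter()
    try:
        # Assumption: a non-zero exit code is not a failure, because
        # checkers exit non-zero when they find type errors.
        subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return time.perf_counter() - start


def benchmark(command, warmup=WARMUP_RUNS, runs=MEASURED_RUNS,
              timeout=TIMEOUT_SECONDS):
    """Run warmups (discarded), then return the mean of the measured runs."""
    for _ in range(warmup):
        time_once(command, timeout)
    samples = []
    for _ in range(runs):
        elapsed = time_once(command, timeout)
        if elapsed is None:
            return {"status": "timeout", "mean_time_s": None}
        samples.append(elapsed)
    return {"status": "ok", "mean_time_s": statistics.mean(samples)}


if __name__ == "__main__":
    # Stand-in command; a real run would invoke a type checker instead.
    print(benchmark([sys.executable, "-c", "pass"], runs=2))
```

Using time.perf_counter around the whole subprocess call captures wall-clock time, including interpreter startup, which is what a user waiting on the checker would experience.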

Memory: On Linux, peak memory is tracked via /proc/{pid}/status (VmHWM). On macOS, getrusage is used.
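As a rough sketch of the two memory sources named above: the first helper parses the VmHWM line out of the text of /proc/{pid}/status, and the second reads ru_maxrss via getrusage. The function names are hypothetical, and the kB-to-MB conversion by 1024 is an assumption about how the reported figure is derived.

```python
import re
import resource
import sys


def parse_vm_hwm_kb(status_text):
    """Extract VmHWM (peak RSS, in kB) from /proc/<pid>/status content."""
    match = re.search(r"^VmHWM:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_peak_rss_mb(pid):
    """Linux only: read the peak RSS of a process that is still running."""
    with open(f"/proc/{pid}/status") as f:
        kb = parse_vm_hwm_kb(f.read())
    return None if kb is None else kb / 1024


def children_peak_rss_mb():
    """Peak RSS of the largest waited-for child process, via getrusage."""
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / divisor
```

The /proc file has to be read while the checker process is still alive, so a harness built this way would poll it during the run rather than after the process exits.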

Dependencies & Check Paths

Each package's environment is configured in install_envs.json:

  • install: Whether to install the package itself with pip install -e .
  • deps: Additional pip packages to install
  • install_env: Environment variables for installation
  • check_paths: Subdirectories to type check (e.g. ["src"]). When omitted, the entire package root is checked.

Only packages with install: true or a non-empty deps list are benchmarked.
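To make the selection rule concrete, here is a small sketch that loads a made-up install_envs.json and applies it. The four keys come from the list above; the top-level layout (a mapping from package name to entry), the package names, and the use of ["."] to stand for "the entire package root" are assumptions for illustration only.

```python
import json

# Hypothetical file content; the real install_envs.json layout may differ.
SAMPLE = """
{
  "example-pkg": {
    "install": true,
    "deps": ["numpy"],
    "install_env": {"EXAMPLE_FLAG": "1"},
    "check_paths": ["src"]
  },
  "deps-only-pkg": {"install": false, "deps": ["attrs"]},
  "skipped-pkg": {"install": false, "deps": []}
}
"""


def is_benchmarked(entry):
    """A package qualifies with install: true or a non-empty deps list."""
    return bool(entry.get("install")) or bool(entry.get("deps"))


def resolve_check_paths(entry):
    """Fall back to the package root when check_paths is omitted."""
    return entry.get("check_paths") or ["."]


envs = json.loads(SAMPLE)
selected = {
    name: resolve_check_paths(entry)
    for name, entry in envs.items()
    if is_benchmarked(entry)
}
print(selected)
```

Here skipped-pkg is dropped because it neither installs itself nor lists any dependencies.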

Configuration Overrides

Packages often ship their own type checker configs, which can skew benchmark comparisons. We do our best to run each checker in its default setting with as neutral a configuration as possible. To do this, we generate a minimal config file for each checker that embeds the check_paths in the checker's native format, then either pass it via a CLI flag or, for Pyright, write it in place, so that it overrides any package-level config:

  • Pyright: pyrightconfig.json with "include": [paths] written in-place
  • Mypy: [mypy] with files = paths and check_untyped_defs = True via --config-file
  • ty: [src] include = [paths] in ty.benchmark.toml via --config-file
  • Pyrefly: project_includes = [paths] in pyrefly.benchmark.toml via --config
  • Zuban: [mypy] with files = paths via --config-file

This means every checker sees exactly the same target paths and a neutral configuration, regardless of what the package ships.
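The per-checker configs described above could be rendered along these lines. This is only a sketch of the file contents listed in the bullets: the function names are hypothetical, paths are assumed to be simple relative directories that need no special escaping, and writing the files and invoking each checker is left out.

```python
import json


def toml_list(paths):
    """Render a list of simple path strings as a TOML array."""
    return "[" + ", ".join(json.dumps(p) for p in paths) + "]"


def render_configs(paths):
    """Return the minimal config text for each checker, keyed by name."""
    mypy_files = ", ".join(paths)  # mypy's files option is comma-separated
    return {
        "pyright": json.dumps({"include": paths}, indent=2),
        "mypy": f"[mypy]\nfiles = {mypy_files}\ncheck_untyped_defs = True\n",
        "ty": f"[src]\ninclude = {toml_list(paths)}\n",
        "pyrefly": f"project_includes = {toml_list(paths)}\n",
        "zuban": f"[mypy]\nfiles = {mypy_files}\n",
    }


configs = render_configs(["src", "tests"])
for checker, text in configs.items():
    print(f"--- {checker} ---\n{text}")
```

Generating all five from one list of paths is what keeps the target set identical across checkers.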

Mypy note: By default, mypy skips the bodies of functions that lack type annotations. The other four checkers all analyze unannotated code. We enable check_untyped_defs = True so that mypy checks the same amount of code as the other tools, making the comparison fair.
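A small illustration of what that setting changes, using two made-up functions. The first has no annotations, so with default settings mypy does not check its body and the str + int bug goes unreported; with check_untyped_defs enabled, the body is checked like any other.

```python
def unannotated_total():
    # No annotations: mypy skips this body by default, so the
    # str + int concatenation below is only flagged when
    # check_untyped_defs is enabled.
    return "total: " + 1


def annotated_total(count: int) -> str:
    # Annotated: mypy checks this body under default settings.
    return "total: " + str(count)


print(annotated_total(3))
```

Calling unannotated_total() fails at runtime with a TypeError, which is the kind of bug that goes unnoticed when the body is skipped.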

Metrics Explained

  • Time (s): Wall-clock execution time in seconds
  • Memory (MB): Peak resident set size in megabytes
  • P90/P95: 90th/95th percentile across packages
  • Status: Whether the checker completed successfully
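For readers unfamiliar with percentile metrics, the sketch below computes them over one checker's per-package times. The linear-interpolation method and the sample timings are assumptions; the benchmark may use a different percentile definition.

```python
def percentile(values, p):
    """Percentile with linear interpolation between the closest ranks."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Made-up per-package mean times (seconds) for a single checker.
times_s = [0.8, 1.2, 1.5, 2.0, 2.4, 3.1, 4.0, 5.5, 9.0, 30.0]

print("P90:", percentile(times_s, 90))
print("P95:", percentile(times_s, 95))
```

High percentiles are useful alongside the mean because a handful of very large packages can dominate the average while most packages check quickly.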

Type Checkers

  • Pyright: Type checker for Python by Microsoft
  • Pyrefly: Type checker for Python by Meta
  • ty: Type checker for Python by Astral
  • Mypy: Static type checker for Python
  • Zuban: Type checker for Python by the creator of Jedi LSP