Run Summary
📈 LSP Comparison
Average Latency (ms)
Lower is better - time to resolve "Go to Definition"
OK Rate (%)
Higher is better - requests that completed without timeout/error
Success Rate (%)
Higher is better - valid definitions found
Latency Distribution
Distribution of mean latency across all packages (lower is better). Triangles show the median P95 latency.
📋 Detailed Results
| Package | LSP | Avg Latency | P50 Latency | P95 Latency | OK % | Success % |
|---|---|---|---|---|---|---|
📐 Methodology
🎯 What We Measure
We benchmark the `textDocument/definition` LSP request (Go to Definition)
across different Python language servers. This is one of the most commonly used
IDE features for code navigation.
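To make the measured request concrete, the sketch below builds a `textDocument/definition` JSON-RPC message with the `Content-Length` framing that LSP servers expect on stdin. The function name and the example URI are illustrative, not part of the benchmark harness; the field names follow the LSP specification.

```python
import json

def definition_request(request_id, file_uri, line, character):
    """Build a framed JSON-RPC 2.0 textDocument/definition request.

    `line` and `character` are zero-based, per the LSP specification.
    """
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": file_uri},
            "position": {"line": line, "character": character},
        },
    })
    # LSP messages are length-prefixed with a Content-Length header.
    return f"Content-Length: {len(body.encode('utf-8'))}\r\n\r\n{body}"

msg = definition_request(1, "file:///repo/pkg/module.py", 41, 7)
```

The server replies with zero or more `Location` objects; an empty result still counts as a completed (OK) request, but not as a success.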
🔄 Test Process
- Clone the top Python packages from GitHub
- Pre-pick all random Python files and identifier positions for the run
- For each language server, start a single LSP server per repository
- Send all "Go to Definition" requests to the same server instance
- Measure latency and verify that returned locations are valid
⏱️ Timeout: Requests have a 2-second timeout. Timeouts are counted as failures and excluded from latency statistics.
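The measurement loop above, including the 2-second timeout rule, can be sketched as follows. `send_request` is a stand-in for the real LSP client call and the function names are hypothetical; the point is the accounting: timeouts and errors are skipped entirely, so they never enter the latency samples.

```python
import time

TIMEOUT_S = 2.0  # requests slower than this count as failures

def run_benchmark(send_request, test_cases):
    """Send pre-picked test cases to one server instance.

    Returns (latencies_ms, ok_count, success_count). `send_request`
    returns the server's list of locations or raises TimeoutError.
    """
    latencies_ms, ok, success = [], 0, 0
    for case in test_cases:
        start = time.perf_counter()
        try:
            locations = send_request(case, timeout=TIMEOUT_S)
        except (TimeoutError, OSError):
            continue  # timeouts/errors are excluded from latency stats
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        ok += 1
        if locations:  # non-empty result counts toward Success Rate
            success += 1
    return latencies_ms, ok, success
```

Dividing `ok` and `success` by the total number of test cases yields the OK and Success rates reported in the table.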
🔁 Server Lifecycle: One LSP server is started per language server per repository. All test cases for that repo are sent to the same server, reflecting real-world IDE usage where a single server handles multiple requests. All language servers receive identical test cases for fair comparison.
🔥 Warmup: A 30-second warmup period is applied after server initialization to give the language server time to index before requests are sent. This can be configured via `--warmup`.
📊 Metrics Explained
- Latency: Time (ms) from request to response (excludes timeouts)
- P50/P95: 50th/95th percentile latencies
- OK Rate: Percentage of requests that completed without timeout or error (indicates reliability)
- Success Rate: Percentage of requests that returned a valid definition location pointing to a real file (indicates accuracy)
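These metrics can be computed from the raw per-request outcomes as sketched below. The function names are illustrative; the percentile uses the nearest-rank method, one common convention (other implementations interpolate, which gives slightly different P50/P95 values).

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile over completed-request latencies (ms)."""
    ordered = sorted(samples)
    if not ordered:
        return None
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def summarize(latencies_ms, total_requests, valid_definitions):
    """Aggregate one (package, LSP) cell of the results table.

    `latencies_ms` holds only completed requests, so timeouts are
    excluded from Avg/P50/P95 but still lower the OK rate.
    """
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "ok_pct": 100.0 * len(latencies_ms) / total_requests,
        "success_pct": 100.0 * valid_definitions / total_requests,
    }
```

For example, four completed requests of 10, 20, 30, and 40 ms out of five sent, with three valid definitions, give an 80% OK rate and a 60% success rate.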