Public Benchmark Feed
Gemini vs Claude Live Scorecard
Real benchmark rows from the ekkOS proxy stack, tracking long-horizon recall, task completion, tool correctness, latency, and cost on the same workload suite.
Top model
Waiting for data
No benchmark rows yet
--
Runs (72h)
--
Calculating...
Global pass rate
--
Weighted by run volume
Feed status
Refreshing...
Waiting for first sync
source: unknown
Live proxy rows use derived quality metrics unless benchmark overrides are provided.
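The global pass rate above is labeled as weighted by run volume. A minimal sketch of what that could mean, assuming each leaderboard row carries a pass rate between 0 and 1 and a run count: the function name `weighted_pass_rate` and the `(pass_rate, runs)` row shape are illustrative assumptions, not the feed's actual schema.

```python
from typing import Iterable, Optional, Tuple


def weighted_pass_rate(rows: Iterable[Tuple[float, int]]) -> Optional[float]:
    """Run-volume-weighted mean of per-model pass rates.

    Each row is a hypothetical (pass_rate, runs) pair. Returns None when
    no runs have been recorded, which a dashboard might render as "--".
    """
    total_runs = 0
    weighted_sum = 0.0
    for pass_rate, runs in rows:
        if runs <= 0:
            continue  # skip rows with no run volume
        total_runs += runs
        weighted_sum += pass_rate * runs
    if total_runs == 0:
        return None
    return weighted_sum / total_runs


# Example: 100 runs at 90% and 300 runs at 50% weigh toward the larger group.
example = weighted_pass_rate([(0.9, 100), (0.5, 300)])
```

Under this reading, a model with many runs moves the global figure more than a model with only a few, so the number is not a simple average of the per-model pass rates.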
Provider Standings
window: 72h
Model Leaderboard
auto-refresh 15s
| Rank | Model | Score | Pass | Recall | Tool | Latency | Cost | Runs |
|---|---|---|---|---|---|---|---|---|
Recent Runs
most recent first