Public Benchmark Feed
Gemini vs Claude Live Scorecard
Real benchmark runs from the ekkOS proxy stack. We track long-horizon constraint recall, task completion, tool correctness, latency, and cost on the same workload suite.
Top Model
No benchmark rows yet
--
Runs (72h)
--
Global Pass Rate
--
Weighted by run volume
Feed Status
Waiting for first sync
source: unknown
Live proxy rows use derived quality metrics unless benchmark overrides are provided.
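The Global Pass Rate is described as weighted by run volume. As a rough illustration of what that could mean, the sketch below averages per-model pass rates using each model's run count as the weight. The `ModelRow` shape, its field names, and the sample values are assumptions made for this example; they are not the feed's actual schema or data.

```python
from dataclasses import dataclass


@dataclass
class ModelRow:
    """One leaderboard row (hypothetical shape, not the feed's real schema)."""
    model: str
    pass_rate: float  # fraction of runs that passed, in [0, 1]
    runs: int         # number of runs inside the reporting window


def weighted_pass_rate(rows):
    """Return the run-volume-weighted pass rate, or None when there are no runs."""
    total_runs = sum(r.runs for r in rows)
    if total_runs == 0:
        return None  # nothing to aggregate yet
    return sum(r.pass_rate * r.runs for r in rows) / total_runs


# Illustrative values only; these are not real benchmark results.
rows = [
    ModelRow("model-a", pass_rate=0.90, runs=30),
    ModelRow("model-b", pass_rate=0.60, runs=10),
]
print(weighted_pass_rate(rows))
```

Under this weighting, a model with many runs moves the global figure more than one with only a few, so the result sits closer to `model-a` than a plain average of the two pass rates would.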
Provider Standings
window: 72h
Model Leaderboard
auto-refresh 15s
| Rank | Model | Score | Pass | Recall | Tool | Latency | Cost | Runs |
|---|---|---|---|---|---|---|---|---|
Recent Runs
most recent first