ekkOS Labs // research division

Gemini vs Claude Live Scorecard

Real benchmark rows from the ekkOS proxy stack, tracking long-horizon recall, task completion, tool correctness, latency, and cost on the same workload suite.

Top model

Waiting for data

No benchmark rows yet

--

Runs (72h)

--

Calculating...

Global pass rate

--

Weighted by run volume
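
One plausible reading of "weighted by run volume" is that each model's pass rate counts in proportion to its number of runs in the window. The sketch below shows that calculation under that assumption. The `ModelRow` type, its field names, and the `global_pass_rate` function are hypothetical and are not taken from the ekkOS proxy stack.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ModelRow:
    """Hypothetical per-model summary row; field names are assumptions."""
    model: str        # model label
    pass_rate: float  # fraction of passing runs, in [0, 1]
    runs: int         # number of runs in the window


def global_pass_rate(rows: Sequence[ModelRow]) -> Optional[float]:
    """Run-volume-weighted mean of the per-model pass rates.

    Returns None when there are no runs, which corresponds to the
    placeholder state a dashboard would show before any data arrives.
    """
    total_runs = sum(r.runs for r in rows)
    if total_runs == 0:
        return None
    return sum(r.pass_rate * r.runs for r in rows) / total_runs


if __name__ == "__main__":
    rows = [
        ModelRow("model-a", pass_rate=0.9, runs=100),
        ModelRow("model-b", pass_rate=0.5, runs=300),
    ]
    print(global_pass_rate(rows))
```

With these illustrative rows the weighted rate is (0.9 * 100 + 0.5 * 300) / 400, while an unweighted mean of the two pass rates would be 0.7. The difference shows how a high-volume model pulls the global figure toward its own rate.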

Feed status

Refreshing...

Waiting for first sync

source: unknown

Live proxy rows use derived quality metrics unless benchmark overrides are provided.

Provider Standings

window: 72h

Model Leaderboard

auto-refresh 15s

Rank    Model    Score    Pass    Recall    Tool    Latency    Cost    Runs

Recent Runs

most recent first