LCR

Long Context Retrieval

long context | Score: 0-100 (% accuracy) | 7 models scored

About

Tests models on retrieving specific information from very long documents, measuring long-context comprehension and retrieval accuracy.

Methodology

Models must locate and extract specific facts, figures, or passages from long documents (10K–1M tokens). The benchmark tests the robustness of attention mechanisms and how well models use their context at extended lengths.
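The methodology above does not specify a test harness or answer-matching rule. As a rough illustration only, the sketch below assumes a needle-in-a-haystack setup: one fact is planted at a chosen relative depth inside filler text, and answers are scored by normalised exact match to give a 0-100 percentage. The function names (`build_haystack`, `normalise`, `accuracy_percent`) are hypothetical, and document length is counted in sentences rather than tokens for simplicity.

```python
import random


def build_haystack(needle: str, filler: list[str], n_sentences: int,
                   depth: float, seed: int = 0) -> str:
    """Hypothetical helper: plant `needle` at relative position `depth`
    (0.0 = start, 1.0 = end) among `n_sentences` filler sentences.
    Length is measured in sentences here, not tokens (a simplification)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    rng = random.Random(seed)  # seeded so the document is reproducible
    sentences = [rng.choice(filler) for _ in range(n_sentences)]
    sentences.insert(int(depth * n_sentences), needle)
    return " ".join(sentences)


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())


def accuracy_percent(predictions: list[str], answers: list[str]) -> float:
    """Assumed scoring rule: percentage of predictions that exactly match
    the reference answer after normalisation (0-100 scale)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(normalise(p) == normalise(a)
                  for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)
```

For example, `accuracy_percent(["42", "Paris", "blue"], ["42", "paris", "red"])` gives about 66.7, since two of the three answers match after lower-casing. Real long-context benchmarks may instead use fuzzy matching or model-graded answers.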


Model Leaderboard

Commercial API models (GPT-4o, Claude, Gemini) are not submitted to the Open LLM Leaderboard; their scores come from provider-reported benchmarks.

#	Model	Score
1	Claude Opus 4.7	70.3%
2	o3	69.3%
3	Grok 4	68.0%
4	Claude Sonnet 4.5	66.0%
5	Gemini 2.5 Pro	66.0%
6	Claude Opus 4.5	65.3%
7	Claude Sonnet 4