You can boost long-context reasoning without retraining by using attention patterns to identify relevant evidence and replaying that evidence before generation. This simple inference-time trick works across different model sizes.
ReContext improves how LLMs use information in long documents by replaying relevant evidence just before the model generates its answer. Rather than retraining the model or pruning the context, it uses the model's own attention signals to identify and reorder the most important passages, which helps the model focus on what matters for each question.
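The select-and-replay idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ReContext's actual implementation: it assumes you already have one attention score per context token (for example, attention from the question tokens, aggregated over heads and layers in some way the summary does not specify), and that each passage maps to a half-open token span. The helper names `score_passages`, `select_evidence`, and `build_replay_prompt`, the mean-attention scoring rule, and the prompt template are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def score_passages(token_attention: Sequence[float],
                   spans: Sequence[Tuple[int, int]]) -> List[float]:
    """Score each passage by the mean attention its tokens receive.

    ASSUMPTION: token_attention already holds one aggregated score per
    context token; spans are half-open [start, end) token ranges.
    """
    scores = []
    for start, end in spans:
        chunk = token_attention[start:end]
        # Empty spans get a score of zero instead of dividing by zero.
        scores.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return scores


def select_evidence(passages: Sequence[str],
                    scores: Sequence[float],
                    k: int) -> List[str]:
    """Return the top-k passages, reordered from highest to lowest score."""
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in ranked[:k]]


def build_replay_prompt(passages: Sequence[str],
                        token_attention: Sequence[float],
                        spans: Sequence[Tuple[int, int]],
                        question: str,
                        k: int = 2) -> str:
    """Keep the full context, then replay the selected evidence right
    before the question (hypothetical prompt template)."""
    scores = score_passages(token_attention, spans)
    evidence = select_evidence(passages, scores, k)
    context = "\n".join(passages)
    replay = "\n".join(evidence)
    return (f"{context}\n\nRelevant evidence:\n{replay}\n\n"
            f"Question: {question}\nAnswer:")


if __name__ == "__main__":
    # Toy data: three passages of three tokens each, with made-up attention.
    passages = ["The sky is blue.", "Paris is the capital of France.",
                "Cats sleep a lot."]
    spans = [(0, 3), (3, 6), (6, 9)]
    attention = [0.1, 0.1, 0.1, 0.5, 0.6, 0.7, 0.2, 0.2, 0.2]
    print(build_replay_prompt(passages, attention, spans,
                              "What is the capital of France?", k=2))
```

In this toy run, the second passage receives the most attention, so it is replayed first, followed by the third. Keeping the full context and appending the replayed evidence (rather than deleting low-scoring passages) mirrors the summary's point that the method reorders and replays evidence instead of pruning context.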