You can make reasoning models 15-60% more token-efficient while maintaining or improving accuracy simply by training them to solve multiple problems simultaneously, which creates an implicit efficiency incentive rather than requiring explicit length penalties.
This paper introduces Batched Contextual Reinforcement (BCR), a training method that makes language models reason more efficiently by having them solve multiple problems at once in a shared context.
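To make the mechanism concrete, here is a minimal sketch of the batched-context setup, not the paper's actual implementation: the names `build_batched_prompt`, `batched_reward`, `is_correct`, and the `CONTEXT_BUDGET` value are all illustrative assumptions. Several problems share one prompt and one generation budget, and the reward counts only correct answers, so verbose reasoning on one problem crowds out the others without any explicit token penalty.

```python
# Illustrative sketch of batched-context RL training, not the authors' code.

CONTEXT_BUDGET = 4096  # shared generation budget in tokens (assumed value)


def build_batched_prompt(problems: list[str]) -> str:
    """Concatenate several problems into one shared context.

    Every problem competes for the same fixed budget, so verbose
    reasoning on one problem leaves fewer tokens for the rest --
    an implicit pressure toward concise solutions.
    """
    header = "Solve all of the following problems.\n\n"
    body = "\n\n".join(f"Problem {i + 1}: {p}" for i, p in enumerate(problems))
    return header + body


def is_correct(completion: str, index: int, answer: str) -> bool:
    # Hypothetical checker: in practice this would parse the i-th
    # answer out of the completion and compare it to the reference.
    return f"Answer {index + 1}: {answer}" in completion


def batched_reward(completion: str, answers: list[str]) -> float:
    """Reward is accuracy only: the fraction of batched problems
    answered correctly within the shared budget. Token count is
    never penalized directly."""
    solved = sum(is_correct(completion, i, ans) for i, ans in enumerate(answers))
    return solved / len(answers)


if __name__ == "__main__":
    problems = ["What is 2 + 2?", "What is 3 * 5?"]
    answers = ["4", "15"]
    prompt = build_batched_prompt(problems)
    # A policy model would generate one completion for the whole batch,
    # truncated at CONTEXT_BUDGET tokens; here we fake the completion.
    completion = "Answer 1: 4\nAnswer 2: 15"
    print(batched_reward(completion, answers))  # 1.0
```

Under this setup the optimizer never sees a length term; conciseness emerges only because tokens spent on one problem are tokens unavailable for the others in the batch.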