You can make reasoning models 15-60% more token-efficient while maintaining or improving accuracy simply by training them to solve multiple problems simultaneously, which creates an implicit efficiency incentive rather than requiring explicit length penalties.
This paper introduces Batched Contextual Reinforcement (BCR), a training method that makes language models reason more efficiently by having them solve multiple problems at once in a shared context.
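To make the mechanism concrete, here is a minimal sketch of the batched-context setup, not the paper's actual implementation: the names `build_batched_prompt`, `batched_reward`, `is_correct`, and the `CONTEXT_BUDGET` value are all illustrative assumptions. Several problems share one prompt and one generation budget, and the reward counts only correct answers, so verbose reasoning on one problem crowds out the others without any explicit token penalty.

```python
# Illustrative sketch of batched-context RL training, not the authors' code.

CONTEXT_BUDGET = 4096  # shared generation budget in tokens (assumed value)


def build_batched_prompt(problems: list[str]) -> str:
    """Concatenate several problems into one shared context.

    Every problem competes for the same fixed budget, so verbose
    reasoning on one problem leaves fewer tokens for the rest --
    an implicit pressure toward concise solutions.
    """
    header = "Solve all of the following problems.\n\n"
    body = "\n\n".join(f"Problem {i + 1}: {p}" for i, p in enumerate(problems))
    return header + body


def is_correct(completion: str, index: int, answer: str) -> bool:
    # Hypothetical checker: in practice this would parse the i-th
    # answer out of the completion and compare it to the reference.
    return f"Answer {index + 1}: {answer}" in completion


def batched_reward(completion: str, answers: list[str]) -> float:
    """Reward is accuracy only: the fraction of batched problems
    answered correctly within the shared budget. Token count is
    never penalized directly."""
    solved = sum(is_correct(completion, i, ans) for i, ans in enumerate(answers))
    return solved / len(answers)


if __name__ == "__main__":
    problems = ["What is 2 + 2?", "What is 3 * 5?"]
    answers = ["4", "15"]
    prompt = build_batched_prompt(problems)
    # A policy model would generate one completion for the whole batch,
    # truncated at CONTEXT_BUDGET tokens; here we fake the completion.
    completion = "Answer 1: 4\nAnswer 2: 15"
    print(batched_reward(completion, answers))  # 1.0
```

Under this setup the optimizer never sees a length term; conciseness emerges only because tokens spent on one problem are tokens unavailable for the others in the batch.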