A reinforcement learning technique that constrains model outputs to stay within a token budget, reducing response length while maintaining accuracy.