Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

Yu Xia, Zhouhang Xie, Xin Xu, Byungkyu Kang, Prarit Lamba et al. | June 2, 2026 | arXiv

Key Takeaway

You can steer LLM reasoning in real time by treating it as a control problem: a separate agent learns to guide the main model's thinking steps, which saves tokens while maintaining accuracy and lets you trade off speed against quality.

Summary

This paper introduces ACTS (Agentic Chain-of-Thought Steering), a method that uses a controller agent to guide how a language model reasons during inference. Instead of letting the model think freely, the controller observes the reasoning progress and the remaining token budget, then suggests which strategy to use next. This enables efficient reasoning with explicit control over the thinking process.
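
The control loop described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the paper's implementation: the strategy labels (`explore`, `verify`, `conclude`), the per-step token costs, the `State` fields, and the threshold rule standing in for the learned controller are all hypothetical, and the reasoner is a stub rather than a real language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class State:
    """What the controller observes: progress so far and tokens left."""
    steps: List[str] = field(default_factory=list)
    remaining_budget: int = 0


def rule_based_controller(state: State, total_budget: int) -> str:
    """Hypothetical stand-in for the learned controller policy.

    Picks a strategy from the fraction of the token budget that remains.
    """
    frac = state.remaining_budget / total_budget
    if frac > 0.5:
        return "explore"
    if frac > 0.2:
        return "verify"
    return "conclude"


def stub_reasoner(strategy: str, state: State) -> Tuple[str, int]:
    """Hypothetical stand-in for the main model.

    Returns (step text, tokens consumed); the costs are made up.
    """
    cost = {"explore": 40, "verify": 25, "conclude": 10}[strategy]
    return f"[{strategy}] step {len(state.steps) + 1}", cost


def run_episode(
    total_budget: int,
    controller: Callable[[State, int], str] = rule_based_controller,
    reasoner: Callable[[str, State], Tuple[str, int]] = stub_reasoner,
) -> Tuple[List[str], int]:
    """Alternate controller decisions and reasoning steps until done.

    Returns the sequence of chosen strategies and the tokens used.
    """
    state = State(steps=[], remaining_budget=total_budget)
    trace: List[str] = []
    while state.remaining_budget > 0:
        strategy = controller(state, total_budget)   # observe, then act
        text, cost = reasoner(strategy, state)       # steered reasoning step
        cost = min(cost, state.remaining_budget)     # never exceed the budget
        state.steps.append(text)
        state.remaining_budget -= cost
        trace.append(strategy)
        if strategy == "conclude":                   # terminal action
            break
    return trace, total_budget - state.remaining_budget


if __name__ == "__main__":
    trace, used = run_episode(100)
    print(trace, used)
```

With a budget of 100 tokens, this toy policy chooses `explore`, `explore`, `conclude` and uses 90 tokens. Raising the budget lets the same policy spend more steps before concluding, which is the speed versus quality trade-off in miniature. In the method itself the controller is learned rather than hand-written.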

reasoning, efficiency, agents

Key Terms

chain-of-thought, markov-decision-process, inference-time-compute, token-budget, reinforcement-learning