Agents trained with separate optimization channels for accuracy and tool efficiency learn to use external tools strategically rather than reflexively, reducing latency and noise while maintaining or improving performance.
This paper addresses tool overuse in AI agents: they call external tools even when they could solve the problem from their own knowledge. The authors propose HDPO, a training method that teaches agents to judge when a tool call is actually needed. It does this by optimizing accuracy and tool efficiency as separate objectives rather than folding them into a single reward in which the two goals conflict.
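The contrast between a single combined objective and separate channels can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the group-normalized advantage form, and the `penalty` weight are all assumptions made for illustration, since the summary does not specify how HDPO computes either signal.

```python
from statistics import mean, pstdev


def normalize(values):
    """Center and scale a list of rewards; all zeros if there is no spread."""
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combined_advantages(correct, tool_calls, penalty=0.1):
    """Hypothetical baseline: one scalar reward mixing accuracy and tool cost.

    `penalty` is an assumed hyperparameter, not a value from the paper.
    """
    rewards = [c - penalty * t for c, t in zip(correct, tool_calls)]
    return normalize(rewards)


def decoupled_advantages(correct, tool_calls):
    """Hypothetical decoupled variant: one advantage signal per goal.

    Accuracy and efficiency are normalized independently, so the tool
    cost of a rollout never lowers its accuracy signal.
    """
    acc_adv = normalize([float(c) for c in correct])
    eff_adv = normalize([-float(t) for t in tool_calls])  # fewer calls is better
    return acc_adv, eff_adv


if __name__ == "__main__":
    # Four rollouts of one prompt: 1 = correct answer, with tool-call counts.
    correct = [1, 1, 0, 0]
    tool_calls = [0, 3, 0, 3]
    print(combined_advantages(correct, tool_calls, penalty=0.5))
    print(decoupled_advantages(correct, tool_calls))
```

With these inputs and a penalty of 0.5, the combined signal ranks the wrong, tool-free rollout above the correct rollout that used tools, which is the kind of conflict a single objective can create. The decoupled variant keeps the accuracy signal identical for both correct rollouts and reports tool cost separately. How the two channels are then weighted or gated during the policy update is left out here because the summary does not say.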