Brief Is Better: Non-Monotonic Chain-of-Thought Budget Effects in Function-Calling Language Agents

Xuan Qi | April 2, 2026 | arXiv

Key Takeaway

More reasoning isn't always better for function-calling agents: brief, structured reasoning (8-32 tokens) is the most effective at helping the model select the right function, while longer reasoning leads it to hallucinate invalid functions.

Summary

This paper studies how large a chain-of-thought token budget a language agent should spend reasoning before it calls a function. Testing on 200 tasks, the authors find a surprising non-monotonic pattern: brief 32-token reasoning improves accuracy by 45%, but longer reasoning actually hurts performance.
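
To make the setup concrete, the following is a minimal sketch of how a reasoning budget and a check for invalid function names might be wired together. It is not the authors' evaluation harness: the registry contents, the helper names (`cap_reasoning`, `check_call`, `hallucination_rate`), and the use of whitespace-separated words as a stand-in for model tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical tool registry; these names are illustrative, not from the paper.
VALID_FUNCTIONS = {"get_weather", "search_flights", "send_email"}


@dataclass
class CallResult:
    reasoning: str       # reasoning text after the budget is applied
    function_name: str   # function the agent chose to call
    is_valid: bool       # False means the name is not in the registry


def cap_reasoning(reasoning: str, budget: int) -> str:
    """Keep at most `budget` whitespace-separated words.

    Assumption: words approximate tokens; a real setup would use the
    model's own tokenizer.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    return " ".join(reasoning.split()[:budget])


def check_call(reasoning: str, function_name: str, budget: int) -> CallResult:
    """Apply the budget, then flag calls to functions that do not exist."""
    capped = cap_reasoning(reasoning, budget)
    return CallResult(capped, function_name, function_name in VALID_FUNCTIONS)


def hallucination_rate(results: list[CallResult]) -> float:
    """Fraction of calls that name a function outside the registry."""
    if not results:
        return 0.0
    return sum(not r.is_valid for r in results) / len(results)
```

Under these assumptions, sweeping `budget` over a range of values and recording both task accuracy and `hallucination_rate` at each setting is one way to look for the non-monotonic pattern the summary describes.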

agents reasoning efficiency

Key Terms

chain-of-thought, function-calling, token-budget, hallucination, reasoning-agent