Tool schema injection is a hidden operational cost in agent systems. Tool Attention addresses it by filtering out irrelevant tools and deferring full schema loading, which reduces per-turn tool tokens from ~47k to ~2.4k without sacrificing capability.
This paper introduces Tool Attention, a middleware system that reduces the token overhead of injecting tool schemas into LLM agents. It filters tools by task intent and access rules, then loads a tool's full schema only when that tool is actually needed. In multi-tool deployments this cuts tool-related tokens by about 95%, making agentic workflows more efficient and cost-effective.
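The two mechanisms described above, filtering and lazy schema loading, can be illustrated with a minimal Python sketch. It is not the paper's implementation. The `Tool` fields, the `CATALOG` entries, and the keyword-overlap test are hypothetical stand-ins for whatever intent scoring and access rules the actual system uses.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Tool:
    name: str
    summary: str               # short stub, cheap to include every turn
    keywords: FrozenSet[str]   # hypothetical intent tags used for matching
    required_role: str         # hypothetical access rule
    full_schema: str           # large schema text, included only on demand


# Hypothetical catalog used only for illustration.
CATALOG = [
    Tool("search_tickets", "Search support tickets",
         frozenset({"ticket", "support"}), "support",
         '{"name": "search_tickets", "parameters": {"query": "string"}}'),
    Tool("refund_payment", "Refund a customer payment",
         frozenset({"refund", "payment"}), "billing",
         '{"name": "refund_payment", "parameters": {"payment_id": "string"}}'),
    Tool("delete_user", "Delete a user account",
         frozenset({"delete", "user"}), "admin",
         '{"name": "delete_user", "parameters": {"user_id": "string"}}'),
]


def select_tools(tools, intent_words, roles):
    """Keep tools the caller may use and whose keywords overlap the intent."""
    return [
        t for t in tools
        if t.required_role in roles and t.keywords & intent_words
    ]


def build_prompt_tools(tools, intent_words, roles, expand=()):
    """Return one prompt entry per shortlisted tool.

    Tools named in `expand` contribute their full schema; all others
    contribute only a one-line stub.
    """
    entries = []
    for t in select_tools(tools, intent_words, roles):
        if t.name in expand:
            entries.append(t.full_schema)
        else:
            entries.append(f"{t.name}: {t.summary}")
    return entries
```

Calling `build_prompt_tools(CATALOG, {"refund", "ticket"}, {"support", "billing"}, expand={"refund_payment"})` yields a stub for `search_tickets` and the full schema for `refund_payment`, while `delete_user` is excluded by the access rule. The token savings in the summary come from the same idea applied at scale: most tools contribute either nothing or a short stub on any given turn.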