A smaller speculative model can predict an agentic system's tool-calling trajectory, enabling parallel execution and early termination of expensive operations—delivering significant speedups without accuracy loss.
SpecEyes speeds up agentic multimodal AI systems by using a lightweight model to predict what tools the main model will need, allowing expensive operations to be skipped or run in parallel. This cuts latency by 1.1-3.35x while maintaining accuracy, solving a key bottleneck in systems like OpenAI o3 that repeatedly invoke vision tools.