AIR: Adaptive Interleaved Reasoning with Code in MLLMs

Cong Han, Xiaohan Lan, Haibo Qiu, Yujie Zhong | June 22, 2026 | arXiv

Key Takeaway

MLLMs can be trained to switch adaptively between natural-language reasoning and code execution on numerical tasks, not just visual ones, using reinforcement learning with a specialized reward function that guides when to invoke tools.
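
This summary does not describe the form of the reward, so the following is only a generic illustration of how a reward can shape when a tool is invoked, not the authors' function. It assumes a hypothetical `toy_reward` that pays for a correct answer and charges a small, made-up cost per code call, so that calling code pays off only when it changes the outcome.

```python
def toy_reward(correct: bool, num_code_calls: int, call_cost: float = 0.1) -> float:
    """Illustrative reward: correctness minus a small cost per code call.

    Both the shape and the default call_cost are assumptions made for this
    sketch; they are not taken from the paper.
    """
    accuracy_term = 1.0 if correct else 0.0
    return accuracy_term - call_cost * num_code_calls


# Easy problem: the model is right either way, so skipping code scores higher.
easy_without_code = toy_reward(correct=True, num_code_calls=0)
easy_with_code = toy_reward(correct=True, num_code_calls=1)

# Hard numerical problem: only the code-assisted attempt is right.
hard_without_code = toy_reward(correct=False, num_code_calls=0)
hard_with_code = toy_reward(correct=True, num_code_calls=1)
```

Under these assumed numbers, the policy is nudged to skip code on the easy problem and to use it on the hard one, which is the adaptive behavior the takeaway describes.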

Summary

This paper enhances multimodal large language models (MLLMs) so that they work through complex math and numerical problems by interleaving natural-language reasoning with executable code. The authors use reinforcement learning to train the models to decide when and how to use code tools, achieving a 6.1% accuracy improvement on benchmarks.
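
The interleaving described above can be sketched as a simple loop: the model emits a reasoning step, any code in that step is executed, and the output is appended to the context before the next step. This is a minimal sketch under stated assumptions, not the paper's implementation: the `<code>` and `<output>` tags, the `generate` callable, and the scripted `toy_model` are all hypothetical stand-ins.

```python
import contextlib
import io
import re

# Hypothetical tag format for code emitted by the model (an assumption).
CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_code(source: str) -> str:
    """Run a snippet and return what it printed, or an error string.

    Plain exec() is used for brevity; a real system would need a sandbox.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})
    except Exception as exc:
        return f"error: {exc!r}"
    return buffer.getvalue().strip()


def interleaved_reasoning(generate, prompt: str, max_turns: int = 4) -> str:
    """Alternate model steps with code execution until no code is emitted."""
    context = prompt
    for _ in range(max_turns):
        step = generate(context)
        context += step
        match = CODE_BLOCK.search(step)
        if match is None:
            break  # the model answered in natural language only
        context += f"\n<output>{run_code(match.group(1))}</output>\n"
    return context


def toy_model(context: str) -> str:
    """Scripted stand-in for an MLLM: calls code once, then answers."""
    if "<output>" not in context:
        return "\nI need the exact product.<code>print(1234 * 5678)</code>"
    return "\nThe answer is 7006652."


trace = interleaved_reasoning(toy_model, "What is 1234 * 5678?")
```

Here `trace` holds the full transcript: the question, the reasoning step with its code, the executed output, and the final answer. In a trained model, the decision to emit code at all would come from the learned policy rather than from a fixed script.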

reasoning training multimodal

Key Terms

interleaved-reasoning tool-invocation reinforcement-learning-from-internal-feedback multimodal-large-language-model