Multimodal large language models (MLLMs) can be trained with reinforcement learning to switch adaptively between natural-language reasoning and code execution on numerical tasks, not just visual ones, using a specialized reward function that guides when to invoke tools.
This paper trains multimodal models to work through complex math and numerical problems by interleaving natural-language reasoning with executable code. The authors use reinforcement learning to teach the models when and how to call code tools, and report a 6.1% accuracy improvement on benchmarks.
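
To make the idea of a reward that guides tool invocation more concrete, here is a minimal sketch in Python. It is not the authors' reward function: the `Rollout` fields, the `toy_reward` name, and the `tool_bonus` and `error_penalty` values are all hypothetical, chosen only to illustrate how a correctness signal might be combined with a tool-use term.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """Hypothetical summary of one sampled model response."""
    answer_correct: bool  # final answer matched the reference
    used_code: bool       # the model invoked the code tool at least once
    code_ran_ok: bool     # every invoked snippet executed without error


def toy_reward(r: Rollout, tool_bonus: float = 0.2, error_penalty: float = 0.2) -> float:
    """Illustrative shaped reward (an assumption, not the paper's formula).

    Correctness gives the base reward. Tool use earns a small bonus only
    when the code ran cleanly and the answer is correct, and costs a small
    penalty when the code failed, so that calling the tool is encouraged
    where it helps and discouraged where it does not.
    """
    reward = 1.0 if r.answer_correct else 0.0
    if r.used_code:
        if not r.code_ran_ok:
            reward -= error_penalty
        elif r.answer_correct:
            reward += tool_bonus
    return reward
```

Under these assumed values, a correct answer reached with working code scores above a correct answer reached by reasoning alone, while a wrong answer accompanied by failing code scores below a plain wrong answer. A real training setup would likely need a more careful design so that the bonus does not push the model to call tools on problems that do not need them.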