RL post-training of multimodal models may improve performance through learned hallucination patterns rather than through genuine visual reasoning, challenging assumptions about how these models actually learn from images.
This paper investigates how reinforcement learning improves multimodal AI models' visual reasoning by studying the role of hallucination, the generation of plausible-sounding but incorrect information.