Multimodal AI models are sensitive to the order of their inputs, even though order invariance should be a baseline property for production systems. Simple prompt fixes do not solve this; the problem likely requires changes to model training or design.
This paper audits 18 multimodal AI models to check whether they give consistent answers when the same information is presented in different orders. The researchers found that all of the models fail this basic reliability test, with 24-50% of answers changing depending on input order.
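
To make the idea of an order-consistency check concrete, here is a minimal sketch of how such an audit could be scored. It is not the paper's actual protocol or metric: the `Model` interface, the function names `answer_changes` and `order_sensitivity`, the `max_orders` cap, and the toy models are all assumptions made for illustration. The sketch counts an example as order-sensitive if any sampled reordering of its inputs changes the answer.

```python
from itertools import islice, permutations
from typing import Callable, Sequence

# Hypothetical model interface (an assumption, not a real API): takes an
# ordered sequence of input parts (e.g. image references, text snippets)
# and returns an answer string.
Model = Callable[[Sequence[str]], str]


def answer_changes(model: Model, parts: Sequence[str], max_orders: int = 24) -> bool:
    """Return True if any tried reordering of `parts` changes the answer."""
    baseline = model(parts)
    # permutations() yields the original order first, so skip index 0 and
    # try at most max_orders - 1 alternative orderings.
    for order in islice(permutations(parts), 1, max_orders):
        if model(order) != baseline:
            return True
    return False


def order_sensitivity(model: Model, examples: Sequence[Sequence[str]]) -> float:
    """Fraction of examples whose answer changes under some reordering."""
    if not examples:
        return 0.0
    changed = sum(answer_changes(model, ex) for ex in examples)
    return changed / len(examples)


# Toy stand-ins for real models, for illustration only.
def first_item_model(parts: Sequence[str]) -> str:
    return parts[0]  # answer depends on whichever part comes first


def order_invariant_model(parts: Sequence[str]) -> str:
    return min(parts)  # answer depends only on the set of parts


examples = [["a", "b"], ["c", "c"]]
print(order_sensitivity(first_item_model, examples))
print(order_sensitivity(order_invariant_model, examples))
```

A real audit would replace the toy models with calls to actual multimodal systems and would likely compare answers semantically rather than by exact string match; both choices are left open here.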