Current multimodal AI models have a fundamental weakness in multi-view spatial reasoning: they can't reliably track objects across different camera angles, and better prompting strategies alone can't fix this limitation.
TriViewBench is a controlled benchmark for testing how well multimodal AI models handle complex visual reasoning across multiple views of 3D scenes.