SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis

Kathakoli Sengupta, Kai Ao, Paola Cascante-Bonilla | April 14, 2026 | arXiv

Key Takeaway

Symbolic rule-based evaluation of 3D scenes is more reliable and interpretable than vision-language model judges, and text-only LLMs can outperform vision models at refining spatial layouts when given explicit constraint feedback.

Summary

SceneCritic is a symbolic evaluator that assesses 3D indoor scene layouts by checking semantic, orientation, and geometric consistency against a structured spatial ontology built from real-world scene data.
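
To make the idea of a symbolic check concrete, here is a minimal sketch of the geometric part only, written in Python. It is not SceneCritic's implementation: the `Box` representation, the function names, and the two rules (objects stay inside the room, footprints do not overlap) are assumptions chosen for illustration. The semantic and orientation checks, and the spatial ontology itself, are not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned 2D floor footprint (hypothetical representation)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlaps(a: Box, b: Box) -> bool:
    """True when two footprints share interior area; touching edges do not count."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def in_room(obj: Box, room: Box) -> bool:
    """True when the object's footprint lies entirely within the room."""
    return (room.x_min <= obj.x_min and obj.x_max <= room.x_max
            and room.y_min <= obj.y_min and obj.y_max <= room.y_max)


def geometric_violations(room: Box, objects: List[Box]) -> List[str]:
    """Return human-readable rule violations for a layout (illustrative rules only)."""
    violations = []
    # Rule 1 (assumed): every object must stay inside the room bounds.
    for obj in objects:
        if not in_room(obj, room):
            violations.append(f"{obj.name} is out of bounds")
    # Rule 2 (assumed): no two object footprints may overlap.
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            if overlaps(a, b):
                violations.append(f"{a.name} overlaps {b.name}")
    return violations


if __name__ == "__main__":
    room = Box("room", 0.0, 0.0, 4.0, 4.0)
    layout = [
        Box("bed", 0.0, 0.0, 2.0, 2.0),
        Box("desk", 1.0, 1.0, 3.0, 2.5),
        Box("lamp", 3.5, 3.5, 4.5, 4.0),
    ]
    for message in geometric_violations(room, layout):
        print(message)
```

Because each violation is a named rule applied to named objects, the output can be read directly by a person or passed back to a layout generator as explicit constraint feedback, which is the kind of interpretability the summary attributes to symbolic evaluation.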

evaluation multimodal reasoning

Key Terms

vision-language-models, scene-graph, spatial-ontology, iterative-refinement, semantic-coherence