RoboWits: Unexpected Challenges for Robotic Creative Problem Solving

Chunru Lin, Hongxin Zhang, Fenghao Yu, Zhehuan Chen, Thomas L. Griffiths et al. | May 28, 2026 | arXiv

Key Takeaway

Today's robot models are brittle: they handle familiar tasks after fine-tuning but fail completely when the environment changes even slightly, showing that they memorize solutions rather than learn to reason and adapt.

Summary

RoboWits is a robotic benchmark that tests whether robots can reason creatively and adapt to unexpected challenges, rather than merely execute pre-learned skills. The researchers built an automated pipeline to generate 208 diverse manipulation tasks of varying difficulty, then evaluated popular robot policies and vision-language models on them.

reasoning agents evaluation

Key Terms

vision-language-action-model tool-use robustness creative-utility task-mutation