Today's robot models are brittle: after fine-tuning they handle familiar tasks, but they fail when the environment changes even slightly, which suggests they memorize solutions rather than learn to reason and adapt.
RoboWits is a robotics benchmark that tests whether robots can reason creatively and adapt to unexpected challenges, rather than just execute pre-learned skills. The researchers built an automated pipeline that generates 208 diverse manipulation tasks of varying difficulty, then evaluated popular robot policies and vision-language models on them.