WorldSample: Closed-loop Real-robot RL with World Modelling

Yuquan Xue, Le Xu, Zeyi Liu, Zhenyu Wu, Zhengyi Gu et al. | July 2, 2026 | arXiv

Key Takeaway

Generating synthetic transitions from a world model trained on real robot data, and carefully selecting which of those samples to train on, lets robots learn manipulation tasks with 59% fewer real interactions while improving success rates by 28%.

Summary

WorldSample trains a world model on real robot interactions and uses it to generate synthetic training data for reinforcement learning. By closing the loop between physical rollouts, synthetic data generation, and policy improvement, it reduces the number of costly real-world interactions needed without sacrificing learning quality.
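The closed loop described in the summary can be illustrated with a toy sketch. This is not the paper's implementation: the 1-D linear "robot", the linear least-squares world model, the state-range selection rule, and every function name below are assumptions made for illustration. Policy improvement here is a simple grid search over a feedback gain, swapped in for whatever policy-gradient update the authors actually use.

```python
import random

def real_step(s, a, rng):
    # Stand-in for the physical robot; the learner never sees these coefficients.
    return 0.9 * s + 0.5 * a + rng.gauss(0.0, 0.01)

def collect_real(k, rng, episodes=3, horizon=10):
    # Physical rollouts under the current policy a = -k*s, plus exploration noise.
    data = []
    for _ in range(episodes):
        s = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            a = -k * s + rng.gauss(0.0, 0.3)
            s_next = real_step(s, a, rng)
            data.append((s, a, s_next))
            s = s_next
    return data

def fit_world_model(data):
    # Least-squares fit of s' ~ w_s*s + w_a*a via the 2x2 normal equations.
    ss = sum(s * s for s, a, n in data)
    sa = sum(s * a for s, a, n in data)
    aa = sum(a * a for s, a, n in data)
    sn = sum(s * n for s, a, n in data)
    an = sum(a * n for s, a, n in data)
    det = ss * aa - sa * sa
    return (sn * aa - sa * an) / det, (ss * an - sa * sn) / det

def select_starts(starts, data):
    # Hypothetical selection rule: keep synthetic start states that lie
    # inside the range of states actually visited on the real system.
    lo = min(s for s, _, _ in data)
    hi = max(s for s, _, _ in data)
    return [s for s in starts if lo <= s <= hi]

def synthetic_return(k, model, starts):
    # Mean one-step reward -(s')^2 of gain k on model-generated transitions.
    w_s, w_a = model
    return sum(-((w_s * s + w_a * (-k * s)) ** 2) for s in starts) / len(starts)

def closed_loop(iterations=3, seed=0):
    rng = random.Random(seed)
    k, real_data = 0.0, []
    candidates = [i / 10 for i in range(31)]  # candidate gains 0.0 .. 3.0
    for _ in range(iterations):
        real_data += collect_real(k, rng)          # 1. physical rollouts
        model = fit_world_model(real_data)         # 2. update world model
        proposals = [rng.uniform(-2.0, 2.0) for _ in range(200)]
        starts = select_starts(proposals, real_data)  # 3. sample selection
        # 4. policy improvement using only synthetic transitions
        k = max(candidates, key=lambda c: synthetic_return(c, model, starts))
    return k, len(real_data)

if __name__ == "__main__":
    gain, n_real = closed_loop()
    print(f"learned gain {gain:.1f} from {n_real} real transitions")
```

In this toy, all candidate policies are compared on synthetic transitions only, so each iteration costs a fixed budget of 30 real steps however many candidates are evaluated. The improved policy then drives the next batch of real rollouts, which is what closes the loop.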


Key Terms

world-model, data-augmentation, policy-gradient, contact-rich-dynamics, sample-selection