VLK: Learning Humanoid Loco-Manipulation from Synthetic Interactions in Reconstructed Scenes

Yen-Jen Wang, Jiaman Li, Sirui Chen, Takara E. Truong, Pei Xu et al. | June 29, 2026 | arXiv

Key Takeaway

Synthetic data from reconstructed 3D scenes can effectively train perception-based humanoid robots for real-world loco-manipulation, eliminating the need for expensive human-annotated robot trajectories.

Summary

This paper addresses a key bottleneck in training humanoid robots: the lack of paired data that combines egocentric camera views, language instructions, and robot motion. The authors generate 48,000 synthetic training examples in three steps: they reconstruct real indoor scenes with 3D Gaussian Splatting, simulate robot trajectories inside those scenes, and render the corresponding first-person views.
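To make the idea of "paired data" concrete, the following is a minimal sketch of what one such training example could look like as a record. The schema, the field names, and the `build_sample` helper are hypothetical illustrations and are not taken from the paper; the only thing assumed from the summary is that each example ties together egocentric views, a language instruction, and robot motion.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SyntheticSample:
    """One paired training example (hypothetical schema, not from the paper)."""
    scene_id: str              # identifier of the reconstructed scene
    instruction: str           # language instruction for the task
    frames: List[str]          # identifiers of rendered egocentric frames, one per timestep
    motion: List[List[float]]  # robot joint targets, one vector per timestep

    def __post_init__(self) -> None:
        # Views and motion are only "paired" if they share the same timesteps.
        if len(self.frames) != len(self.motion):
            raise ValueError("frames and motion must have the same number of timesteps")


def build_sample(scene_id: str, instruction: str,
                 motion: List[List[float]]) -> SyntheticSample:
    """Pair a simulated trajectory with placeholder frame identifiers.

    A real pipeline would render an image per timestep; here a frame is
    just a string name so the sketch stays self-contained.
    """
    frames = [f"{scene_id}/frame_{t:04d}" for t in range(len(motion))]
    return SyntheticSample(scene_id, instruction, frames, motion)


sample = build_sample("kitchen_01", "pick up the box", [[0.0, 0.1], [0.2, 0.3]])
```

The length check in `__post_init__` reflects the one property the summary emphasizes: every rendered view must line up with a motion step, otherwise the example is not usable as paired supervision.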

data applications

Key Terms

3D Gaussian Splatting, egocentric perception, loco-manipulation, sim-to-real transfer, whole-body control