World from Motion: Generative Dynamic Gaussian Reconstruction from Monocular Video

Liyuan Zhu, Shengyu Huang, Amrita Mazumdar, Tianye Li, Zan Gojcic et al. | July 1, 2026 | arXiv

Key Takeaway

Generative models can improve monocular 4D reconstruction by learning to correct artifacts and hallucinate missing geometry. Distilling their outputs back into an explicit 3D representation then enables high-quality dynamic scene synthesis from a single video.

Summary

This paper presents a method to reconstruct dynamic 3D scenes from single-camera videos by using a generative model that learns to fix imperfections in initial reconstructions and fill in unseen regions.
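The render, correct, and distill loop described above can be illustrated with a deliberately tiny sketch. It is not the authors' implementation. Every name is hypothetical: `render` stands in for a differentiable Gaussian rasterizer (here just an identity map over per-pixel values), `make_toy_corrector` stands in for the learned generative model (here it keeps observed pixels and fills each unobserved pixel with the mean of its neighbours), and the time dimension of a dynamic scene is omitted. The point is only the structure: the generative model produces corrected pseudo-targets, and the explicit representation is fitted to them by gradient descent.

```python
from typing import Callable, List, Sequence

Image = List[float]


def render(params: Sequence[float]) -> Image:
    # Hypothetical stand-in for a differentiable Gaussian rasterizer:
    # the "image" is simply a copy of the per-pixel parameters.
    return list(params)


def make_toy_corrector(observed: Sequence[bool]) -> Callable[[Image], Image]:
    """Hypothetical stand-in for the generative model.

    Observed pixels are returned unchanged; each unobserved pixel is replaced
    by the mean of its immediate neighbours, a crude 'hallucination' of
    content the input video never saw.
    """
    def correct(image: Image) -> Image:
        out = list(image)
        for i, seen in enumerate(observed):
            neighbours = [image[j] for j in (i - 1, i + 1) if 0 <= j < len(image)]
            if not seen and neighbours:
                out[i] = sum(neighbours) / len(neighbours)
        return out
    return correct


def distill(params: Sequence[float],
            corrector: Callable[[Image], Image],
            rounds: int = 200,
            inner_steps: int = 5,
            lr: float = 0.25) -> List[float]:
    """Alternate generative correction with fitting of the explicit parameters."""
    params = list(params)
    for _ in range(rounds):
        # 1) Render the current explicit representation.
        # 2) Ask the (hypothetical) generative model for a corrected target.
        target = corrector(render(params))
        # 3) Distill: a few gradient steps on sum((render(p) - target)^2).
        #    With the identity renderer the gradient is 2 * (p - target).
        for _ in range(inner_steps):
            params = [p - lr * 2.0 * (p - t) for p, t in zip(params, target)]
    return params


if __name__ == "__main__":
    # Pixels 1 and 2 were never observed and start as holes (0.0).
    initial = [1.0, 0.0, 0.0, 4.0]
    corrector = make_toy_corrector([True, False, False, True])
    print(distill(initial, corrector))  # approaches [1.0, 2.0, 3.0, 4.0]
```

In this toy the loop converges to a linear interpolation of the observed pixels, because that is the fixed point of the neighbour-averaging corrector. The design choice it illustrates is that the generative model never replaces the scene directly: its outputs only serve as targets, and the explicit parameters remain the final representation that gets rendered.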

Key Terms

3D Gaussian splatting, monocular reconstruction, novel-view synthesis, distillation