Generative models can improve monocular 4D reconstruction by learning to correct artifacts and hallucinate missing geometry, then distilling the results back into explicit 3D representations, which enables high-quality dynamic scene synthesis from a single video.
This paper presents a method for reconstructing dynamic 3D scenes from single-camera videos, using a generative model that learns to fix imperfections in an initial reconstruction and to fill in unseen regions.