Anchoring video generation to a 3D reconstruction yields better 3D consistency than generating videos alone, and it can be done by reusing existing video models without task-specific training.
OrbitForge converts text descriptions into 3D scenes using frozen video generation models and Gaussian Splatting reconstruction. It generates a video from text, identifies the viewpoints still missing from a complete orbit, fills those gaps with the video model, and reconstructs everything into a consistent 3D scene, all without fine-tuning or slow step-by-step generation.
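
The "identify missing viewpoints" step can be illustrated with a small sketch. The summary does not say how OrbitForge represents camera poses or detects gaps, so everything here is an assumption made for illustration: each frame's camera is reduced to a single azimuth angle on a horizontal orbit, and the names `find_orbit_gaps`, `sample_gap`, `max_gap_deg`, and `step_deg` are hypothetical, not part of OrbitForge.

```python
from typing import List, Tuple


def find_orbit_gaps(
    azimuths_deg: List[float], max_gap_deg: float = 30.0
) -> List[Tuple[float, float]]:
    """Return (start, end) arcs of the orbit with no view (hypothetical helper).

    An arc is reported when two neighbouring views are more than
    max_gap_deg apart. An arc may wrap past 360, in which case end <= start.
    """
    if not azimuths_deg:
        return [(0.0, 360.0)]  # nothing observed: the whole orbit is missing
    angles = sorted(a % 360.0 for a in azimuths_deg)
    gaps = []
    for i, start in enumerate(angles):
        if i + 1 < len(angles):
            end = angles[i + 1]
            width = end - start
        else:
            # Last view: close the circle back to the first view.
            end = angles[0]
            width = end + 360.0 - start
        if width > max_gap_deg:
            gaps.append((start, end))
    return gaps


def sample_gap(start: float, end: float, step_deg: float = 15.0) -> List[float]:
    """Evenly spaced target azimuths strictly inside one gap (hypothetical helper)."""
    width = (end - start) % 360.0 or 360.0  # zero width means a full circle
    targets = []
    k = 1
    while k * step_deg < width:
        targets.append((start + k * step_deg) % 360.0)
        k += 1
    return targets


# Assumed example: the initial video sweeps azimuths 0..120 in 10-degree steps.
covered = [float(a) for a in range(0, 121, 10)]
gaps = find_orbit_gaps(covered, max_gap_deg=30.0)
targets = [sample_gap(s, e, step_deg=60.0) for s, e in gaps]
```

In this toy setup the only gap is the arc from 120 degrees back around to 0, and the sampled azimuths inside it would be the viewpoints passed to the video model for gap filling before reconstruction.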