Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models

Jiazheng Xing, Hangjie Yuan, Lingling Cai, Xinyu Liu, Yujie Wei et al. | May 29, 2026 | arXiv

Key Takeaway

Training with a lightweight generator and switching to a high-capacity generator at inference lets a reasoning-driven video model reach high visual quality without the prohibitive cost of training against the large generator.

Summary

Lumos-Nexus is a video generation system that combines reasoning capabilities with high visual quality by using a lightweight generator during training and progressively handing off to a powerful generator at inference time. This two-stage approach lets models understand user intent and generate coherent videos without the computational cost of training with large generators.
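The progressive hand-off described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `handoff_weights`, `generate`, `light_step`, and `heavy_step` are hypothetical, the linear ramp between generators is an assumed schedule, and the toy "generators" are plain functions on a list of floats. The one property the sketch relies on is that both generators read and write the same latent representation, which is what makes switching or blending between them well defined.

```python
from typing import Callable, List

Latent = List[float]
# A generator step takes the current latent and the step index, returns a new latent.
Step = Callable[[Latent, int], Latent]


def handoff_weights(num_steps: int, start: int, end: int) -> List[float]:
    """Weight given to the high-capacity generator at each step.

    0.0 means the lightweight generator runs alone, 1.0 means the
    high-capacity generator runs alone. The linear ramp over
    [start, end) is an assumption made for illustration.
    """
    if not 0 <= start < end <= num_steps:
        raise ValueError("need 0 <= start < end <= num_steps")
    weights = []
    for t in range(num_steps):
        if t < start:
            weights.append(0.0)
        elif t >= end:
            weights.append(1.0)
        else:
            weights.append((t - start) / (end - start))
    return weights


def generate(latent: Latent, light_step: Step, heavy_step: Step,
             weights: List[float]) -> Latent:
    """Run the step loop, handing off from light_step to heavy_step.

    Blending the two outputs is only meaningful because both steps
    are assumed to operate in the same (shared) latent space.
    """
    for t, w in enumerate(weights):
        if w == 0.0:
            latent = light_step(latent, t)      # coarse steps: cheap generator only
        elif w == 1.0:
            latent = heavy_step(latent, t)      # final steps: large generator only
        else:
            a = light_step(latent, t)
            b = heavy_step(latent, t)
            latent = [(1 - w) * x + w * y for x, y in zip(a, b)]
    return latent


# Toy stand-ins for the two generators (hypothetical, for demonstration only).
def toy_light(latent: Latent, t: int) -> Latent:
    return [x + 1.0 for x in latent]


def toy_heavy(latent: Latent, t: int) -> Latent:
    return [x + 10.0 for x in latent]


schedule = handoff_weights(num_steps=6, start=2, end=4)
result = generate([0.0], toy_light, toy_heavy, schedule)
```

In this sketch the expensive generator is invoked only from the transition window onward, so the early steps stay cheap. Whether Lumos-Nexus uses a hard switch, a blend, or a frequency-dependent split is not stated in this summary.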

multimodal, efficiency, architecture

Key Terms

coarse-to-fine-training, latent-space, reasoning-driven-generation, unified-model