Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

Evangelos Ntavelis, Sean Wu, Mohamad Shahbazi, Fabio Maninchedda, Dmitry Kostiaev et al. | May 5, 2026 | arXiv

Key Takeaway

Feed-forward 3D reconstruction from multi-view images can match or exceed the quality of optimization-based methods while running much faster, and a UV parameterization makes it possible to train on many high-resolution views without memory use exploding.
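
To make the memory argument concrete, here is a minimal counting sketch. It assumes, purely for illustration, that the alternative to a UV layout is a pixel-aligned layout that predicts one Gaussian per input pixel; the summary does not say which baseline the paper compares against. The function names, image sizes, and UV resolution are hypothetical and not taken from the paper.

```python
def pixel_aligned_gaussians(num_views: int, height: int, width: int) -> int:
    """Assumed baseline: one Gaussian per pixel of every input view."""
    return num_views * height * width


def uv_gaussians(uv_height: int, uv_width: int) -> int:
    """UV layout: one Gaussian per texel of a fixed-size UV map."""
    return uv_height * uv_width


# Illustrative numbers only: 1024x1024 input views, 512x512 UV map.
for views in (4, 16, 64):
    per_pixel = pixel_aligned_gaussians(views, 1024, 1024)
    per_texel = uv_gaussians(512, 512)
    print(f"{views:>2} views: pixel-aligned={per_pixel:,}  uv={per_texel:,}")
```

Under these assumptions the pixel-aligned count grows linearly with the number of views and with image resolution, while the UV count depends only on the chosen UV map size.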

Summary

HeadsUp reconstructs detailed 3D head models from multiple camera views with a feed-forward encoder-decoder network: the encoder compresses the input images into a compact latent representation, and the decoder maps that representation to 3D Gaussians, the primitives rendered by Gaussian splatting. The method scales to thousands of subjects and generalizes to unseen people without per-subject optimization, which enables applications such as generating new identities and animating expressions.
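
The decoding step can be pictured with a small NumPy sketch that turns a UV-shaped latent grid into one Gaussian per texel. Everything here is an assumption made for illustration: the function `decode_uv_latent`, the single random linear layer standing in for the decoder, the 14-parameter layout, and the grid size are hypothetical and are not HeadsUp's actual architecture.

```python
import numpy as np


def decode_uv_latent(uv_latent: np.ndarray, rng: np.random.Generator) -> dict:
    """Map an (H, W, C) UV feature grid to one Gaussian per texel.

    Hypothetical stand-in for a learned decoder: a single random linear layer.
    """
    h, w, c = uv_latent.shape
    # 3 position + 3 log-scale + 4 quaternion + 1 opacity + 3 color = 14 values
    weights = rng.standard_normal((c, 14)) * 0.01
    raw = uv_latent.reshape(h * w, c) @ weights

    quat = raw[:, 6:10] + np.array([1.0, 0.0, 0.0, 0.0])  # bias toward identity
    quat_norm = np.linalg.norm(quat, axis=1, keepdims=True) + 1e-8
    return {
        "position": raw[:, 0:3],                        # 3D centers
        "scale": np.exp(raw[:, 3:6]),                   # strictly positive
        "rotation": quat / quat_norm,                   # unit quaternions
        "opacity": 1.0 / (1.0 + np.exp(-raw[:, 10:11])),  # in (0, 1)
        "color": 1.0 / (1.0 + np.exp(-raw[:, 11:14])),    # RGB in (0, 1)
    }


rng = np.random.default_rng(0)
latent = rng.standard_normal((64, 64, 32))  # illustrative UV latent grid
gaussians = decode_uv_latent(latent, rng)
print({name: values.shape for name, values in gaussians.items()})
```

The output holds 64 x 64 = 4096 Gaussians regardless of how many camera views produced the latent, which is the property the UV layout is meant to provide.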

architecture multimodal

Key Terms

3d-scene-reconstruction encoder-decoder latent-representation feed-forward-transformer gaussian-splatting