PhyCo: Learning Controllable Physical Priors for Generative Motion

Sriram Narayanan, Ziyu Jiang, Srinivasa Narasimhan, Manmohan Chandraker | April 30, 2026 | arXiv

Key Takeaway

Generative video models can be made physically consistent by combining physics-labeled training data, ControlNet conditioning on physical properties, and VLM-based reward signals, with no simulator needed at runtime.

Summary

PhyCo teaches video generation models to respect physics by fine-tuning them on 100K+ realistic simulation videos with varying physical properties (friction, bouncing, deformation), then using a vision-language model to provide physics-aware feedback during generation. This lets models create videos where objects behave realistically without needing a physics simulator at inference time.
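
To make the "ControlNet conditioning on physical properties" idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. It only illustrates the zero-initialized side branch that ControlNet-style conditioning relies on: before training, the branch adds nothing to the backbone's features, and once its weights are learned the output starts to respond to the physical property values. The names `ZeroInitAdapter`, `condition`, and `PROPERTY_NAMES`, the feature vector, and all numeric values are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical property layout; the summary names friction, bouncing, and deformation.
PROPERTY_NAMES = ("friction", "bounce", "deformation")


@dataclass
class ZeroInitAdapter:
    """Toy stand-in for a ControlNet-style branch: a linear map whose
    weights start at zero, so it contributes nothing until trained."""
    feature_dim: int
    weights: List[List[float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.weights:
            # One row per feature, one column per physical property, all zeros.
            self.weights = [
                [0.0] * len(PROPERTY_NAMES) for _ in range(self.feature_dim)
            ]

    def residual(self, props: List[float]) -> List[float]:
        # Linear projection of the property vector into feature space.
        return [sum(w * p for w, p in zip(row, props)) for row in self.weights]


def condition(
    base_features: List[float], props: List[float], adapter: ZeroInitAdapter
) -> List[float]:
    """Add the adapter's residual to the (frozen) backbone features."""
    return [b + r for b, r in zip(base_features, adapter.residual(props))]


base = [0.5, -1.0, 2.0, 0.25]   # pretend backbone activations (made up)
props = [0.8, 0.3, 0.1]         # friction, bounce, deformation (made up)
adapter = ZeroInitAdapter(feature_dim=len(base))

# With zero-initialized weights the conditioned features equal the base features.
untrained = condition(base, props, adapter)

# Simulate a single learned weight: feature 0 now responds to friction.
adapter.weights[0][0] = 0.5
trained = condition(base, props, adapter)
```

In this sketch `untrained` matches `base` exactly, which is the property that lets a conditioning branch be attached to a pretrained model without disturbing it at the start of fine-tuning. A real system would apply the same idea with learned convolutional or attention layers inside a video diffusion backbone rather than a single linear map.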

training, multimodal, evaluation

Key Terms

diffusion-process, physics-informed, controlnet, reward-optimization, vision-language-model