You can make generative video models more physically consistent by combining physics-labeled training data, ControlNet conditioning on physical properties, and reward signals from a vision-language model (VLM), with no simulator needed at runtime.
PhyCo teaches video generation models to respect physics by fine-tuning them on 100K+ realistic simulation videos with varying physical properties (friction, bouncing, deformation), then using a vision-language model to provide physics-aware feedback during generation. This lets models create videos where objects behave realistically without needing a physics simulator at inference time.
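To make the conditioning idea concrete, here is a minimal sketch of a ControlNet-style control branch in plain Python. It is a toy illustration, not PhyCo's implementation: the names `PhysicsLabel`, `ZeroInitControlBranch`, and `conditioned_features`, the three label fields, and the linear form of the branch are all assumptions made for this example. The one property it borrows from ControlNet is the zero-initialized output projection, which means the control branch contributes nothing before fine-tuning and the base model's behavior is preserved at the start.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhysicsLabel:
    """Hypothetical per-video label; field names are illustrative only."""
    friction: float
    restitution: float  # stands in for "bouncing"
    deformation: float

    def as_vector(self) -> List[float]:
        return [self.friction, self.restitution, self.deformation]


class ZeroInitControlBranch:
    """Toy linear control branch whose weights start at zero (ControlNet-style)."""

    def __init__(self, n_props: int, n_features: int) -> None:
        # One row of weights per output feature, all zero before fine-tuning.
        self.weights = [[0.0] * n_props for _ in range(n_features)]

    def __call__(self, props: List[float]) -> List[float]:
        # Plain matrix-vector product: one residual value per feature.
        return [sum(w * p for w, p in zip(row, props)) for row in self.weights]


def conditioned_features(
    base_features: List[float],
    branch: ZeroInitControlBranch,
    label: PhysicsLabel,
) -> List[float]:
    """Add the control branch's residual to the base model's features."""
    residual = branch(label.as_vector())
    return [b + r for b, r in zip(base_features, residual)]


# Usage: made-up feature values standing in for one layer of a video model.
label = PhysicsLabel(friction=0.8, restitution=0.3, deformation=0.1)
branch = ZeroInitControlBranch(n_props=3, n_features=4)
base = [0.5, -1.0, 2.0, 0.0]

before = conditioned_features(base, branch, label)  # identical to `base`

# Pretend fine-tuning changed one weight: feature 0 now responds to friction.
branch.weights[0][0] = 1.0
after = conditioned_features(base, branch, label)
```

Before any weight changes, `before` equals `base`, so adding the branch cannot degrade the pretrained model. After the simulated update, only feature 0 shifts, by an amount proportional to the friction value. The VLM reward step is not covered by this sketch.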