To build better video editing systems, you need specialized evaluation tools: generic vision-language models do not judge editing quality the way humans do.
This paper introduces VEFX-Bench, a comprehensive dataset and evaluation suite for video editing. It comprises 5,049 human-annotated video editing examples spanning multiple editing categories, a specialized reward model (VEFX-Reward) that scores editing quality along three dimensions, and a 300-video benchmark for comparing editing systems.
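To make the setup concrete, here is a minimal sketch of how per-dimension reward scores could be aggregated to rank editing systems on such a benchmark. This is purely illustrative: the dimension values, the `RewardScore`/`rank_systems` names, and the unweighted-mean aggregation are assumptions, not the paper's actual method or API.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class RewardScore:
    """Hypothetical container for one edited video's scores
    along three (here unnamed) quality dimensions."""
    dims: tuple[float, float, float]

    def overall(self) -> float:
        # Simple unweighted mean; the paper's actual aggregation may differ.
        return mean(self.dims)

def rank_systems(scores: dict[str, list[RewardScore]]) -> list[str]:
    # Rank editing systems by their mean overall score across benchmark videos.
    return sorted(
        scores,
        key=lambda s: mean(r.overall() for r in scores[s]),
        reverse=True,
    )

# Toy usage with made-up numbers:
scores = {
    "system_a": [RewardScore((0.8, 0.7, 0.9)), RewardScore((0.6, 0.8, 0.7))],
    "system_b": [RewardScore((0.5, 0.6, 0.4))],
}
ranking = rank_systems(scores)
```

The point of the sketch is only that a multi-dimensional reward model yields per-video vectors that must be reduced (here by a mean) before systems can be compared on a leaderboard.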