Predicting sparse point trajectories instead of dense pixels makes future scene simulation orders of magnitude faster while maintaining accuracy—enabling practical exploration of many possible futures with uncertainty quantification.
This paper predicts how scenes will evolve by tracking sparse point trajectories instead of predicting dense pixel values. An autoregressive diffusion model generates thousands of plausible futures from a single image while explicitly modeling uncertainty growth over time, achieving faster simulation than dense video prediction methods.