Video generation systems lose detail because their decoders ignore the input image. Adding reference conditioning to the decoder recovers this information and improves quality by up to 2.1 dB PSNR.
RefDecoder improves video generation by conditioning the decoder on a reference image, addressing a common architectural flaw in which decoders ignore the details of the input image. It injects reference-image information through attention during decoding, which preserves fine details and consistency without requiring existing systems to be retrained.
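
To make the attention-based injection concrete, here is a minimal sketch of how decoder features might attend to reference-image features. It is an illustration under stated assumptions, not RefDecoder's actual architecture: the function name `reference_cross_attention`, the token shapes, and the toy values are all hypothetical, and the learned query/key/value projections a real decoder would use are omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reference_cross_attention(decoder_tokens, reference_tokens):
    """Hypothetical helper: scaled dot-product cross-attention with a residual.

    decoder_tokens:   list of N feature vectors (queries), each of length d
    reference_tokens: list of M feature vectors from the reference image,
                      used here as both keys and values (assumption: no
                      learned projections, unlike a trained model)
    Returns N vectors: each decoder token plus its attended reference summary.
    """
    d = len(reference_tokens[0])
    scale = 1.0 / math.sqrt(d)
    conditioned = []
    for q in decoder_tokens:
        # Similarity of this decoder token to every reference token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in reference_tokens]
        weights = softmax(scores)
        # Weighted average of the reference tokens.
        attended = [sum(w * v[j] for w, v in zip(weights, reference_tokens))
                    for j in range(d)]
        # Residual connection: the original decoder feature is kept and
        # the reference information is added on top.
        conditioned.append([qi + ai for qi, ai in zip(q, attended)])
    return conditioned


# Toy data: three decoder tokens and two reference tokens, d = 2.
decoder_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
reference_tokens = [[2.0, 0.0], [0.0, 2.0]]
conditioned = reference_cross_attention(decoder_tokens, reference_tokens)
```

Each decoder token pulls most strongly from the reference token it resembles, so reference detail reaches the decoder output. The residual form is one common design choice for adding conditioning to an existing network; whether RefDecoder uses it is not stated here.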