Combining supervision from multiple generated sequences (rollouts) to create more stable training signals.
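As a minimal sketch of one way such a combination might work, assume each rollout sampled for the same prompt receives a scalar reward, and that the rewards are centered on their group mean so each rollout is scored relative to its siblings. The function name `combine_rollout_rewards`, the scalar-reward setup, and the mean-baseline choice are illustrative assumptions, not something the definition above specifies.

```python
from statistics import mean


def combine_rollout_rewards(rewards):
    """Center per-rollout rewards on their group mean (hypothetical helper).

    `rewards` holds one scalar reward per rollout, all sampled for the same
    prompt. Subtracting the shared mean yields a relative signal that does
    not depend on the absolute reward scale of that prompt.
    """
    if not rewards:
        raise ValueError("need at least one rollout reward")
    baseline = mean(rewards)  # shared baseline: average reward over the group
    return [r - baseline for r in rewards]


# Four rollouts for one prompt; the rewards are made-up illustrative values.
advantages = combine_rollout_rewards([1.0, 0.0, 0.5, 0.5])
```

Rollouts that beat the group average get a positive signal and those below it get a negative one, so a prompt where every rollout scores similarly contributes little to the update. Other combinations (for example, also dividing by the group standard deviation) are possible; this sketch shows only the simplest form.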