A decoding strategy that samples N candidate responses to the same prompt and selects the one scored highest by a reward model.
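The definition can be illustrated with a minimal sketch. The names `best_of_n`, `generate`, and `reward` are hypothetical and chosen only for illustration. `generate` stands in for any sampling language model and `reward` for any reward model, and neither is a real library API. The toy stand-ins at the bottom exist only so the example runs without a model.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Sample n candidates for `prompt` and return the highest-scoring one.

    `generate` and `reward` are hypothetical callables (assumptions, not a
    real API): generate(prompt) returns one sampled response, and
    reward(prompt, response) returns a scalar score, higher being better.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Draw n independent samples for the same prompt.
    candidates = [generate(prompt) for _ in range(n)]
    # Score every candidate with the reward model.
    scored = [(c, reward(prompt, c)) for c in candidates]
    # Keep the candidate with the highest score (first one wins on ties).
    best, _ = max(scored, key=lambda pair: pair[1])
    return best, scored


if __name__ == "__main__":
    # Toy stand-ins for illustration only.
    rng = random.Random(0)
    replies = ["Yes.", "Yes, because X.", "Yes, because X and Y."]

    def toy_generate(prompt: str) -> str:
        return rng.choice(replies)

    def toy_reward(prompt: str, response: str) -> float:
        # Pretend that longer answers are better.
        return float(len(response))

    best, scored = best_of_n("Is it so?", toy_generate, toy_reward, n=4)
    print(best)
```

In this sketch the cost grows linearly with N, since each candidate needs one generation call and one reward call, and the selected response can only be as good as the reward model's scores.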