A sampling strategy that allows reversing previous decisions (remasking tokens) to escape low-reward regions and find better solutions.