A method that ensures a model's reasoning process logically supports its final answer by adding consistency rewards during training.