Training a model to predict human preferences so it can score outputs and guide AI training through reinforcement learning.