A reward signal that measures how indistinguishable a generated response is from real human responses, as assessed by an LLM judge.
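
One way such a signal could be computed is sketched below. This is only an illustration under stated assumptions, not the definitive formulation: the pairwise setup, the `Judge` interface, the function name `indistinguishability_reward`, and the toy `longer_is_human` judge are all hypothetical. The judge is modeled as a plain callable that is shown two responses and returns the index of the one it believes a human wrote; in practice it would wrap a call to an LLM. The reward is the fraction of trials in which the judge picks the generated response.

```python
import random
from typing import Callable, Sequence

# Hypothetical judge interface: given two candidate responses, return the
# index (0 or 1) of the one believed to be human-written. A real judge would
# wrap an LLM call; any callable with this shape works here.
Judge = Callable[[str, str], int]


def indistinguishability_reward(
    generated: str,
    human_refs: Sequence[str],
    judge: Judge,
    seed: int = 0,
) -> float:
    """Fraction of pairwise trials in which the judge selects the generated
    response as the human-written one (assumed reward definition)."""
    if not human_refs:
        raise ValueError("need at least one human reference response")
    rng = random.Random(seed)
    fooled = 0
    for ref in human_refs:
        # Randomize presentation order so the judge cannot exploit position.
        gen_first = rng.random() < 0.5
        pair = (generated, ref) if gen_first else (ref, generated)
        choice = judge(*pair)
        gen_index = 0 if gen_first else 1
        fooled += int(choice == gen_index)
    return fooled / len(human_refs)


# Toy stand-in judge (not an LLM): guesses that the longer text is human.
def longer_is_human(a: str, b: str) -> int:
    return 0 if len(a) >= len(b) else 1


if __name__ == "__main__":
    refs = ["short", "tiny"]
    print(indistinguishability_reward("a much longer generated reply", refs, longer_is_human))
```

Under this definition a value near 0.5 means the judge cannot tell the generated response from the human ones, while values near 0 mean it is reliably detected. Whether to reward 0.5 or to reward higher values is a design choice the definition above leaves open.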