General Preference Reinforcement Learning — ThinkLLM