UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning