Reinforcement Learning from Rich Feedback with Distributional DAgger — ThinkLLM