Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation — ThinkLLM