Right in the Right Way: LM Training with Verifiable Rewards and Human Demonstrations — ThinkLLM