When Can LLMs Learn to Reason with Weak Supervision? — ThinkLLM