Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients — ThinkLLM