R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning — ThinkLLM