Context-Aware RL for Agentic and Multimodal LLMs — ThinkLLM