TTVS: Boosting Self-Exploring Reinforcement Learning via Test-time Variational Synthesis — ThinkLLM