PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning — ThinkLLM