Gaussian GRPO enables more stable and fair training of multimodal models across diverse tasks by forcing reward distributions to follow a standard normal curve, solving the problem of extreme variance when training on different visual reasoning benchmarks.
OpenVLThinkerV2 is a multimodal AI model that combines vision and language understanding for complex visual reasoning tasks.