Using an independent verifier to validate problem correctness prevents reward hacking in AI-generated math problems, so better training data can be created without human experts.
This paper addresses how to generate math problems that are both valid and challenging for training AI models. Rather than relying on human experts or simple self-play, which often produces invalid problems, the authors introduce VHG, a system with three players: a problem setter, a solver, and an independent verifier.
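
To make the three-player idea concrete, here is a minimal toy sketch of how an independent verifier can gate the setter's reward. It is not the paper's implementation: the `Problem` type, the `verifier` and `setter_reward` functions, the division-only problem format, and the reward rule (zero for invalid problems, otherwise the fraction of failed solver attempts) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


@dataclass(frozen=True)
class Problem:
    """Hypothetical toy problem: 'compute a / b', with the setter's claimed answer."""
    a: int
    b: int
    claimed_answer: Fraction


def verifier(problem: Problem) -> bool:
    """Independent check: recompute the answer without trusting the setter."""
    if problem.b == 0:
        return False  # ill-posed problem, no valid answer exists
    return Fraction(problem.a, problem.b) == problem.claimed_answer


def setter_reward(problem: Problem, solver_attempts: List[Fraction]) -> float:
    """Assumed reward rule: invalid problems earn nothing; valid ones earn
    more the more often the solver fails."""
    if not verifier(problem) or not solver_attempts:
        return 0.0
    solved = sum(ans == problem.claimed_answer for ans in solver_attempts)
    return 1.0 - solved / len(solver_attempts)


# A valid problem that the solver gets right half the time.
valid = Problem(7, 3, Fraction(7, 3))
attempts = [Fraction(7, 3), Fraction(2), Fraction(5, 2), Fraction(7, 3)]
print(setter_reward(valid, attempts))

# An invalid problem: the claimed answer is wrong, so the solver can never
# match it. Without the verifier gate this would look maximally "hard".
hacked = Problem(7, 3, Fraction(5))
print(setter_reward(hacked, attempts))
```

In this sketch the second problem would earn the top reward under plain setter-versus-solver self-play, because no correct solver can match a wrong claimed answer. The verifier gate is what removes that incentive.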