Verifier-Backed Hard Problem Generation for Mathematical Reasoning

Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao | May 7, 2026 | arXiv

Key Takeaway

An independent verifier that checks each generated problem for correctness prevents reward hacking in AI-generated math problems, so valid training data can be created without human experts.

Summary

This paper addresses how to generate math problems that are both valid and challenging for training AI models. Instead of relying on human experts or on simple self-play, which often produces invalid problems, the authors introduce VHG, a system with three players: a problem setter, a solver, and an independent verifier.
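The role of the verifier in the three-player setup can be illustrated with a minimal sketch. This is not the paper's implementation: the `Problem` type, the `setter_reward` function, the toy multiplication verifier, and the specific reward rule (zero for rejected problems, otherwise the solver's failure rate) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Problem:
    """Hypothetical container for a setter's output."""
    statement: str
    reference_answer: str


def setter_reward(
    problem: Problem,
    verifier: Callable[[Problem], bool],
    solver_answers: Sequence[str],
) -> float:
    """Assumed reward rule: 0 if the verifier rejects the problem,
    otherwise the fraction of solver attempts that miss the reference answer."""
    if not verifier(problem) or not solver_answers:
        return 0.0
    correct = sum(
        answer.strip() == problem.reference_answer.strip()
        for answer in solver_answers
    )
    return 1.0 - correct / len(solver_answers)


def toy_verifier(problem: Problem) -> bool:
    """Toy stand-in for an independent verifier: recomputes 'a * b' itself."""
    parts = problem.statement.split()
    if len(parts) != 3 or parts[1] != "*":
        return False
    try:
        product = int(parts[0]) * int(parts[2])
    except ValueError:
        return False
    return str(product) == problem.reference_answer.strip()


def accept_everything(problem: Problem) -> bool:
    """Models plain self-play with no validity check."""
    return True


valid = Problem("17 * 23", "391")
invalid = Problem("17 * 23", "400")  # wrong reference answer

# A valid problem is rewarded by how often the solver fails.
half_solved = setter_reward(valid, toy_verifier, ["391", "381", "401", "391"])

# Without a verifier, a wrong reference answer looks maximally "hard".
hacked = setter_reward(invalid, accept_everything, ["391", "391"])

# With the verifier, the same invalid problem earns nothing.
gated = setter_reward(invalid, toy_verifier, ["391", "391"])
```

In this sketch the invalid problem scores highest under plain self-play, because a correct solver can never match a wrong reference answer. Gating the reward on the verifier's judgment removes that incentive, which is the kind of reward hacking the summary describes.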

training reasoning data

Key Terms

reward-hackable, self-play, verifier, problem-generation