Self-Trained Verification for Training- and Test-Time Self-Improvement

Chen Henry Wu, Aditi Raghunathan | May 28, 2026 | arXiv

Key Takeaway

Training verifiers with access to reference answers creates a supervision signal that unlocks two approaches to scaling reasoning models that had been bottlenecked by unreliable verification: test-time refinement and training-time self-improvement.
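The test-time side of this can be pictured as a verify-then-refine loop. The sketch below is a minimal illustration, not the paper's implementation: `verify_and_refine`, the `generate`/`verify`/`refine` callables, the `max_rounds` budget, and the arithmetic toy stand-ins are all hypothetical, chosen only so the loop runs end to end.

```python
from typing import Callable, Optional

# Hypothetical interfaces (not from the paper):
Generate = Callable[[str], str]               # problem -> initial solution
Verify = Callable[[str, str], Optional[str]]  # (problem, solution) -> critique, or None if accepted
Refine = Callable[[str, str, str], str]       # (problem, solution, critique) -> revised solution


def verify_and_refine(problem: str, generate: Generate, verify: Verify,
                      refine: Refine, max_rounds: int = 3) -> str:
    """Generate once, then alternate verification and refinement until the
    verifier accepts the solution or the round budget is spent."""
    solution = generate(problem)
    for _ in range(max_rounds):
        critique = verify(problem, solution)
        if critique is None:  # verifier found no error, so stop early
            return solution
        solution = refine(problem, solution, critique)
    return solution  # budget spent: return the latest attempt, possibly unverified


# Toy stand-ins for a policy model and a verifier on "a + b" problems.
def toy_generate(problem: str) -> str:
    return "8"  # deliberately wrong first attempt for "3 + 4"


def toy_verify(problem: str, solution: str) -> Optional[str]:
    a, b = (int(x) for x in problem.split("+"))
    return None if int(solution) == a + b else f"{solution} does not equal {a} + {b}"


def toy_refine(problem: str, solution: str, critique: str) -> str:
    # A real model would condition on the critique; this toy just steps down by one.
    return str(int(solution) - 1)


print(verify_and_refine("3 + 4", toy_generate, toy_verify, toy_refine))  # prints 7
```

Keeping the verifier and refiner as separate callables makes the round budget the only knob controlling test-time compute in this sketch.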

Summary

This paper tackles a key bottleneck in AI reasoning: building verifiers that can catch errors in model-generated solutions. The authors propose self-trained verification (STV), which trains verifiers by showing them reference solutions so they learn to spot mistakes.
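One simple way to turn reference solutions into verifier supervision is sketched below, assuming (hypothetically) that each training example pairs a model-generated solution with its reference and is labeled by whether the final answers match. The `VerifierExample` record, the `Answer:` marker, and the helper names are illustrative assumptions; the paper's actual data format and training objective may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VerifierExample:
    problem: str
    candidate: str   # model-generated solution to be judged
    reference: str   # reference solution available at training time
    label: int       # 1 if the candidate's final answer matches the reference, else 0


def final_answer(solution: str) -> str:
    """Hypothetical extractor: text after the last 'Answer:' marker (whole text if absent)."""
    return solution.rsplit("Answer:", 1)[-1].strip()


def build_verifier_data(samples: Dict[str, List[str]],
                        references: Dict[str, str]) -> List[VerifierExample]:
    """Label every sampled solution by comparing its final answer to the reference."""
    data: List[VerifierExample] = []
    for problem, candidates in samples.items():
        ref = references[problem]
        for cand in candidates:
            label = int(final_answer(cand) == final_answer(ref))
            data.append(VerifierExample(problem, cand, ref, label))
    return data


samples = {"3 + 4 = ?": ["3 + 4 = 8. Answer: 8", "3 + 4 = 7. Answer: 7"]}
references = {"3 + 4 = ?": "Add the two numbers. Answer: 7"}
print([ex.label for ex in build_verifier_data(samples, references)])  # prints [0, 1]
```

Examples labeled 0 mark the model's own mistakes, which is the kind of error the verifier is meant to learn to spot.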

reasoning, training, evaluation

Key Terms

verification-refinement, self-training, verifier, pass-at-k, reinforcement-learning-from-verifiable-rewards
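For the pass-at-k key term: pass@k is the probability that at least one of k sampled solutions is correct. The sketch below uses the widely used unbiased estimator 1 - C(n-c, k)/C(n, k), computed from n samples of which c are correct; whether this paper computes pass@k exactly this way is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled solutions, c of which are correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    # 1 - P(all k drawn samples are incorrect), drawing without replacement
    return 1.0 - comb(n - c, k) / comb(n, k)


print(round(pass_at_k(10, 3, 1), 3), round(pass_at_k(10, 3, 5), 3))  # prints 0.3 0.917
```

For example, with n = 10 samples and c = 3 correct, pass@1 is 0.3 and pass@5 is 1 - 21/252, about 0.917.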