Agentic code generation can be dramatically improved by using verification feedback to guide iterative repair of both code and formal proofs, rather than trying to generate correct code in one shot.
AxDafny is a system that helps AI models generate verified code in Dafny by iteratively fixing code and proofs based on verification feedback. The authors created a benchmark of 250 programming problems and show their approach achieves 92.7% verification success, significantly outperforming previous methods.