This paper introduces LogicEval, a framework for evaluating how well automated repair tools, including LLM-based ones, fix logical vulnerabilities in real software. To support the evaluation, the authors built LogicDS, a dataset of 86 real-world security bugs with assigned CVE identifiers. Their results show that LLM-based repair outperforms traditional approaches on logical vulnerabilities, but it still fails frequently, chiefly because of sensitivity to how the repair prompt is phrased and difficulty capturing the full code context surrounding the bug.