Task description quality matters more than model size for reliable code generation: a small, fine-tuned classifier can detect problematic descriptions better than much larger general-purpose models, and under-specification (missing details) is the most critical defect type to watch for.
This paper introduces SpecValidator, a lightweight classifier that detects defects in the task descriptions given to code-generating AI models. The tool flags three types of problems (vague language, missing details, and formatting issues), and the authors' evaluation shows it catches these defects far more reliably than larger models such as GPT-4o mini or Claude.
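To make the idea concrete, here is a minimal sketch of how such a defect classifier might be invoked as a multi-label sequence classifier. The checkpoint name, label set, and score threshold are all illustrative assumptions; the paper's summary does not specify SpecValidator's actual interface.

```python
# Illustrative sketch only: the model ID, labels, and threshold below are
# assumptions, not SpecValidator's published interface.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "your-org/spec-defect-classifier"  # hypothetical fine-tuned checkpoint
LABELS = ["vague_language", "missing_details", "formatting_issues"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    problem_type="multi_label_classification",
    num_labels=len(LABELS),
)

def detect_defects(task_description: str, threshold: float = 0.5) -> list[str]:
    """Return defect labels whose sigmoid score exceeds the threshold."""
    inputs = tokenizer(task_description, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits.squeeze(0)
    scores = torch.sigmoid(logits)  # independent per-label probabilities
    return [label for label, s in zip(LABELS, scores) if s >= threshold]

# A deliberately under-specified description should trigger multiple labels.
print(detect_defects("Write a function that processes the data quickly."))
# e.g. ["vague_language", "missing_details"]
```

A small encoder-only model of this kind is cheap enough to run on every task description before it reaches the code generator, which is consistent with the paper's framing of the classifier as a lightweight pre-generation check.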