Task description quality matters more than model size for reliable code generation: a small, fine-tuned classifier can detect problematic descriptions better than much larger general-purpose models, and under-specification (missing details) is the most critical defect type to watch for.
This paper introduces SpecValidator, a lightweight classifier that detects defects in the task descriptions given to code-generating AI models. The tool flags three types of problems (vague language, missing details, and formatting issues), and the authors' evaluation shows it catches these defects far more reliably than larger models such as GPT-4o mini or Claude.
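To make the idea concrete, here is a minimal sketch of how such a defect classifier might be invoked as a multi-label sequence classifier. The checkpoint name, label set, and score threshold are all illustrative assumptions; the paper's summary does not specify SpecValidator's actual interface.

```python
# Illustrative sketch only: the model ID, labels, and threshold below are
# assumptions, not SpecValidator's published interface.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "your-org/spec-defect-classifier"  # hypothetical fine-tuned checkpoint
LABELS = ["vague_language", "missing_details", "formatting_issues"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    problem_type="multi_label_classification",
    num_labels=len(LABELS),
)

def detect_defects(task_description: str, threshold: float = 0.5) -> list[str]:
    """Return defect labels whose sigmoid score exceeds the threshold."""
    inputs = tokenizer(task_description, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits.squeeze(0)
    scores = torch.sigmoid(logits)  # independent per-label probabilities
    return [label for label, s in zip(LABELS, scores) if s >= threshold]

# A deliberately under-specified description should trigger multiple labels.
print(detect_defects("Write a function that processes the data quickly."))
# e.g. ["vague_language", "missing_details"]
```

A small encoder-only model of this kind is cheap enough to run on every task description before it reaches the code generator, which is consistent with the paper's framing of the classifier as a lightweight pre-generation check.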