You can build smaller, more interpretable anomaly detection systems by fine-tuning vision-language models on curated datasets that pair anomalies with natural-language explanations, rather than relying on large general-purpose models.
This paper introduces VisAnomBench, a curated dataset of time-series anomalies with AI-generated explanations, and uses it to fine-tune VisAnomReasoner, a lightweight vision-language model that detects and explains unusual patterns in sequential data. The paper reports significant improvements over existing methods while the model remains parameter-efficient.