Per-language fine-tuning with synthetic data augmentation and threshold tuning can significantly improve performance on multilingual NLP tasks, but generalization from development to test data varies dramatically: some architectures dropped 30-50% in performance despite strong development results.
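To make the threshold-tuning step concrete, here is a minimal sketch (function and variable names are hypothetical, not from the paper): sweep candidate decision thresholds over a language's development-set probabilities and keep the one that maximizes F1. Per-language tuning would mean running this once per language.

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_threshold(dev_probs: np.ndarray, dev_labels: np.ndarray) -> float:
    """Pick the decision threshold that maximizes F1 on the dev set.

    Assumed setup: binary polarized/not-polarized labels and
    model-predicted probabilities for one language's dev split.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in np.linspace(0.05, 0.95, 19):  # candidate thresholds
        f1 = f1_score(dev_labels, (dev_probs >= t).astype(int))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t
```

A threshold fitted this tightly to the dev set is also one plausible source of the dev-to-test drop noted above.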
This paper describes a system for detecting polarized language across 22 languages, built on fine-tuned Gemma models. The approach combines per-language model tuning, LLM-generated synthetic training data filtered for quality, and weighted ensemble predictions, achieving competitive performance on a multilingual classification task.
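One plausible reading of the weighted-ensemble step, sketched below with hypothetical names (the paper does not specify how weights are derived; here each member is weighted by its development-set F1): average the members' predicted probabilities with those weights, then apply the tuned per-language threshold.

```python
import numpy as np

def ensemble_predict(member_probs: list[np.ndarray],
                     dev_f1s: list[float],
                     threshold: float) -> np.ndarray:
    """Weighted soft-voting ensemble (assumed scheme, not confirmed
    by the paper): weight each model's predicted probabilities by
    its dev-set F1, average, then threshold."""
    weights = np.asarray(dev_f1s) / np.sum(dev_f1s)  # normalize to sum to 1
    avg = np.average(np.stack(member_probs), axis=0, weights=weights)
    return (avg >= threshold).astype(int)
```

Dev-score weighting down-weights weaker members, which may matter when some architectures generalize much worse to test data than others.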