Cross-lingual transfer and unsupervised clustering are complementary for morphology discovery in low-resource languages: transfer finds cognates, while clustering spots language-specific innovations that transfer misses.
This paper develops a method for automatically discovering morphological patterns in Giriama, a low-resource Bantu language with minimal labeled data. By combining knowledge transfer from Swahili with unsupervised clustering, the system identifies noun classes and uncovers two previously unknown morphological patterns, and it reaches 86.7% lemmatization accuracy across word classes.
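
To make the division of labor concrete, the following is a minimal sketch of the two-stage idea, not the paper's actual system. It assumes that transfer can be approximated by matching a handful of well-known Swahili noun-class prefixes, and that the clustering step can be approximated by grouping the remaining words on a shared initial character bigram. The function names, the `TOY_WORDS` list, and the thresholds are all hypothetical; the word forms are synthetic placeholders, not attested Giriama data.

```python
from collections import defaultdict

# A small subset of Swahili noun-class prefixes, used here as the transfer source.
SWAHILI_PREFIXES = {
    "wa": "class 2",
    "mi": "class 4",
    "ki": "class 7",
    "vi": "class 8",
}

# Synthetic placeholder forms (hypothetical, NOT real Giriama data).
TOY_WORDS = ["kitoto", "vitoto", "watoto", "zutana", "zupika", "lopa"]


def transfer_label(word, prefixes=SWAHILI_PREFIXES):
    """Return the class of the longest matching source prefix, or None."""
    for prefix in sorted(prefixes, key=len, reverse=True):
        # Require a non-empty stem after the prefix.
        if word.startswith(prefix) and len(word) > len(prefix):
            return prefixes[prefix]
    return None


def cluster_residue(words, prefix_len=2, min_size=2):
    """Group words by their first `prefix_len` characters.

    This is a crude stand-in for unsupervised clustering: only groups with
    at least `min_size` members are kept as candidate patterns.
    """
    groups = defaultdict(list)
    for word in words:
        groups[word[:prefix_len]].append(word)
    return {p: ws for p, ws in groups.items() if len(ws) >= min_size}


def discover(words):
    """Stage 1: label by transfer. Stage 2: cluster whatever transfer missed."""
    labeled, residue = {}, []
    for word in words:
        noun_class = transfer_label(word)
        if noun_class is None:
            residue.append(word)
        else:
            labeled[word] = noun_class
    return labeled, cluster_residue(residue)


labeled, candidates = discover(TOY_WORDS)
print(labeled)
print(candidates)
```

In this toy run, the forms with Swahili-like prefixes are labeled by transfer, while the recurring `zu` prefix, which has no entry in the source table, is surfaced only by the grouping step. That is the kind of language-specific pattern clustering is meant to catch; a realistic system would likely use richer features and a proper clustering algorithm.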