Marathi NLP now has a gold-standard POS tagging dataset and baseline models, which support downstream tasks such as machine translation and parsing for an under-resourced language with more than 83 million speakers.
This paper introduces L3Cube-MahaPOS, a manually annotated dataset of 32,354 Marathi sentences for part-of-speech tagging, along with benchmarks across multiple model architectures.