L3Cube-MahaPOS: A Marathi Part-of-Speech Tagging Dataset and BERT Models

Hariom Ingle, Ronit Ghode, Ishwari Gondkar, Jidnyasa Harad, Raviraj Joshi | June 23, 2026 | arXiv

Key Takeaway

Marathi NLP now has a gold-standard POS tagging dataset and baseline models, supporting downstream tasks such as machine translation and parsing for an under-resourced language with more than 83 million speakers.

Summary

This paper introduces L3Cube-MahaPOS, a manually annotated dataset of 32,354 Marathi sentences for part-of-speech tagging, along with benchmarks across multiple model architectures.

data evaluation

Key Terms

part-of-speech-tagging universal-dependencies code-mixing devanagari morphology