Can Large Language Models Handle Discourse Particles? A Case Study of Colloquial Malay

Mariah Al Giptiah Binte Yusoff, Jakin Tan, Bocheng Chen, Guangliang Liu, Xi Chen | May 27, 2026 | arXiv

Key Takeaway

Current LLMs have significant gaps in understanding discourse particles in Southeast Asian languages like Malay, but structured linguistic scaffolding can substantially improve their pragmatic competence.

Summary

This paper introduces MalayPrag, a benchmark that tests how well large language models understand discourse particles in colloquial Malay. Discourse particles are short words that convey a speaker's emotion and intent, much like 'well' or 'kind of' in English.

evaluation

Key Terms

pragmatics, discourse-particles, linguistic-scaffolding