SwanNLP at SemEval-2026 Task 5: An LLM-based Framework for Plausibility Scoring in Narrative Word Sense Disambiguation

Deshan Sumanathilaka, Nicholas Micallef, Julian Hough, Saman Jayasinghe | April 17, 2026 | arXiv

Key Takeaway

Large language models with dynamic few-shot prompting can effectively judge word sense plausibility in stories, and combining multiple models brings predictions closer to human agreement patterns.

Summary

This paper tackles a new task in which AI models predict how plausible a word meaning is within a story. The researchers test both small fine-tuned models and large models with few-shot prompting, finding that large models prompted with dynamic few-shot examples best match human judgments of word sense plausibility in narratives.
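
To make the idea of dynamic few-shot prompting with model ensembling concrete, here is a minimal sketch. It is not the authors' implementation: the token-overlap similarity, the prompt wording, the 1 to 5 rating scale, the example data, and every function name (`select_examples`, `build_prompt`, `ensemble_score`) are assumptions made for illustration. The summary does not say how examples are chosen or how model outputs are combined, so a simple mean stands in for the ensemble step.

```python
from statistics import mean


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for whatever retriever is really used."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_examples(query_story: str, pool: list[dict], k: int = 2) -> list[dict]:
    """Pick the k labeled examples most similar to the query story.

    This per-query selection is what makes the few-shot prompt "dynamic".
    """
    ranked = sorted(pool, key=lambda ex: jaccard(query_story, ex["story"]), reverse=True)
    return ranked[:k]


def build_prompt(query_story: str, sense: str, examples: list[dict]) -> str:
    """Assemble a few-shot prompt (hypothetical wording and rating scale)."""
    parts = ["Rate how plausible the word sense is in the story (1 = low, 5 = high)."]
    for ex in examples:
        parts.append(f"Story: {ex['story']}\nSense: {ex['sense']}\nRating: {ex['rating']}")
    parts.append(f"Story: {query_story}\nSense: {sense}\nRating:")
    return "\n\n".join(parts)


def ensemble_score(scores: list[float]) -> float:
    """Combine ratings from several models; a plain mean is assumed here."""
    return mean(scores)


# Hypothetical labeled pool, invented for this sketch.
pool = [
    {"story": "She sat on the bank of the river", "sense": "land beside water", "rating": 5},
    {"story": "He deposited money at the bank", "sense": "land beside water", "rating": 1},
    {"story": "The bat flew out of the cave", "sense": "flying mammal", "rating": 5},
]

query = "They walked along the river bank"
shots = select_examples(query, pool, k=2)
prompt = build_prompt(query, "land beside water", shots)

# In practice each score would come from a different LLM given `prompt`;
# the numbers below are placeholders, not results from the paper.
combined = ensemble_score([4.0, 5.0, 3.0])
```

Averaging several models' ratings yields a graded score rather than a single hard label, which is one plausible reason an ensemble could track the spread of human judgments more closely.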

evaluation reasoning applications

Key Terms

few-shot-learning, word-sense-disambiguation, ensemble-methods, prompt-engineering, fine-tune