Large language models with dynamic few-shot prompting can effectively judge word sense plausibility in stories, and combining multiple models brings results closer to human agreement patterns.
This paper tackles a new task in which models predict how plausible a given word sense is within a story. The researchers compare small fine-tuned models with large models prompted with few-shot examples, and find that large models with dynamically chosen examples best match human judgments of word sense plausibility in narratives.
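As a rough illustration of what dynamic few-shot prompting and model combination could look like, here is a minimal Python sketch. It is not the paper's implementation. It assumes that "dynamic" means picking, for each test story, the most similar labeled examples from a pool, and that combining models means averaging their ratings. The bag-of-words similarity, the 1 to 5 rating scale, the prompt wording, and all function and field names are hypothetical choices made for this example.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase, whitespace-tokenized counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query_story, pool, k=2):
    """Pick the k pool examples whose stories are most similar to the query.

    Assumption: similarity-based retrieval stands in for whatever
    selection strategy the paper actually uses.
    """
    q = bow(query_story)
    ranked = sorted(pool, key=lambda ex: cosine(q, bow(ex["story"])), reverse=True)
    return ranked[:k]


def build_prompt(query, examples):
    """Assemble a few-shot prompt (hypothetical wording and 1-5 scale)."""
    parts = ["Rate how plausible the word sense is in the story (1-5)."]
    for ex in examples:
        parts.append(
            f"Story: {ex['story']}\nWord: {ex['word']}\n"
            f"Sense: {ex['sense']}\nRating: {ex['rating']}"
        )
    parts.append(
        f"Story: {query['story']}\nWord: {query['word']}\n"
        f"Sense: {query['sense']}\nRating:"
    )
    return "\n\n".join(parts)


def ensemble_rating(ratings):
    """Combine ratings from several models by simple averaging (an assumption)."""
    return sum(ratings) / len(ratings)


# Toy, made-up data for illustration only.
POOL = [
    {"story": "She sat on the bank of the river and fished",
     "word": "bank", "sense": "sloping land beside water", "rating": 5},
    {"story": "He went to the bank to deposit money",
     "word": "bank", "sense": "financial institution", "rating": 5},
    {"story": "The bat flew out of the cave at night",
     "word": "bat", "sense": "flying mammal", "rating": 5},
]

QUERY = {"story": "They walked along the river and sat on the bank",
         "word": "bank", "sense": "financial institution"}

if __name__ == "__main__":
    shots = select_examples(QUERY["story"], POOL, k=2)
    print(build_prompt(QUERY, shots))
```

In this sketch the prompt changes with every test story, because the retrieved examples depend on the query. A real system would likely use a stronger similarity measure (for example sentence embeddings) and would send the prompt to one or more language models before averaging their numeric answers with something like `ensemble_rating`.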