Domain-specific datasets and targeted fine-tuning can significantly improve LLM performance on specialized tasks like classical poetry understanding, showing that treating niche domains as general problems leaves performance on the table.
This paper creates a specialized dataset of 49,404 instruction-response pairs for classical Chinese poetry tasks and fine-tunes Qwen2.5-14B using LoRA to build PoetryQwen. The model breaks poetry understanding into three subtasks—term interpretation, semantic interpretation, and emotional inference—and achieves 9.7% improvement over the baseline on a poetry evaluation benchmark.