Current LLMs struggle with policy understanding: they are better at applying knowledge to real problems than at recalling facts or reasoning about concepts, and specialized models with domain-aligned experts can help close this gap.
This paper introduces PolicyBench, a 21K-case benchmark for evaluating how well large language models understand public policy across the US and China. It also proposes PolicyMoE, a specialized mixture-of-experts model that improves policy comprehension at three levels: memorizing facts, understanding concepts, and applying knowledge to real scenarios.