Data referencing errors are a widespread problem in LLM table reasoning that final-answer accuracy alone does not capture; using a lightweight critic model to catch these errors during inference significantly improves reliability.
LLMs make data referencing errors when reading tables: they cite wrong values or miss relevant data even when they understand the table's structure. This paper systematically measures these errors across models and shows that using a critic model to detect and filter bad outputs improves accuracy by up to 12%, even with a small 4B-parameter critic.
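The detect-and-filter idea can be illustrated with a minimal sketch. This is not the paper's implementation: the learned critic is replaced here by a hypothetical rule-based stand-in (`toy_critic`) that checks each cited cell against the table, and the names `first_accepted` and `cited_cells`, the candidate format, and the table contents are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Table = List[Dict[str, str]]      # one dict per row: column name -> cell text
CellRef = Tuple[int, str, str]    # (row index, column name, value the answer claims)


def toy_critic(table: Table, cited_cells: List[CellRef]) -> bool:
    """Hypothetical stand-in for a learned critic model.

    Accepts an output only if every cited cell exists in the table
    and holds exactly the value the output claims.
    """
    for row_idx, column, value in cited_cells:
        if row_idx < 0 or row_idx >= len(table):
            return False
        if table[row_idx].get(column) != value:
            return False
    return True


def first_accepted(
    table: Table,
    candidates: List[dict],
    critic: Callable[[Table, List[CellRef]], bool],
) -> Optional[dict]:
    """Return the first candidate the critic accepts, or None if all are rejected."""
    for candidate in candidates:
        if critic(table, candidate["cited_cells"]):
            return candidate
    return None


# Illustrative data only.
table = [
    {"product": "A", "units": "12"},
    {"product": "B", "units": "7"},
]
candidates = [
    # Referencing error: the value 12 belongs to product A (row 0), not row 1.
    {"answer": "Product B sold 12 units", "cited_cells": [(1, "units", "12")]},
    {"answer": "Product B sold 7 units", "cited_cells": [(1, "units", "7")]},
]

chosen = first_accepted(table, candidates, toy_critic)
print(chosen["answer"])  # prints: Product B sold 7 units
```

In a real pipeline the `critic` argument would presumably wrap a call to the small critic model rather than an exact-match rule; keeping it as a plain callable makes that swap straightforward.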