Reasoning models can be made safer by detecting when they have misunderstood the question itself: reconstruct the question they actually answered from their reasoning trace, and abstain if it differs from the original.
This paper tackles a critical problem: getting LLMs to know when they should refuse to answer. The authors find that reasoning models often fail at abstention not because they answer incorrectly, but because they misinterpret the question and answer a different one entirely.
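To make the reconstruct-and-compare idea concrete, here is a minimal sketch in Python, assuming an OpenAI-compatible chat API. The prompts, the YES/NO judge, the model name, and all function names are illustrative assumptions, not the paper's implementation; an embedding-similarity comparison would be a reasonable alternative to the judge call.

```python
"""Sketch of trace-based abstention: reconstruct the question a model
actually answered from its reasoning trace, and abstain on mismatch.
Assumes the `openai` package (>=1.0) and an OPENAI_API_KEY env var."""

from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # hypothetical model choice


def reconstruct_question(reasoning_trace: str) -> str:
    """Ask a model to infer, from the trace alone, what question it answers."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": (
                "Read the following reasoning trace and state, in one "
                "sentence, the question it is answering. Do not answer "
                "the question yourself.\n\n" + reasoning_trace
            ),
        }],
    )
    return resp.choices[0].message.content.strip()


def questions_match(original: str, reconstructed: str) -> bool:
    """Use the model as a judge of whether two questions ask the same thing."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": (
                "Do these two questions ask the same thing? "
                "Answer YES or NO.\n"
                f"Q1: {original}\nQ2: {reconstructed}"
            ),
        }],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")


def answer_or_abstain(question: str, answer: str, trace: str) -> str:
    """Return the answer only if the reconstructed question matches the
    original; otherwise abstain."""
    if questions_match(question, reconstruct_question(trace)):
        return answer
    return "I may have misunderstood the question, so I'd rather not answer."
```

The design choice here is that the comparison happens entirely post hoc: nothing about the answering model changes, which makes the check cheap to bolt onto an existing pipeline at the cost of one or two extra model calls per query.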