LLMs can detect their own reasoning failures through self-questioning, but this diagnostic ability doesn't translate to self-correction—a critical gap for building more reliable AI systems.
This paper investigates how LLMs can diagnose their own reasoning errors by asking questions during problem-solving. Researchers train a probe to detect when a model is uncertain by analyzing its hidden states before and after question generation, then use this signal to decide when to intervene. The key finding: models can identify when they're wrong, but struggle to actually fix their mistakes.