Operadic consistency: a label-free signal for compositional reasoning failures in LLMs

Nathaniel Bottman, Yinhong Liu, Kyle Richardson | June 11, 2026 | arXiv

Key Takeaway

Operadic consistency detects LLM reasoning failures by checking whether a model's direct answer matches the answer it reaches through step-by-step decomposition. The signal is label-free, and the paper reports that it outperforms existing confidence measures across diverse models and datasets.

Summary

This paper introduces operadic consistency (OC), a method for detecting when large language models fail at multi-step reasoning without requiring ground-truth answers. The key insight: a model's direct answer to a complex question should match the answer it obtains by breaking the question into sub-questions, solving each one, and composing the results.
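The check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `answer` and `decompose` callables, the `{prev}` placeholder convention, and the exact-match comparison after normalization are all hypothetical choices made for this example, and the toy lookup table stands in for a real model. The paper may score agreement differently.

```python
from typing import Callable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def operadic_consistency(
    question: str,
    answer: Callable[[str], str],
    decompose: Callable[[str], List[str]],
) -> bool:
    """Return True when the direct answer agrees with the composed answer.

    Assumption for this sketch: `decompose` returns sub-question templates,
    and "{prev}" in a template is replaced by the previous step's answer.
    """
    direct = answer(question)
    prev = ""
    for template in decompose(question):
        prev = answer(template.replace("{prev}", prev))
    return normalize(direct) == normalize(prev)


# Toy stand-ins for a model and a decomposer (hypothetical data).
QUESTION = "What is the capital of the country where the Eiffel Tower is located?"

TOY_ANSWERS = {
    QUESTION: "Paris",
    "In which country is the Eiffel Tower located?": "France",
    "What is the capital of France?": "Paris",
}


def toy_answer(q: str) -> str:
    """Look up a canned answer; a real system would query the model here."""
    return TOY_ANSWERS.get(q, "unknown")


def flawed_answer(q: str) -> str:
    """Same as toy_answer, but gets the direct (undecomposed) question wrong."""
    return "Lyon" if q == QUESTION else toy_answer(q)


def toy_decompose(q: str) -> List[str]:
    """Return a fixed two-step decomposition for the toy question."""
    return [
        "In which country is the Eiffel Tower located?",
        "What is the capital of {prev}?",
    ]


consistent = operadic_consistency(QUESTION, toy_answer, toy_decompose)
inconsistent = operadic_consistency(QUESTION, flawed_answer, toy_decompose)
```

Here `consistent` is `True` because both routes yield "Paris", while `inconsistent` is `False` because the direct answer disagrees with the composed one. No reference answer is consulted in either case, which is what makes the signal label-free.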

evaluation reasoning

Key Terms

self-consistency, semantic-entropy, multi-hop-reasoning, selective-prediction, chain-of-thought