ORCA calibrates LLM reasoning at test time by adapting confidence estimates per input, enabling 47-67% compute savings during inference while providing mathematical guarantees on error rates across different reasoning tasks and domains.
This paper introduces ORCA, a framework that makes language models more efficient during reasoning by calibrating their sampling process. Using test-time training and conformal prediction, ORCA learns to estimate confidence in its own reasoning steps, cutting wasted computation while maintaining accuracy: up to 47% compute savings on in-distribution tasks and 67% on out-of-distribution problems.
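To make the conformal-prediction ingredient concrete, here is a minimal sketch of split conformal calibration applied to a stop/continue decision over reasoning steps. This is an illustration of the general technique, not ORCA's actual algorithm: the function names, the notion of a per-step nonconformity score, and the stopping rule are all hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Return the conformal quantile of held-out calibration
    nonconformity scores, using the standard finite-sample
    correction ceil((n + 1) * (1 - alpha)) / n.

    With exchangeable data, scores at or below this threshold
    occur with probability at least 1 - alpha on new inputs.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the corrected quantile
    k = min(k, n)  # clamp when alpha is small relative to n
    return sorted(cal_scores)[k - 1]

def should_stop(step_score, threshold):
    """Hypothetical stopping rule: halt further sampling once the
    current reasoning step's nonconformity score falls at or below
    the calibrated threshold."""
    return step_score <= threshold
```

For example, with ten calibration scores `[0.1, 0.2, ..., 1.0]` and `alpha = 0.2`, the corrected rank is `ceil(11 * 0.8) = 9`, so the threshold is the 9th smallest score, `0.9`; a step scoring `0.5` would trigger early stopping while one scoring `0.95` would not. The finite-sample correction (using `n + 1` rather than `n`) is what turns an empirical quantile into a distribution-free coverage guarantee.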