LLMs appear to encode action decisions in their internal states before generating reasoning text, meaning their chain-of-thought may rationalize predetermined choices rather than drive them.
This paper investigates whether large language models commit to actions before or after reasoning through a problem. Using linear probes to read out tool-calling decisions from internal activations, and activation steering to test whether those representations causally drive behavior, the researchers show that the decision is encoded before the first reasoning token is generated, suggesting models may rationalize pre-made decisions rather than genuinely deliberate.
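To make the methodology concrete, here is a minimal sketch of the probing idea, not the paper's exact setup: extract the hidden state at the final prompt token (before any reasoning token is generated), label each prompt by whether the model's actual completion contains a tool call, and fit a linear classifier on those activations. The model name, layer index, toy prompts, and the `<tool_call>` marker are all illustrative assumptions.

```python
# Linear-probe sketch (illustrative, not the paper's exact setup).
# Assumes a HuggingFace causal LM; MODEL, LAYER, the prompts, and the
# "<tool_call>" marker are hypothetical placeholders.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model choice
LAYER = 16  # probe the residual stream after decoder layer LAYER - 1

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def last_prompt_hidden(prompt: str) -> np.ndarray:
    """Hidden state at the final prompt token, before any token is generated."""
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states[0] is the embedding layer, so index LAYER is the output
    # of decoder layer LAYER - 1; take the last sequence position.
    return out.hidden_states[LAYER][0, -1].float().cpu().numpy()

def calls_tool(prompt: str) -> int:
    """Label a prompt by whether greedy generation emits a tool-call marker."""
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    completion = tok.decode(out[0, inputs["input_ids"].shape[1]:])
    return int("<tool_call>" in completion)  # marker is format-dependent

# Toy prompts; a real probe needs many behaviorally labeled examples,
# with both classes (tool call / no tool call) represented.
prompts = [
    "What is 37 * 89?",
    "Summarize the plot of Hamlet in one sentence.",
    "What's the weather in Oslo right now?",
    "Look up the current price of AAPL stock.",
    "Translate 'good morning' into French.",
    "Search the web for today's top headline.",
]

X = np.stack([last_prompt_hidden(p) for p in prompts])
y = np.array([calls_tool(p) for p in prompts])

# Fit the probe; with real data you would score on a held-out split.
# Above-chance held-out accuracy is the evidence that the decision is
# already encoded before reasoning begins.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"probe train accuracy: {probe.score(X, y):.2f}")
```

The steering half can then reuse the probe's weight vector as a candidate "decision direction": adding it to the residual stream during generation and checking whether the eventual tool-call behavior flips is one standard way to test causality. The sketch below assumes a Llama-style module tree (`model.model.layers`) and a guessed steering strength.

```python
# Activation-steering sketch: nudge the residual stream along the probe's
# decision direction and see whether the tool-call behavior changes.
direction = torch.tensor(probe.coef_[0], device=model.device, dtype=model.dtype)
direction = direction / direction.norm()
ALPHA = 8.0  # steering strength; tuned empirically in practice

def add_direction(module, inputs, output):
    # Decoder layers return a tuple whose first element is the hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * direction  # push toward "call a tool"
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# hidden_states[LAYER] corresponds to the output of decoder layer LAYER - 1.
handle = model.model.layers[LAYER - 1].register_forward_hook(add_direction)
steered = calls_tool("What is 37 * 89?")  # compare against the unsteered label
handle.remove()
print("steered tool-call label:", steered)
```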