Attractor Models make iterative refinement practical: they solve for a fixed point of a latent update and backpropagate through it with implicit differentiation, enabling small models (27M–770M parameters) to outperform much larger ones on reasoning and language tasks without the training instability of traditional recurrent architectures.
This paper introduces Attractor Models, which improve on looped Transformers by solving for a fixed point in the latent representation and differentiating through it implicitly, rather than backpropagating through every loop iteration.
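The paper's exact solver and architecture aren't shown here, but the general pattern is the one used by deep equilibrium models: the forward pass finds the fixed point z* = f(z*, x) by plain iteration, and the backward pass uses the implicit function theorem to solve a second, adjoint fixed point g = grad + (df/dz)^T g instead of unrolling the loop. Below is a minimal sketch of that pattern; the cell `cell`, the naive solver, and all tolerances are placeholder assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.autograd as autograd

def fixed_point_iterate(g, z0, tol=1e-5, max_iter=100):
    """Naive solver: iterate z <- g(z) until the update is small."""
    z = z0
    for _ in range(max_iter):
        z_next = g(z)
        if (z_next - z).norm() < tol * (1 + z.norm()):
            return z_next
        z = z_next
    return z

class ImplicitFixedPoint(nn.Module):
    """Solve z* = f(z*, x) forward; backpropagate via implicit differentiation,
    so memory does not grow with the number of solver iterations."""
    def __init__(self, f):
        super().__init__()
        self.f = f

    def forward(self, x):
        # Forward: find the fixed point outside the autograd graph.
        with torch.no_grad():
            z = fixed_point_iterate(lambda z: self.f(z, x), torch.zeros_like(x))
        # One differentiable application of f re-attaches z* to the graph,
        # so parameter gradients (df/dtheta)^T g can still flow.
        z = self.f(z, x)
        if self.training:
            z0 = z.detach().requires_grad_()
            f0 = self.f(z0, x)
            def backward_hook(grad):
                # Implicit function theorem: replace the incoming gradient with
                # g solving g = grad + (df/dz)^T g, again by fixed-point iteration,
                # using vector-Jacobian products instead of unrolled backprop.
                return fixed_point_iterate(
                    lambda g: autograd.grad(f0, z0, g, retain_graph=True)[0] + grad,
                    grad)
            z.register_hook(backward_hook)
        return z

if __name__ == "__main__":
    d = 16
    lin = nn.Linear(d, d)
    with torch.no_grad():
        lin.weight.mul_(0.5 / lin.weight.norm())  # make f a contraction so iteration converges
    cell = lambda z, x: torch.tanh(lin(z) + x)    # hypothetical cell, not the paper's architecture
    layer = ImplicitFixedPoint(cell)
    x = torch.randn(4, d)
    layer(x).pow(2).sum().backward()
    print(lin.weight.grad.norm())  # gradients computed without unrolling the solver
```

The gradient-hook trick is what makes the backward pass constant-memory: only one application of f is ever stored in the graph, and the adjoint solve reuses the same iterative solver as the forward pass.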