This paper analyzes looped transformers (models that iterate a shared block at test time to solve harder problems) and asks when they generalize versus memorize. Its central finding: looped transformers need recall mechanisms combined with outer normalization to generalize reliably to harder problems; without these, they memorize training solutions and fail at test time.
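As a rough illustrative sketch (not the paper's architecture), the two ingredients can be mocked in plain Python: a `step` function stands in for one transformer block, "recall" re-injects the original input at every iteration, and "outer normalization" rescales the final state after the loop. The function name and signature here are hypothetical.

```python
def run_looped(step, x0, n_loops, recall=True, normalize=True):
    """Iterate `step` n_loops times; optionally re-inject the
    input (recall) and rescale the output (outer normalization)."""
    h = list(x0)
    for _ in range(n_loops):
        h = step(h)
        if recall:
            # Recall: add the original input back each iteration,
            # so the loop cannot drift away from the problem instance.
            h = [a + b for a, b in zip(h, x0)]
    if normalize:
        # Outer normalization keeps the output scale stable even when
        # the test-time loop count exceeds the training loop count.
        norm = sum(v * v for v in h) ** 0.5 or 1.0
        h = [v / norm for v in h]
    return h
```

With a toy contraction such as `step = lambda h: [0.5 * v for v in h]`, the normalized output stays on the unit sphere no matter how many extra loops are run at test time, which is the intuition behind pairing recall with outer normalization.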