Pretraining Recurrent Networks without Recurrence — ThinkLLM