You don't need to update all transformer layers during RL training: updating only the middle layers, or even a single one of them, can match full-model performance while substantially reducing compute and memory costs.
This paper shows that training just a single transformer layer during RL fine-tuning can recover most or all of the performance gains from updating the entire model. The authors find that RL improvements concentrate in the middle layers, with the layers nearest the input and output contributing far less, and that this pattern holds consistently across different models, algorithms, and tasks.
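To make the idea of single-layer training concrete, here is a minimal sketch of how one might pick which parameters stay trainable. It is not the authors' code. It assumes a hypothetical naming scheme in which the parameters of block `i` contain `layers.<i>.` in their names (common in Hugging Face decoder models), and the helpers `layer_index`, `select_trainable`, and `middle_layer` are illustrative names, as is the choice of the central block as the default.

```python
import re

# Assumption: parameters of transformer block i are named "...layers.<i>....".
# Adjust the pattern for architectures that use a different naming scheme.
LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def layer_index(param_name):
    """Return the block index encoded in a parameter name, or None."""
    match = LAYER_RE.search(param_name)
    return int(match.group(1)) if match else None


def select_trainable(param_names, train_layer):
    """Map each parameter name to True (train) or False (freeze)."""
    return {name: layer_index(name) == train_layer for name in param_names}


def middle_layer(num_layers):
    """Pick the central block index as an illustrative default."""
    return num_layers // 2


# Toy parameter list standing in for the names of a 4-block model.
names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.1.mlp.up_proj.weight",
    "model.layers.2.mlp.down_proj.weight",
    "model.layers.3.self_attn.o_proj.weight",
    "lm_head.weight",
]
mask = select_trainable(names, middle_layer(4))
trainable = [name for name, keep in mask.items() if keep]
print(trainable)
```

In PyTorch, a mask like this could be applied by looping over `model.named_parameters()` and calling `param.requires_grad_(mask[name])`. Frozen parameters then need no gradients or optimizer state, which is one place memory savings could come from.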