Randomized YaRN Improves Length Generalization for Long-Context Reasoning

Manas Mehta, Fangcong Yin, Greg Durrett | June 22, 2026 | arXiv

Key Takeaway

Training models with positional encodings randomly sampled from a range larger than the training sequence length helps them generalize to much longer sequences without requiring long-context training data.

Summary

This paper proposes Randomized YaRN, a training method that helps language models generalize to text sequences much longer than those they were trained on. By randomly sampling positional encodings from a larger range while training on short sequences, the model learns to handle contexts longer than any it has seen, improving performance on reasoning tasks with 16K-128K token contexts.
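
The sampling idea described in the summary can be illustrated with a minimal sketch. It shows only the generic step of drawing increasing position ids from a range larger than the sequence length. It is not confirmed to be the paper's exact procedure, and the YaRN frequency scaling is omitted entirely. The function name `sample_position_ids` and the parameters `seq_len` and `max_position` are hypothetical names chosen for illustration.

```python
import random


def sample_position_ids(seq_len, max_position, rng):
    """Return seq_len strictly increasing position ids drawn from [0, max_position).

    Assumption for illustration: positions are sampled uniformly without
    replacement and then sorted; the paper may use a different scheme.
    """
    if seq_len > max_position:
        raise ValueError("seq_len must not exceed max_position")
    # Sample without replacement, then sort so the token order is preserved.
    return sorted(rng.sample(range(max_position), seq_len))


# A short training sequence of 8 tokens receives positions from a range of 64,
# so the model is exposed to position values far beyond index 7.
rng = random.Random(0)
position_ids = sample_position_ids(seq_len=8, max_position=64, rng=rng)
```

In a training loop, ids like these would stand in for the usual `0, 1, ..., seq_len - 1` when computing positional encodings, with a fresh sample drawn for each sequence.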

training

Key Terms

positional-encoding, length-generalization, yarn, out-of-distribution, curriculum-learning