A technique where the model learns to predict hidden or blanked-out words in text, allowing it to reason about context from multiple directions at once.