Initial training of a model on large unlabeled datasets to learn general language patterns before task-specific adaptation.
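
As a rough illustration of how unlabeled text can supply its own training signal, the sketch below builds (token, next token) pairs from raw sentences and fits a toy bigram counter to them. The next-token objective, the whitespace tokenizer, and the names `next_token_pairs`, `pretrain_bigram`, and `predict_next` are all assumptions made for this example; real pre-training uses neural networks and far larger corpora.

```python
from collections import Counter, defaultdict


def next_token_pairs(text):
    """Turn raw, unlabeled text into (token, next_token) training pairs."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    return list(zip(tokens[:-1], tokens[1:]))


def pretrain_bigram(corpus):
    """Toy stand-in for pre-training: count which token follows which."""
    counts = defaultdict(Counter)
    for text in corpus:
        for prev, nxt in next_token_pairs(text):
            counts[prev][nxt] += 1
    return counts


def predict_next(model, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = model.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# No human-written labels: the targets come from the text itself.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
]
model = pretrain_bigram(corpus)
print(predict_next(model, "the"))
```

The resulting counts capture general patterns of the corpus (here, that "cat" most often follows "the"). Task-specific adaptation would then start from such a model rather than from scratch.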