The shift from a model reproducing training data to creating novel outputs, triggered by increasing dataset size.