A metric for how well a language model predicts the next token, computed as the exponential of the average negative log-likelihood per token; lower perplexity means the model assigns higher probability to the observed text. A perplexity of N roughly means the model is as uncertain as if it were choosing uniformly among N tokens at each step.
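A minimal sketch of the computation, assuming you already have per-token log-probabilities from a model (the `perplexity` helper and example values here are illustrative, not from any particular library):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Hypothetical example: a model assigns probability 0.25 to each of 4 tokens,
# so it is as uncertain as a uniform choice among 4 options.
log_probs = [math.log(0.25)] * 4
print(perplexity(log_probs))  # → 4.0
```

In practice the log-probabilities come from the model's softmax outputs over a held-out corpus, and the averaging is done over all tokens in that corpus.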