The complete set of individual text units (tokens) that a model can recognize and process; a larger vocabulary allows the model to handle more diverse languages and specialized terms.
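The idea can be sketched with a toy example (the vocabulary and `encode` helper below are hypothetical, for illustration only): each token the model recognizes maps to an integer ID, and anything outside the vocabulary falls back to an unknown token.

```python
# Hypothetical toy vocabulary: maps recognizable text units (tokens) to IDs.
vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}

def encode(words):
    # Words absent from the vocabulary fall back to the unknown token.
    return [vocab.get(w, vocab["<unk>"]) for w in words]

print(encode(["the", "cat", "sat"]))  # all tokens known: [0, 1, 2]
print(encode(["the", "dog", "sat"]))  # "dog" is out-of-vocabulary: [0, 3, 2]
```

A larger vocabulary shrinks the set of words that hit `<unk>` (or, in real subword tokenizers, the number of pieces a rare word is split into), which is why it helps with diverse languages and specialized terms.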
The quality of a model's understanding and generation in languages other than English.