The component that splits raw text into tokens (subwords or characters) and maps each one to an integer ID that the model can process.
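As a rough illustration of the splitting step, here is a minimal Python sketch of greedy longest-match subword tokenization with a single-character fallback. The function name `tokenize` and the vocabulary `TOY_VOCAB` are hypothetical and exist only for this example; real tokenizers (for instance BPE or WordPiece) learn their vocabularies from data and are considerably more involved.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of `text` into pieces found in `vocab`.

    Any character not covered by a vocabulary entry becomes its own token.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocabulary hit.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy vocabulary, hand-picked for illustration only.
TOY_VOCAB = {"token", "ization", "un", "believ", "able"}

print(tokenize("tokenization", TOY_VOCAB))  # ['token', 'ization']
print(tokenize("unbelievable", TOY_VOCAB))  # ['un', 'believ', 'able']
print(tokenize("cat", TOY_VOCAB))           # ['c', 'a', 't']
```

The character fallback means any input can be tokenized, at the cost of longer token sequences for text the vocabulary covers poorly.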
The quality of the model's understanding and generation of languages other than English.