A process that converts text into numerical vectors (embeddings) that capture semantic meaning in a format models can work with.