A representation of data (like molecules or text) as a list of numbers that captures its essential features in a form that machine learning models can work with.