A step that transforms raw input data into a cleaner, more useful format (for example, by dropping incomplete records, normalizing text, or rescaling numeric values) before passing it to another model or system.
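As an illustration, here is a minimal Python sketch of such a step. It assumes the raw input is a list of hypothetical `(text, value)` pairs. The function name `preprocess` and the three cleaning operations are illustrative choices, not a standard API. The function drops incomplete rows, normalizes the text, and min-max scales the numeric column into [0, 1].

```python
from typing import Optional


def preprocess(raw: list[tuple[Optional[str], Optional[float]]]) -> list[tuple[str, float]]:
    """Hypothetical preprocessing step for (text, value) pairs."""
    # 1. Drop rows with a missing field or text that is only whitespace.
    rows = [(t, v) for t, v in raw if t is not None and v is not None and t.strip()]
    if not rows:
        return []

    # 2. Normalize text: trim, collapse internal whitespace runs, lowercase.
    texts = [" ".join(t.split()).lower() for t, _ in rows]

    # 3. Min-max scale values into [0, 1]; a constant column maps to 0.0.
    values = [v for _, v in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    scaled = [(v - lo) / span if span else 0.0 for v in values]

    return list(zip(texts, scaled))


raw_data = [
    ("  Hello   World ", 10.0),
    ("GOODBYE", 30.0),
    (None, 5.0),   # missing text: dropped
    ("   ", 7.0),  # blank text: dropped
    ("Mixed Case", 20.0),
]
print(preprocess(raw_data))
# → [('hello world', 0.0), ('goodbye', 1.0), ('mixed case', 0.5)]
```

The downstream model then receives only well-formed rows with consistent casing and values on a common scale. In practice, scaling parameters such as `lo` and `hi` are usually computed once on training data and reused on new data, rather than recomputed for each batch.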