A model trained to convert raw input (like music or text) into meaningful numerical patterns that capture important features, rather than generating direct outputs like text or classifications.