A neural network component that converts images into numerical representations that capture visual features and patterns.
Quality of vision, audio, and image understanding (distinct from modality support)