Vision-Language Alignment

training

Training a model to learn the correspondence between images and their text descriptions, so that it can match each image with the text that describes it.
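One common way to train for this kind of matching is a symmetric contrastive objective over a batch of image and text embeddings. The entry above does not name a specific method, so the sketch below is an illustrative assumption rather than a description of any particular model. The function names, the temperature value of 0.07, and the toy two-dimensional embeddings are all hypothetical; in practice the embeddings would come from trained image and text encoders.

```python
import math

def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_matrix(image_embs, text_embs, temperature=0.07):
    """Temperature-scaled cosine similarity between every image and every text."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)

def contrastive_alignment_loss(image_embs, text_embs, temperature=0.07):
    """Average of the image-to-text and text-to-image losses (assumed pairing: index k with k)."""
    logits = similarity_matrix(image_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

def match_images_to_texts(image_embs, text_embs):
    """For each image, return the index of the most similar text."""
    logits = similarity_matrix(image_embs, text_embs)
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

# Hypothetical toy embeddings: image k is paired with text k.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[0.9, 0.1], [0.1, 0.9]]
swapped_texts = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = contrastive_alignment_loss(images, aligned_texts)
loss_swapped = contrastive_alignment_loss(images, swapped_texts)
matches = match_images_to_texts(images, aligned_texts)
```

On this toy data the loss is lower when each text embedding points in the same direction as its paired image embedding than when the pairs are swapped, which is the signal a training loop would use to pull matching pairs together and push mismatched pairs apart.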

Related Capabilities

Quality of vision, audio, and image understanding (distinct from modality support)