Native Processing

architecture

When a model can directly understand different types of input (like images or audio) without needing to convert them to text first.

Related Capabilities

Quality of vision, audio, and image understanding (distinct from modality support)