Interleaved Inputs

architecture

The ability to mix images and text in any order within a single prompt, rather than requiring all images first or all text first.
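As a rough illustration of the definition above, a prompt can be modeled as an ordered list of parts, each either text or an image. The sketch below is not any particular vendor's API: `TextPart`, `ImagePart`, `modality_runs`, and `needs_interleaving` are hypothetical names introduced here, and the image file names are placeholders. It only shows how to tell a prompt that fits an "all images first" or "all text first" layout from one that needs true interleaving.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Union


@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    # Hypothetical image reference; a real API might take bytes or a URL instead.
    source: str


Part = Union[TextPart, ImagePart]


def modality_runs(parts: List[Part]) -> List[str]:
    """Collapse consecutive parts of the same modality into one label per run."""
    kinds = ["image" if isinstance(p, ImagePart) else "text" for p in parts]
    return [kind for kind, _ in groupby(kinds)]


def needs_interleaving(parts: List[Part]) -> bool:
    """True if the prompt cannot be written as one image block plus one text block."""
    return len(modality_runs(parts)) > 2


# A prompt whose meaning depends on where each image sits relative to the text.
prompt = [
    TextPart("Compare this chart"),
    ImagePart("chart_a.png"),
    TextPart("with this one"),
    ImagePart("chart_b.png"),
    TextPart("and explain the difference."),
]

# An "all images first" prompt, which a non-interleaving model could also accept.
images_first = [
    ImagePart("chart_a.png"),
    ImagePart("chart_b.png"),
    TextPart("Compare these two charts."),
]

print(modality_runs(prompt))
print(needs_interleaving(prompt), needs_interleaving(images_first))
```

Keeping the parts in a single ordered list, rather than separate `images` and `text` fields, is what preserves the positional relationship between each image and the words around it.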

Related Capabilities

Quality of vision, audio, and image understanding (distinct from modality support)