The ability of an AI model to interact with computer interfaces, navigate software applications, and execute actions on a user's behalf by interpreting and responding to visual or textual representations of the screen.
Function calling, structured output, agent-style tool orchestration
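
The observe-and-act cycle implied by this definition can be sketched as a small loop. The example below is a minimal illustration under stated assumptions, not any vendor's API: `ToyScreen`, `Action`, `toy_policy`, and `run_agent` are hypothetical names, the "screen" is a toy textual representation with one field and one button, and a hand-written rule stands in for the model. The actions are emitted as structured records, loosely in the spirit of function calling.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical structured action, similar in spirit to a function call."""
    name: str                                  # "type", "click", or "done"
    args: dict = field(default_factory=dict)   # action parameters


class ToyScreen:
    """Hypothetical stand-in for a real interface: one text field, one button."""

    def __init__(self) -> None:
        self.text_field = ""
        self.submitted = False

    def observe(self) -> dict:
        # Textual representation of the screen state handed to the policy.
        return {"field": self.text_field, "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Execute one action against the interface.
        if action.name == "type":
            self.text_field += action.args["text"]
        elif action.name == "click" and action.args.get("target") == "submit":
            self.submitted = True
        else:
            raise ValueError(f"unsupported action: {action.name}")


def toy_policy(observation: dict, goal_text: str) -> Action:
    """Rule-based placeholder for the model: choose the next action."""
    if observation["submitted"]:
        return Action("done")
    if observation["field"] != goal_text:
        return Action("type", {"text": goal_text})
    return Action("click", {"target": "submit"})


def run_agent(screen: ToyScreen, goal_text: str, max_steps: int = 10) -> list:
    """Observe, decide, act; repeat until done or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        action = toy_policy(screen.observe(), goal_text)
        history.append(action.name)
        if action.name == "done":
            break
        screen.apply(action)
    return history


if __name__ == "__main__":
    screen = ToyScreen()
    print(run_agent(screen, "hello"))  # ['type', 'click', 'done']
```

In a real system the policy would be a model call that receives a screenshot or an accessibility tree, and `apply` would drive an actual input device or automation layer; the step budget (`max_steps`) is one simple guard against a loop that never reaches its goal.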