The ability to identify and locate specific elements (like buttons or text fields) within a graphical user interface based on natural language descriptions.
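As a rough illustration of the input and output involved, the sketch below maps a natural language description to one element in a list of on-screen elements. Everything in it is hypothetical: the `UIElement` type, the `ground` function, and the sample screen are invented for this example, and the word-overlap scoring is only a toy stand-in for the model that would normally do the matching.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UIElement:
    """Hypothetical description of one on-screen element."""
    role: str                         # e.g. "button", "textfield"
    label: str                        # visible text or accessibility label
    bbox: tuple[int, int, int, int]   # (x, y, width, height) in pixels


def ground(description: str, elements: list[UIElement]) -> UIElement | None:
    """Return the element whose role and label share the most words
    with the description, or None if nothing overlaps."""
    query = set(description.lower().split())
    best, best_score = None, 0
    for el in elements:
        words = set(f"{el.role} {el.label}".lower().split())
        score = len(query & words)
        if score > best_score:
            best, best_score = el, score
    return best


# Assumed sample screen, for illustration only.
screen = [
    UIElement("button", "Submit", (320, 480, 96, 32)),
    UIElement("button", "Cancel", (200, 480, 96, 32)),
    UIElement("textfield", "Email address", (200, 120, 240, 28)),
]

target = ground("the submit button", screen)
```

The returned element carries the bounding box, which is the "locate" half of the definition; the choice of element is the "identify" half.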
Function calling, structured output, and agent-style tool orchestration.
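A minimal sketch of how these pieces can fit together, assuming the model emits a tool call as a JSON object with `name` and `arguments` fields. That wire format, the `get_weather` tool, and the `dispatch` helper are all hypothetical choices made for this example, not any particular vendor's API.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Weather lookup requested for {city}"


# Registry of tools the orchestrator is allowed to call.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a structured tool call and invoke the matching function."""
    call = json.loads(model_output)           # structured output: must be valid JSON
    name, args = call["name"], call["arguments"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)                # function calling: arguments by keyword


# Assumed example of what a model might emit.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
result = dispatch(model_output)
```

An agent-style loop would repeat this step, feeding each tool result back to the model until it stops requesting tools.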