The ability to understand and execute physical tasks that involve grasping, moving, and otherwise interacting with objects in the real world.
Function calling, structured output, and agent-style tool orchestration.