Scaling motion data and model capacity enables a single generative model to handle complex, dynamic robot control without task-specific training, opening new possibilities for general-purpose robot learning.
Humanoid-GPT is a large Transformer model trained on 2 billion frames of motion capture data to control humanoid robots. Unlike previous shallow models, which struggled with dynamic movements, it scales both data and model size to achieve zero-shot generalization: it can perform motions and tasks it never saw during training.