Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Zekun Qi, Xuchuan Chen, Dairu Liu, Chenghuai Lin, Yunrui Lian et al. | June 2, 2026 | arXiv

Key Takeaway

Scaling motion data and model capacity enables a single generative model to handle complex, dynamic robot control without task-specific training, opening new possibilities for general-purpose robot learning.

Summary

Humanoid-GPT is a large Transformer model trained on 2 billion frames of motion capture data to control humanoid robots. Unlike previous shallow models that struggled with dynamic movements, this approach scales both data and model size to achieve zero-shot generalization: it can handle motions and tasks it never saw during training.

scaling agents

Key Terms

zero-shot generalization, motion capture, causal language model, whole-body controller, retargeting