TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Dong Jing, Jingchen Nie, Tianqi Zhang, Jiaqi Liu, Huaxiu Yao et al. | June 4, 2026 | arXiv

Key Takeaway

Robot policies can control execution speed by scaling action magnitudes, so a single model can switch between fast and slow motions without retraining. This is useful for tasks that require both speed and precision.

Summary

TempoVLA enables robots to execute manipulation tasks at variable speeds by conditioning a Vision-Language-Action model on a speed parameter. The approach uses trajectory augmentation to create training data at different speeds and adds a conditioning mechanism to the policy, allowing a single model to handle both fast transit phases and slow, precise contact phases.
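
The summary does not spell out how the trajectory augmentation works, so the following is only a minimal sketch of one plausible reading: a recorded trajectory is resampled in time by a speed factor, which scales the per-step action deltas, and each resulting sample is tagged with that factor so a policy could be conditioned on it. The function names (resample_trajectory, delta_actions, augment), the linear interpolation, and the dictionary layout are illustrative assumptions, not details taken from the paper.

```python
import math


def resample_trajectory(positions, speed):
    """Linearly resample waypoints so playback is `speed` times faster.

    `positions` is a list of equal-length coordinate lists recorded at a
    fixed control rate. Assumption: linear interpolation between waypoints.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if len(positions) < 2:
        return [list(p) for p in positions]
    last = len(positions) - 1
    num_steps = max(1, math.ceil(last / speed))
    resampled = []
    for k in range(num_steps + 1):
        t = min(k * speed, last)      # position along the original timeline
        i = min(int(t), last - 1)     # index of the segment start
        frac = t - i                  # fraction of the way through the segment
        start, end = positions[i], positions[i + 1]
        resampled.append([a + frac * (b - a) for a, b in zip(start, end)])
    return resampled


def delta_actions(positions):
    """Per-step displacement actions; their magnitude grows with speed."""
    return [[b - a for a, b in zip(p, q)]
            for p, q in zip(positions, positions[1:])]


def augment(positions, speeds):
    """Build (speed, state, action) samples for each hypothetical speed label."""
    samples = []
    for speed in speeds:
        resampled = resample_trajectory(positions, speed)
        for state, action in zip(resampled, delta_actions(resampled)):
            samples.append({"speed": speed, "state": state, "action": action})
    return samples


if __name__ == "__main__":
    demo = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    for sample in augment(demo, [0.5, 2.0]):
        print(sample)
```

In this sketch, a speed factor of 2.0 halves the number of steps and doubles each displacement, while 0.5 does the opposite. A speed-conditioned policy would then receive the speed value as an extra input alongside the image and the instruction, which is how a single model could cover both fast transit and slow contact phases.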

agents multimodal training

Key Terms

vision-language-action model, trajectory augmentation, action magnitude, conditioning mechanism