Language instructions can guide autonomous driving decisions in real time, enabling personalized driving behaviors beyond fixed rules. This opens the door to more flexible, user-responsive autonomous systems.
Vega is a vision-language-action model that learns to drive by following natural language instructions. The system combines visual perception, language understanding, and world modeling to generate safe driving trajectories. To train the model, the researchers created a 100,000-scene dataset with diverse driving instructions and trajectories.