When multimodal models are trained continually on new tasks, routing decisions based on semantic similarity alone fail; the router must also account for differences in output format to prevent gradient interference and task confusion.
ProtoAda addresses a key problem in continual learning for multimodal AI: when models learn new vision-language tasks sequentially, they often forget earlier tasks or confuse tasks that have different output formats (such as coordinate prediction versus text answers).
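
To make the routing point concrete, here is a toy sketch of why semantic similarity alone can pick the wrong task and how a format cue can correct it. This is not ProtoAda's actual routing mechanism, which is not specified here. The prototype vectors, the task names, the `format_weight` parameter, and the idea that a format cue is available for the query are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical task prototypes: a semantic embedding plus an output-format tag.
# The two tasks are deliberately close in semantic space.
PROTOTYPES = {
    "grounding": {"semantic": [0.9, 0.1, 0.4], "format": "coordinates"},
    "vqa": {"semantic": [0.8, 0.2, 0.5], "format": "text"},
}


def route(query_semantic, query_format=None, format_weight=0.5):
    """Return the name of the best-scoring task.

    With query_format=None this is semantic-only routing. Otherwise a
    format match adds format_weight to the score and a mismatch
    subtracts it (an assumed scoring rule, for illustration only).
    """
    best_task, best_score = None, float("-inf")
    for task, proto in PROTOTYPES.items():
        score = cosine(query_semantic, proto["semantic"])
        if query_format is not None:
            matches = proto["format"] == query_format
            score += format_weight if matches else -format_weight
        if score > best_score:
            best_task, best_score = task, score
    return best_task


# A query that asks for a bounding box but is semantically nearer to "vqa".
query = [0.82, 0.18, 0.5]
semantic_only = route(query)
format_aware = route(query, query_format="coordinates")
print(semantic_only, format_aware)
```

In this toy setup the semantic-only router sends the coordinate-style query to the text-answer task, while the format-aware router sends it to the grounding task. How the format cue is obtained in practice, and how strongly it should be weighted, are left open here.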