Browser agents can scale more efficiently by learning from the skills implicit in human web interactions rather than from manually designed tasks. Skill distillation converts interaction trajectories into reusable, composable natural-language instructions.
This paper proposes a scalable approach to training browser agents by distilling human web browsing interactions into reusable natural-language skills. Rather than training agents from scratch on individual tasks, the method converts user interaction traces into compact skill descriptions that agents can retrieve and compose. The skills are organized in a skill graph to enable efficient learning and reuse.
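To make the idea of retrieving and composing skills from a skill graph more concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each skill is a natural-language instruction with optional prerequisite skills, that retrieval is plain word overlap, and that composition means listing prerequisites before the skill that depends on them. The names `Skill`, `SkillGraph`, `retrieve`, and `compose`, as well as the example skills, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Skill:
    # Hypothetical representation: the summary does not specify a schema.
    name: str
    instruction: str  # natural-language description of the skill
    requires: list[str] = field(default_factory=list)  # prerequisite skill names


class SkillGraph:
    """Toy skill graph: nodes are skills, edges point to prerequisites."""

    def __init__(self) -> None:
        self.skills: dict[str, Skill] = {}

    def add(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def retrieve(self, query: str) -> Optional[Skill]:
        """Return the skill whose instruction shares the most words with the query.

        Word overlap is an assumption standing in for whatever retrieval
        mechanism a real system would use.
        """
        query_words = set(query.lower().split())
        best, best_score = None, 0
        for skill in self.skills.values():
            score = len(query_words & set(skill.instruction.lower().split()))
            if score > best_score:
                best, best_score = skill, score
        return best

    def compose(self, name: str) -> list[str]:
        """Return instructions for a skill and its prerequisites, prerequisites first."""
        ordered: list[str] = []
        seen: set[str] = set()

        def visit(skill_name: str) -> None:
            if skill_name in seen:
                return
            seen.add(skill_name)
            for dep in self.skills[skill_name].requires:
                visit(dep)
            ordered.append(self.skills[skill_name].instruction)

        visit(name)
        return ordered


# Illustrative skills only; not taken from the paper.
graph = SkillGraph()
graph.add(Skill("open_search", "Click the search box on the page"))
graph.add(Skill("search_product", "Type the product name and press Enter",
                requires=["open_search"]))
graph.add(Skill("add_to_cart", "Click the add to cart button on the product page",
                requires=["search_product"]))

match = graph.retrieve("add product to cart")
plan = graph.compose(match.name)
```

In this sketch, `plan` lists the three instructions in dependency order, so an agent could follow them step by step. A real system would likely replace word overlap with a learned or embedding-based retriever.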