Model-generated skills can improve agent performance, but their effectiveness depends on how they're extracted and which agent uses them—not on model size or baseline strength.
This paper studies how AI agents can reuse skills (structured procedures extracted from past experience) to improve performance. The researchers built an evaluation framework that tests skill extraction and reuse across five task domains. They found that model-generated skills help on average but sometimes hurt performance.