StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

Xiangyuan Xue, Yifan Zhou, Zidong Wang, Shengji Tang, Philip Torr et al.|May 7, 2026arXiv

Key Takeaway

Adding explicit strategy planning at the start of a task—rather than pure reactive decision-making—dramatically improves both learning efficiency and success rates for LLM agents on long-horizon tasks.

Summary

StraTA improves how language models learn to make decisions over many steps by having them first plan a high-level strategy before acting. Instead of reacting moment-by-moment, the model samples a strategy from the initial state, follows it through actions, and learns both strategy planning and action execution together.

agents reasoning

Key Terms

agentic-reinforcement-learning trajectory-abstraction hierarchical-planning grpo credit-assignment