APPO: Agentic Procedural Policy Optimization

Xucong Wang, Ziyu Ma, Yong Wang, Yuxiang Ji, Shidong Yang et al. | June 10, 2026 | arXiv

Key Takeaway

Fine-grained credit assignment at individual decision points in agent sequences, rather than at coarse tool-call boundaries, significantly improves learning efficiency and tool-use performance in agentic RL systems.

Summary

APPO improves how AI agents learn to use tools by identifying the most important decision points in their reasoning sequences and assigning credit more precisely.
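One way to picture "identifying important decision points" is to rank token positions by how uncertain the policy was when it generated them. The sketch below is illustrative only: it assumes token entropy alone is used as the signal, and the function names `token_entropy` and `top_entropy_positions` are hypothetical. It is not the paper's branching score or its advantage scaling, neither of which is specified in this summary.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one token's logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def top_entropy_positions(per_token_logits, k):
    """Return the k positions with the highest entropy (in sequence order),
    plus the full list of per-position entropies.

    Assumption: high entropy marks a candidate decision point.
    """
    entropies = [token_entropy(logits) for logits in per_token_logits]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k]), entropies


# Toy sequence of four token positions over a three-word vocabulary.
toy_logits = [
    [10.0, 0.0, 0.0],  # confident prediction, low entropy
    [0.0, 0.0, 0.0],   # uniform distribution, maximum entropy
    [5.0, 0.0, 0.0],   # fairly confident
    [1.0, 1.0, 0.0],   # split between two options
]
positions, entropies = top_entropy_positions(toy_logits, k=2)
print(positions)
```

In this toy input the uniform position and the two-way split are the ones selected, which matches the intuition that credit should concentrate where the agent actually had a choice to make.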

agents, reasoning

Key Terms

agentic reinforcement learning, credit assignment, branching score, token entropy, procedure-level advantage scaling