Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution

Mingyu Yang, Keye Zheng, Congchao Cheng, Yujie Liu, Xingkang Lu et al. | June 18, 2026 | arXiv

Key Takeaway

Marginal Advantage Accumulation (MAA) lets agents learn which memory operations consistently help by accumulating evidence across batches, making agent self-improvement more efficient and reliable without requiring online training.

Summary

This paper addresses a problem in training memory-driven AI agents: when the same memory operation receives conflicting feedback across training batches, it is hard to tell which operations actually help. MAA handles this by accumulating evidence for each operation across batches and filtering out the unreliable ones, improving agent learning while using 75% fewer tokens during training.
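To make the accumulate-then-filter idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the class name `AdvantageAccumulator`, the per-batch advantage scores, the operation labels, and the two thresholds (`min_batches`, `min_agreement`) are all hypothetical, and the summary does not specify how MAA actually scores or filters operations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Evidence:
    total: float = 0.0  # running sum of per-batch advantage scores
    count: int = 0      # number of batches that scored this operation
    positive: int = 0   # number of batches where the score was > 0


class AdvantageAccumulator:
    """Hypothetical cross-batch evidence store for memory operations."""

    def __init__(self, min_batches: int = 3, min_agreement: float = 0.75):
        # Both thresholds are assumptions chosen for illustration.
        self.min_batches = min_batches
        self.min_agreement = min_agreement
        self.evidence = defaultdict(Evidence)

    def update(self, batch_advantages: dict[str, float]) -> None:
        """Fold one batch of per-operation advantage scores into the store."""
        for op, adv in batch_advantages.items():
            ev = self.evidence[op]
            ev.total += adv
            ev.count += 1
            if adv > 0:
                ev.positive += 1

    def reliable_operations(self) -> dict[str, float]:
        """Return mean advantage for operations that helped consistently."""
        kept = {}
        for op, ev in self.evidence.items():
            if ev.count < self.min_batches:
                continue  # too little evidence so far
            mean = ev.total / ev.count
            agreement = ev.positive / ev.count
            if mean > 0 and agreement >= self.min_agreement:
                kept[op] = mean
        return kept


# Toy data: "add:B" receives conflicting feedback, "add:C" is seen only once.
acc = AdvantageAccumulator()
for batch in [
    {"add:A": 0.4, "add:B": 0.5},
    {"add:A": 0.2, "add:B": -0.6},
    {"add:A": 0.3, "add:B": 0.1},
    {"add:C": 0.9},
]:
    acc.update(batch)

kept = acc.reliable_operations()
```

In this toy run only `add:A` survives the filter: `add:B` is dropped because its feedback flips sign across batches, and `add:C` because a single batch is not enough evidence under the assumed threshold.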

training agents efficiency

Key Terms

trace-supervised-fine-tuning episodic-memory evidence-accumulation batch-distillation