LLM agents struggle with dependency reasoning in persistent memory: when stored facts relate to one another, systems collapse to near-random performance, and the only configurations that recover accuracy are impractically expensive.
This paper introduces MEME, a benchmark for evaluating how well AI agents manage information across multiple sessions. It covers six memory tasks, including complex scenarios such as tracking dependencies between facts and handling deletions.
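The dependency-tracking scenario can be made concrete with a toy sketch. All class and fact names below are hypothetical illustrations, not part of MEME itself: the point is that when a base fact changes, any fact derived from it must be updated or invalidated, which is the kind of reasoning the benchmark probes.

```python
class FactStore:
    """Minimal memory store that records which facts were derived from which."""

    def __init__(self):
        self.facts = {}        # key -> value
        self.depends_on = {}   # key -> set of keys it was derived from

    def add(self, key, value, depends_on=()):
        self.facts[key] = value
        self.depends_on[key] = set(depends_on)

    def update(self, key, value):
        """Update a fact and invalidate everything derived from it."""
        self.facts[key] = value
        stale = [k for k, deps in self.depends_on.items() if key in deps]
        for k in stale:
            # A dependent fact is no longer trustworthy once its source changes.
            self.facts.pop(k, None)
            self.depends_on.pop(k, None)

store = FactStore()
store.add("employer", "Acme")
store.add("office", "Acme HQ", depends_on=["employer"])

store.update("employer", "Globex")  # base fact changes...
print("office" in store.facts)      # ...so the derived fact is invalidated: False
```

A system that merely appends facts to memory would keep the stale "office" entry and answer from it, which is one way the near-random performance on dependent facts can arise.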