Treating the replay buffer as a primary algorithmic lever, rather than a mere storage mechanism, can dramatically improve quantum circuit optimization by adapting how past experiences are sampled and transferred across different noise conditions.
This paper applies that idea, improving deep reinforcement learning for quantum circuit optimization by redesigning how the agent stores, samples, and reuses past experiences.
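As a minimal illustration of how sampling might adapt to noise conditions, the Python sketch below tags each stored transition with the noise level under which it was collected and weights replay sampling toward transitions from similar noise regimes. The class name, exponential weighting scheme, and parameters are assumptions for illustration, not the paper's actual implementation.

```python
import math
import random


class NoiseAwareReplayBuffer:
    """Hypothetical replay buffer that biases sampling toward
    transitions collected under noise conditions similar to the
    current one (illustrative sketch, not the paper's method)."""

    def __init__(self, capacity=10000, temperature=0.1):
        self.capacity = capacity
        self.temperature = temperature  # controls how sharply sampling favors similar noise
        self.storage = []  # list of (transition, noise_level) pairs

    def add(self, transition, noise_level):
        # Evict the oldest transition once capacity is reached.
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append((transition, noise_level))

    def sample(self, batch_size, current_noise):
        # Weight each transition by how close its recorded noise
        # level is to the current noise condition.
        weights = [
            math.exp(-abs(level - current_noise) / self.temperature)
            for _, level in self.storage
        ]
        batch = random.choices(self.storage, weights=weights, k=batch_size)
        return [transition for transition, _ in batch]
```

A buffer like this lets an agent trained across several noise regimes replay mostly the experiences that match its current regime, while still occasionally mixing in transitions from other regimes for transfer.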