Asynchronous Q-learning with polynomial step sizes satisfies a quantifiable Gaussian limit: the suitably scaled error of its averaged iterates converges to a normal distribution at an explicit rate, providing theoretical guarantees on the algorithm's statistical behavior in high-dimensional settings.
This paper analyzes how asynchronous Q-learning behaves in high dimensions by proving that its error, after appropriate scaling, converges to a normal distribution at a specific rate. The authors show that with an appropriately chosen polynomial step size, the algorithm's averaged iterates follow predictable statistical patterns, which helps explain when and why Q-learning works reliably in complex problems.
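To make the setup concrete, here is a minimal sketch of asynchronous Q-learning with a polynomially decaying step size and a running (Polyak-Ruppert-style) average of the iterates, the quantity the distributional result concerns. The toy MDP, the uniform behavior policy, and the exponent `rho` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Sketch (assumptions): random toy MDP, uniform exploration, step size t^{-rho}.
rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
gamma = 0.9          # discount factor
rho = 0.7            # polynomial step-size exponent: alpha_t = t^{-rho}

# Toy MDP: P[s, a] is a distribution over next states; rewards R[s, a] in [0, 1].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))   # current iterate
Q_bar = np.zeros_like(Q)              # averaged iterate

s = 0
T = 50_000
for t in range(1, T + 1):
    a = rng.integers(n_actions)                # behavior policy: uniform exploration
    s_next = rng.choice(n_states, p=P[s, a])   # sample next state from the MDP
    alpha = t ** (-rho)                        # polynomial step size

    # Asynchronous update: only the visited (s, a) entry changes this step.
    td_target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

    # Running average of the iterates; its scaled error is the object of the
    # normal-approximation result described above.
    Q_bar += (Q - Q_bar) / t

    s = s_next

print("Averaged Q-values:\n", np.round(Q_bar, 3))
```

In this sketch only one state-action entry is updated per step (the "asynchronous" part), and the averaged table `Q_bar` is what the paper's distributional guarantee would apply to, under its stated step-size conditions.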