Asynchronous Q-learning with polynomial step sizes satisfies a quantifiable Gaussian limit: the suitably scaled error of its averaged iterates converges to a normal distribution at an explicit rate, providing theoretical guarantees on the algorithm's statistical behavior in high-dimensional settings.
This paper analyzes how asynchronous Q-learning behaves in high dimensions by proving that its error, after appropriate scaling, converges to a normal distribution at a specific rate. The authors show that with an appropriately chosen polynomial step size, the algorithm's averaged iterates follow predictable statistical patterns, which helps explain when and why Q-learning works reliably in complex problems.
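To make the setup concrete, here is a minimal sketch of asynchronous Q-learning with a polynomially decaying step size and a running (Polyak-Ruppert-style) average of the iterates, the quantity the distributional result concerns. The toy MDP, the uniform behavior policy, and the exponent `rho` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Sketch (assumptions): random toy MDP, uniform exploration, step size t^{-rho}.
rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
gamma = 0.9          # discount factor
rho = 0.7            # polynomial step-size exponent: alpha_t = t^{-rho}

# Toy MDP: P[s, a] is a distribution over next states; rewards R[s, a] in [0, 1].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))   # current iterate
Q_bar = np.zeros_like(Q)              # averaged iterate

s = 0
T = 50_000
for t in range(1, T + 1):
    a = rng.integers(n_actions)                # behavior policy: uniform exploration
    s_next = rng.choice(n_states, p=P[s, a])   # sample next state from the MDP
    alpha = t ** (-rho)                        # polynomial step size

    # Asynchronous update: only the visited (s, a) entry changes this step.
    td_target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

    # Running average of the iterates; its scaled error is the object of the
    # normal-approximation result described above.
    Q_bar += (Q - Q_bar) / t

    s = s_next

print("Averaged Q-values:\n", np.round(Q_bar, 3))
```

In this sketch only one state-action entry is updated per step (the "asynchronous" part), and the averaged table `Q_bar` is what the paper's distributional guarantee would apply to, under its stated step-size conditions.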