Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning — ThinkLLM