WPG is theoretically sound for continuous control: the Bellman recursion gives the RL objective convergence properties similar to those of convex optimization, even though the objective itself isn't convex.
This paper proves that Wasserstein Policy Gradient (WPG), a reinforcement learning algorithm that updates policies using optimal transport geometry, converges globally to optimal solutions. The key insight is that even though RL objectives aren't convex in the traditional sense, the Bellman equation induces a geometric structure that guarantees convergence.
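
To give a rough feel for what "moving a policy using optimal transport geometry" can mean, here is a toy sketch in Python. It is not the paper's algorithm: it assumes a single state (so there is no Bellman recursion), a hypothetical quadratic action-value function `q_value`, and a policy represented by a finite set of action particles. Under those assumptions, a Wasserstein gradient ascent step on the expected value shifts each particle along the gradient of the value with respect to the action. All names and parameters below are illustrative.

```python
import random


def q_value(a):
    # Hypothetical one-state action value, maximized at a = 2.0 (an assumption).
    return -(a - 2.0) ** 2


def q_grad(a):
    # Derivative of q_value with respect to the action.
    return -2.0 * (a - 2.0)


def wasserstein_step(particles, step_size):
    # Transport every action particle along the velocity field grad_a Q.
    # This is the particle form of a Wasserstein gradient ascent step on
    # J(pi) = E_{a ~ pi}[Q(a)], with no entropy regularization.
    return [a + step_size * q_grad(a) for a in particles]


def run(num_particles=64, steps=200, step_size=0.05, seed=0):
    rng = random.Random(seed)
    # Initial policy: particles drawn from a standard normal over actions.
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    for _ in range(steps):
        particles = wasserstein_step(particles, step_size)
    return particles


particles = run()
mean_action = sum(particles) / len(particles)
```

In this toy setting the particles contract toward the maximizing action, so the particle policy concentrates there. The sketch only illustrates the transport-style update; it says nothing about the multi-state case that the convergence proof addresses.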