For a well-structured class of RL problems, you can now learn optimal policies efficiently using linear models without needing special oracles or being limited to tiny action spaces.
This paper addresses a key challenge in reinforcement learning: how to efficiently learn good policies with linear function approximation in a specific class of environments, linear Bellman complete MDPs. The researchers provide an algorithm that works with both small and large action spaces and achieves polynomial time and sample complexity, so both runtime and data requirements grow polynomially with the problem size.
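For context, the standard linear Bellman completeness condition (stated here as commonly defined in the literature, with notation assumed rather than taken from the paper) says that Bellman backups of linear functions stay linear: given a feature map $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$, for every parameter $\theta \in \mathbb{R}^d$ there exists $w \in \mathbb{R}^d$ such that

```latex
w^\top \phi(s, a)
  = r(s, a)
  + \mathbb{E}_{s' \sim P(\cdot \mid s, a)}
      \Big[ \max_{a'} \theta^\top \phi(s', a') \Big]
  \quad \text{for all } (s, a).
```

This closure property is what makes linear models a faithful representation class for these MDPs and is the structure the algorithm exploits.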