For a well-structured class of RL problems, you can now learn optimal policies efficiently using linear models without needing special oracles or being limited to tiny action spaces.
This paper addresses a key challenge in reinforcement learning: how to efficiently learn good policies with linear function approximation in a specific class of environments, linear Bellman complete MDPs. The researchers provide an algorithm that works with both small and large action spaces and achieves polynomial time and sample complexity, so both runtime and data requirements grow polynomially with the problem size.
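For context, the standard linear Bellman completeness condition (stated here as commonly defined in the literature, with notation assumed rather than taken from the paper) says that Bellman backups of linear functions stay linear: given a feature map $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$, for every parameter $\theta \in \mathbb{R}^d$ there exists $w \in \mathbb{R}^d$ such that

```latex
w^\top \phi(s, a)
  = r(s, a)
  + \mathbb{E}_{s' \sim P(\cdot \mid s, a)}
      \Big[ \max_{a'} \theta^\top \phi(s', a') \Big]
  \quad \text{for all } (s, a).
```

This closure property is what makes linear models a faithful representation class for these MDPs and is the structure the algorithm exploits.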