Current specification-guided reinforcement learning (RL) methods generalize poorly to new environments and complex tasks. SpecRLBench helps identify where they fail and guides the development of more robust approaches.
SpecRLBench is a benchmark for testing how well reinforcement learning agents follow formal task specifications (written in linear temporal logic, or LTL) in unseen environments and across different robot types. Results on the benchmark show that current methods struggle as tasks and environments become more complex, and the benchmark gives a structured way to develop better specification-guided RL systems.
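
To make "following a linear temporal logic specification" concrete, the sketch below checks one common pattern, "eventually reach a, then eventually reach b, and never touch c", written F(a & F b) & G !c, against a recorded trajectory. It is only an illustration and not SpecRLBench's API: the function names, the proposition names a, b, and c, and the use of finite-trace semantics (linear temporal logic is usually defined over infinite traces) are all assumptions made here.

```python
from typing import List, Set

# A trace is a list of time steps; each step is the set of atomic
# propositions (hypothetical region labels) that are true at that step.
Trace = List[Set[str]]


def first_index_where(trace: Trace, start: int, prop: str) -> int:
    """Return the first index >= start at which prop holds, or -1 if none."""
    for i in range(start, len(trace)):
        if prop in trace[i]:
            return i
    return -1


def satisfies_sequence_avoid(trace: Trace, first: str, second: str, avoid: str) -> bool:
    """Check F(first & F second) & G !avoid on a finite trace.

    Assumption: finite-trace semantics, where G means "at every recorded
    step" and F means "at some recorded step".
    """
    # G !avoid: the forbidden proposition must never hold.
    if any(avoid in step for step in trace):
        return False
    # F(first & F second): it suffices to test the earliest occurrence of
    # `first`, since any later occurrence sees a shorter remaining suffix.
    i = first_index_where(trace, 0, first)
    if i < 0:
        return False
    return first_index_where(trace, i, second) >= 0


if __name__ == "__main__":
    good = [set(), {"a"}, set(), {"b"}]
    wrong_order = [{"b"}, {"a"}]
    hits_obstacle = [{"a"}, {"c"}, {"b"}]
    print(satisfies_sequence_avoid(good, "a", "b", "c"))
    print(satisfies_sequence_avoid(wrong_order, "a", "b", "c"))
    print(satisfies_sequence_avoid(hits_obstacle, "a", "b", "c"))
```

A specification-guided agent would be judged by whether the label sequence induced by its trajectory satisfies such a formula. Generalization is tested by changing the formula, the environment layout, or the robot without retraining.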