A training technique that prioritizes data from under-explored regions of the state-action space to improve model robustness.