Training data that has been processed to remove or correct unsafe actions, ensuring the learned controller avoids harmful behaviors.