A training penalty that discourages a policy from relying on safety corrections, forcing it to learn safer behavior directly.