Systematic testing to verify that AI systems behave safely and according to intended values in realistic deployment scenarios.