Systematic evaluation of an AI system to detect and measure bias across demographic groups or decision scenarios.
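As an illustration of the "measure" part of this definition, the sketch below compares the rate of positive decisions across groups. It is a minimal example and not a full audit procedure. The function names `selection_rates` and `parity_gap`, the group labels, and the toy records are all hypothetical, and the gap in selection rates (often called demographic parity difference) is only one of several metrics an evaluation might use.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of positive decisions for each group.

    `records` is an iterable of (group, decision) pairs, where decision
    is 1 for a positive outcome and 0 otherwise. (Hypothetical helper.)
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in records:
        totals[group] += 1
        positives[group] += decision
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Difference between the highest and lowest group selection rate."""
    return max(rates.values()) - min(rates.values())


# Toy data, invented for illustration only: (group label, model decision).
records = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0),
]

rates = selection_rates(records)
gap = parity_gap(rates)
print(rates)
print(gap)
```

A gap near zero means the groups receive positive decisions at similar rates. How large a gap counts as a problem depends on the context and is a policy decision, not something the code can settle.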