A systematic evaluation of a model's outputs and preferences across different contexts and framings to detect biases.
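As a rough illustration of the idea, the sketch below asks the same question under two framings and measures how often the answers agree. Everything in it is an assumption made for the example: `framing_consistency`, `toy_model`, and the prompts are hypothetical names, and the toy model is a stand-in that deliberately prefers whichever option is mentioned first. A real evaluation would call an actual model and cover many more contexts and framings.

```python
from collections import Counter
from typing import Callable, Dict, List


def framing_consistency(model: Callable[[str], str], framings: List[str]) -> Dict[str, object]:
    """Query the model once per framing and summarize how much its answers agree."""
    answers = [model(prompt) for prompt in framings]
    counts = Counter(answers)
    # Most frequent answer and how many framings produced it.
    majority_answer, majority_count = counts.most_common(1)[0]
    return {
        "answers": answers,
        "majority_answer": majority_answer,
        "agreement": majority_count / len(answers),
    }


def toy_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model: it always picks the option
    # that appears earliest in the prompt, i.e. it has a position bias.
    options = ["tea", "coffee"]
    return min(options, key=prompt.index)


# Two framings of the same question, differing only in option order.
framings = [
    "Which is better: tea or coffee?",
    "Which is better: coffee or tea?",
]

result = framing_consistency(toy_model, framings)
print(result["answers"])    # ['tea', 'coffee']
print(result["agreement"])  # 0.5
```

An agreement score below 1.0 means the answer changed when only the framing changed, which is the kind of signal such an evaluation looks for. A single score like this is only a starting point; it says that the output is framing-sensitive, not why.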