Social biases in vision-language models are not random: they are driven by a small set of visual cues such as age, body type, and clothing style. Understanding which specific visual features trigger biased judgments is crucial for building fairer AI systems.
This paper builds a controlled benchmark of 25K photorealistic images to measure how specific visual attributes, such as age, body type, and fashion style, drive social biases in multimodal large language models (MLLMs). By holding identity fixed and changing one visual cue at a time, the authors show that just 15 visual attributes account for 80% of the variation in bias across six major MLLMs.
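The one-cue-at-a-time design can be illustrated with a small sketch. This is not the paper's actual metric or code: the function names, the record layout, and the scores below are all hypothetical, and the "share" here is simply each attribute's mean absolute score shift divided by the total, which may differ from how the paper quantifies bias variation. The idea is to pair each edited image with the unedited image of the same identity, take the difference in the model's score, and then rank attributes by how much of the total shift they explain.

```python
from collections import defaultdict


def attribute_effects(records):
    """Mean score shift per attribute.

    records: iterable of (identity, attribute, baseline_score, variant_score),
    where the variant image differs from the baseline in exactly one attribute.
    (Hypothetical layout, chosen for illustration only.)
    """
    diffs = defaultdict(list)
    for _identity, attribute, baseline, variant in records:
        # Identity is held fixed, so the difference is attributed to the cue.
        diffs[attribute].append(variant - baseline)
    return {attr: sum(d) / len(d) for attr, d in diffs.items()}


def cumulative_share(effects):
    """Rank attributes by absolute effect and return their cumulative share."""
    ranked = sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)
    total = sum(abs(v) for _, v in ranked)
    shares, running = [], 0.0
    for name, value in ranked:
        running += abs(value)
        shares.append((name, running / total if total else 0.0))
    return shares


# Made-up scores for two identities and three attributes (not real results).
records = [
    ("id_1", "age", 0.50, 0.70),
    ("id_2", "age", 0.40, 0.60),
    ("id_1", "body_type", 0.50, 0.40),
    ("id_2", "body_type", 0.40, 0.30),
    ("id_1", "clothing", 0.50, 0.55),
    ("id_2", "clothing", 0.40, 0.45),
]

effects = attribute_effects(records)
for name, share in cumulative_share(effects):
    print(f"{name}: cumulative share {share:.2f}")
```

On a real benchmark, the same ranking would show how many attributes are needed before the cumulative share crosses a chosen threshold such as 80%.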