StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

Shaghayegh Kolli, Timo Cavelius, Nafiseh Nikeghbal, Samantha Dalal, Jana Diesner | June 18, 2026 | arXiv

Key Takeaway

Social biases in vision-language models are not random: they are driven largely by a small set of visual cues such as age, body type, and clothing style. Identifying which specific visual features trigger biased judgments is crucial for building fairer AI systems.

Summary

This paper introduces a controlled benchmark of 25K photorealistic images to measure how specific visual attributes (such as age, body type, and fashion style) drive social biases in multimodal AI models. By holding identity fixed and changing one visual cue at a time, the researchers show that just 15 visual attributes account for 80% of the bias variation across six major MLLMs.
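The one-cue-at-a-time design and the "a few attributes explain most of the variation" result can be illustrated with a small sketch. It assumes hypothetical per-image scores (for example, a rating a model assigns to the pictured person), treats each attribute's mean squared score shift from the unedited baseline image as its contribution to bias variation, and greedily counts how many attributes are needed to reach an 80% share. The record format, the function names `attribute_effects` and `attributes_for_share`, the scoring, and the toy numbers are all illustrative assumptions, not the paper's actual metric or data.

```python
from collections import defaultdict

# Hypothetical records: (identity_id, edited_attribute, model_score).
# "baseline" marks the unedited image of each identity. All numbers are
# made up for illustration and do not come from the paper.
records = [
    ("p1", "baseline", 5.0), ("p1", "age", 3.0),
    ("p1", "body_type", 4.0), ("p1", "clothing", 5.5),
    ("p2", "baseline", 6.0), ("p2", "age", 4.0),
    ("p2", "body_type", 5.0), ("p2", "clothing", 6.5),
]


def attribute_effects(records):
    """Mean squared score shift from each identity's baseline, per attribute.

    Assumption: this squared-shift proxy stands in for "bias variation";
    the paper's own measure may differ.
    """
    baseline = {ident: score for ident, attr, score in records if attr == "baseline"}
    shifts = defaultdict(list)
    for ident, attr, score in records:
        if attr != "baseline":
            # Identity is held fixed, so the shift is attributed to this one cue.
            shifts[attr].append((score - baseline[ident]) ** 2)
    return {attr: sum(vals) / len(vals) for attr, vals in shifts.items()}


def attributes_for_share(effects, target=0.8):
    """Smallest prefix of attributes (largest effect first) reaching `target` share."""
    total = sum(effects.values())
    if total == 0:
        return [], 0.0
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for attr, effect in ranked:
        chosen.append(attr)
        cumulative += effect
        if cumulative / total >= target:
            break
    return chosen, cumulative / total


chosen, share = attributes_for_share(attribute_effects(records))
print(chosen, round(share, 3))  # → ['age', 'body_type'] 0.952
```

Ranking attributes by effect size and taking the shortest prefix that crosses the threshold is one simple way to express the idea that a few cues dominate; the paper may use a different decomposition.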

evaluation safety multimodal

Key Terms

multimodal-large-language-model social-bias attribute-level-bias controlled-benchmark