Unequal influence or representation of different data types (like images vs. text) in a multimodal model.