Ensuring semantic consistency between different modalities (e.g., text and images) in generated content.
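
One common way to approximate such a check, offered here only as a minimal sketch, is to embed the text and the image into a shared vector space and compare the two vectors by cosine similarity. The sketch assumes the embeddings already come from some joint text-image encoder (a CLIP-style model, for example). The function names `cosine_similarity` and `is_consistent`, the toy vectors, and the 0.25 threshold are all hypothetical choices made for illustration and do not come from the source.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def is_consistent(
    text_emb: Sequence[float],
    image_emb: Sequence[float],
    threshold: float = 0.25,  # assumed cutoff; tune per encoder and dataset
) -> bool:
    """Flag a text/image pair as consistent when similarity meets the threshold."""
    return cosine_similarity(text_emb, image_emb) >= threshold


# Toy embeddings standing in for real encoder outputs (illustrative values only).
caption_emb = [0.9, 0.1, 0.0]
matching_image_emb = [0.8, 0.2, 0.1]
unrelated_image_emb = [0.0, 0.1, 0.9]

print(is_consistent(caption_emb, matching_image_emb))
print(is_consistent(caption_emb, unrelated_image_emb))
```

A single similarity score is a coarse proxy: it can miss fine-grained mismatches such as object counts or spatial relations, so in practice it would likely be combined with other checks.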