Processing and understanding multiple types of information (e.g., video, audio, and text) simultaneously to extract meaning and structure.