Generating detailed, comprehensive image descriptions that capture rich visual information and relationships, rather than brief summaries.
Accuracy of world knowledge: recall of facts and the relationships between them.