Breaking uncertainty into interpretable components (what is ambiguous about the task versus how many right answers exist) lets you estimate confidence in multimodal models efficiently, without expensive sampling.
CoMet decomposes uncertainty in multimodal AI models into two components: context-specific ambiguity (arising from the task or prompt) and multiplicity (how many valid answers exist). A lightweight module estimates both components without generating multiple answers, which makes uncertainty quantification efficient for open-ended tasks such as visual question answering.
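
To make the idea of a lightweight estimator concrete, here is a minimal sketch in plain Python. It is not CoMet's actual architecture, which this summary does not describe. It assumes the module reads a single feature vector (for example, pooled hidden states of the model for one image and question pair) and applies two small linear heads to it. The names `TwoHeadProbe` and `UncertaintyEstimate`, the sigmoid and softplus output ranges, and the random weights standing in for trained parameters are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class UncertaintyEstimate:
    ambiguity: float     # context-specific ambiguity score in (0, 1)
    multiplicity: float  # positive proxy for how many valid answers exist


class TwoHeadProbe:
    """Hypothetical lightweight module: two linear heads over one feature vector."""

    def __init__(self, dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Randomly initialized weights stand in for trained parameters (assumption).
        self.w_amb = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        self.w_mult = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        self.b_amb = 0.0
        self.b_mult = 0.0

    def __call__(self, features):
        if len(features) != len(self.w_amb):
            raise ValueError("feature dimension mismatch")
        z_amb = sum(w * x for w, x in zip(self.w_amb, features)) + self.b_amb
        z_mult = sum(w * x for w, x in zip(self.w_mult, features)) + self.b_mult
        ambiguity = 1.0 / (1.0 + math.exp(-z_amb))   # sigmoid maps to (0, 1)
        multiplicity = math.log1p(math.exp(z_mult))  # softplus maps to > 0
        return UncertaintyEstimate(ambiguity, multiplicity)


# Stand-in for features extracted once from the model; no answers are sampled.
features = [0.5, -1.2, 0.3, 0.8]
probe = TwoHeadProbe(dim=len(features))
estimate = probe(features)
print(estimate)
```

The sketch keeps the two components separate rather than merging them into one score, since the summary does not say how, or whether, they are combined. The cost is two dot products per input, in contrast to sampling-based approaches that generate many answers and measure their disagreement.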