Leveraging Foundation Models for Causal Generative Modeling

Aneesh Komanduri, Xintao Wu | May 22, 2026 | arXiv

Key Takeaway

You can apply existing pretrained models to causal reasoning tasks by building a modular pipeline that extracts concepts, manipulates them causally, and generates counterfactuals, with no need to retrain from scratch.

Summary

This paper presents FM-CGM, a framework that combines pretrained foundation models (reasoning models and diffusion models) to perform causal reasoning on images. It enables zero-shot discovery of causal relationships, intervention on concepts, and generation of counterfactual images—all without retraining the models.
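To make the "intervene on concepts, then derive a counterfactual" step concrete, here is a minimal sketch of the standard abduction-action-prediction recipe on a toy linear structural causal model. It is not the paper's implementation: the concept names (`age`, `gray_hair`, `wrinkles`), the edge weights, and all function names are hypothetical, and the concept extraction and image generation stages that FM-CGM delegates to foundation models are left out entirely.

```python
from typing import Dict, List

# Hypothetical concept graph (assumption, not from the paper).
# ORDER is a topological ordering of the concepts.
ORDER: List[str] = ["age", "gray_hair", "wrinkles"]
PARENTS: Dict[str, List[str]] = {
    "age": [],
    "gray_hair": ["age"],
    "wrinkles": ["age"],
}
# Illustrative linear mechanisms: child = sum(weight * parent) + noise.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "gray_hair": {"age": 0.8},
    "wrinkles": {"age": 0.5},
}


def abduct(observed: Dict[str, float]) -> Dict[str, float]:
    """Abduction: recover each concept's additive noise from the observation."""
    noise = {}
    for concept in ORDER:
        explained = sum(
            WEIGHTS[concept][parent] * observed[parent]
            for parent in PARENTS[concept]
        )
        noise[concept] = observed[concept] - explained
    return noise


def predict(noise: Dict[str, float], interventions: Dict[str, float]) -> Dict[str, float]:
    """Action + prediction: fix intervened concepts, recompute the rest in order."""
    values: Dict[str, float] = {}
    for concept in ORDER:
        if concept in interventions:
            # do(concept = value): the concept's own mechanism is ignored.
            values[concept] = interventions[concept]
        else:
            values[concept] = noise[concept] + sum(
                WEIGHTS[concept][parent] * values[parent]
                for parent in PARENTS[concept]
            )
    return values


def counterfactual(observed: Dict[str, float], interventions: Dict[str, float]) -> Dict[str, float]:
    """Counterfactual concept values for one observation under an intervention."""
    return predict(abduct(observed), interventions)


observed = {"age": 0.5, "gray_hair": 0.5, "wrinkles": 0.25}
cf_age = counterfactual(observed, {"age": 1.0})
cf_hair = counterfactual(observed, {"gray_hair": 0.0})
```

Intervening on `age` changes its descendants `gray_hair` and `wrinkles`, while intervening on `gray_hair` leaves `age` and `wrinkles` at their observed values. In a pipeline like the one summarized above, the counterfactual concept values would then condition an image generator; that step is not shown here.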

reasoning multimodal applications

Key Terms

causal-inference counterfactual-generation cross-attention zero-shot-reasoning diffusion-model