You can reuse existing pretrained models for causal reasoning tasks by building a modular pipeline that extracts concepts, intervenes on them causally, and generates counterfactuals, with no need to retrain from scratch.
This paper presents FM-CGM, a framework that combines pretrained foundation models (reasoning models and diffusion models) to perform causal reasoning on images. It supports zero-shot discovery of causal relationships, interventions on concepts, and generation of counterfactual images, all without retraining the underlying models.
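To make the three stages concrete (extract concepts, intervene, generate a counterfactual), here is a minimal toy sketch in Python. It is not FM-CGM's actual interface, which this summary does not describe. Every name in it (`ConceptSCM`, `extract_concepts`, `to_prompt`) is hypothetical, the concept extractor is a hard-coded stand-in for a pretrained reasoning model, and the final prompt string only marks where a pretrained diffusion model would be called. The causal graph and its mechanisms are likewise assumed for illustration rather than discovered zero-shot.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Concepts = Dict[str, str]


@dataclass
class ConceptSCM:
    """Toy structural causal model over named concepts (hypothetical)."""
    # parents[c] lists the concepts that causally influence c
    parents: Dict[str, List[str]]
    # mechanisms[c] recomputes c from the current concept values
    mechanisms: Dict[str, Callable[[Concepts], str]]

    def topological_order(self) -> List[str]:
        order: List[str] = []
        seen = set()

        def visit(node: str) -> None:
            if node in seen:
                return
            seen.add(node)
            for parent in self.parents.get(node, []):
                visit(parent)
            order.append(node)  # appended only after all its parents

        for node in self.parents:
            visit(node)
        return order

    def intervene(self, observed: Concepts, do: Concepts) -> Concepts:
        """Apply do(...) and propagate the change to descendants only."""
        result = dict(observed)
        result.update(do)
        changed = set(do)
        for node in self.topological_order():
            if node in do:
                continue  # intervened concepts ignore their parents
            if any(p in changed for p in self.parents.get(node, [])):
                result[node] = self.mechanisms[node](result)
                changed.add(node)
        return result


def extract_concepts(image_path: str) -> Concepts:
    # Stand-in for a pretrained reasoning model (assumption): a real
    # pipeline would query the model about the image at image_path.
    return {"weather": "sunny", "ground": "dry", "umbrella": "closed"}


def to_prompt(concepts: Concepts) -> str:
    # Text conditioning that a pretrained diffusion model could consume
    # (assumption); the image generation call itself is out of scope here.
    return ", ".join(f"{k}: {v}" for k, v in sorted(concepts.items()))


# Assumed graph for illustration: weather -> ground, weather -> umbrella
scm = ConceptSCM(
    parents={"ground": ["weather"], "umbrella": ["weather"]},
    mechanisms={
        "ground": lambda c: "wet" if c["weather"] == "rainy" else "dry",
        "umbrella": lambda c: "open" if c["weather"] == "rainy" else "closed",
    },
)

observed = extract_concepts("street.png")  # hypothetical file name
counterfactual = scm.intervene(observed, {"weather": "rainy"})
print(to_prompt(counterfactual))
```

Intervening on `weather` updates both downstream concepts, so the printed prompt is `ground: wet, umbrella: open, weather: rainy`. Intervening on `ground` alone leaves `weather` and `umbrella` untouched, because changes propagate only from causes to effects. Keeping each stage behind a plain function boundary is what would let a pretrained model replace a stub without retraining anything.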