You can train a transformer to act as a fast Bayesian predictor by treating prior information as part of the input context, matching the accuracy of an exact Bayesian oracle while running orders of magnitude faster than traditional Bayesian methods.
This paper presents a method for training transformers to perform fast Bayesian inference by learning from examples that pair prior distributions with target datasets. Rather than computing exact Bayesian predictions, which is slow, the model learns to map a sequence of prior information and data directly to predictions. The result is fast, uncertainty-aware inference that adapts to new priors at test time.
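To make the idea of "prior information as part of the input context" concrete, here is a minimal sketch in a toy Beta-Bernoulli setting. It is an illustration under stated assumptions, not the paper's implementation: the prior family, the token layout, and the names `sample_task` and `oracle_predictive` are all hypothetical. The sketch shows how one training example might be assembled (prior tokens followed by observations, with the next outcome as the label) and what the exact Bayesian prediction looks like in a case where it has a closed form.

```python
import random


def sample_task(rng, n_obs=8):
    """Build one hypothetical training example: (context sequence, target)."""
    # Assumption: the prior is a Beta(alpha, beta) over a coin's bias,
    # with hyperparameters drawn from an arbitrary illustrative range.
    alpha = rng.uniform(0.5, 5.0)
    beta = rng.uniform(0.5, 5.0)
    theta = rng.betavariate(alpha, beta)  # latent parameter drawn from the prior
    draws = [1 if rng.random() < theta else 0 for _ in range(n_obs + 1)]
    # The prior enters the sequence as its own token, ahead of the data.
    context = [("prior", alpha, beta)] + [("obs", y) for y in draws[:-1]]
    target = draws[-1]  # held-out outcome a sequence model would learn to predict
    return context, target


def oracle_predictive(context):
    """Exact posterior predictive P(next = 1 | prior, data) for Beta-Bernoulli."""
    _, alpha, beta = context[0]
    ys = [token[1] for token in context[1:]]
    return (alpha + sum(ys)) / (alpha + beta + len(ys))


rng = random.Random(0)
context, target = sample_task(rng)
p_oracle = oracle_predictive(context)
```

In this toy case the oracle is a one-line formula, so nothing is gained by learning it. The approach summarized above is aimed at settings where the exact prediction is expensive; a model trained on many such (context, target) pairs with a log-loss objective would be expected to approximate the oracle's output in a single forward pass, and changing the prior token at test time changes the prediction without retraining.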