Valid Inference with Synthetic Data via Task Exchangeability

Lezhi Tan, Tijana Zrnic | June 11, 2026 | arXiv

Key Takeaway

You can use synthetic data for research when your current task can be treated as exchangeable with historical tasks for which real data are available; under that condition, this framework provides statistical guarantees that your conclusions remain valid.

Summary

This paper provides statistical methods for safely using synthetic data in research by introducing 'task exchangeability', a condition under which your current research question is statistically interchangeable with past tasks for which real data exist. The authors develop inference techniques with validity guarantees and evaluate them on LLM-generated survey responses and AI evaluation tasks.
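To make the role of exchangeability concrete, here is a minimal Python sketch of one generic way it can be exploited: split-conformal calibration of a synthetic-data estimate against historical tasks. This is an illustration under stated assumptions, not the authors' procedure. The function name `conformal_interval`, the absolute-gap score, and treating each historical real-data estimate as ground truth are hypothetical choices made for this sketch. If the new task's (synthetic, real) pair is exchangeable with the historical pairs, the returned interval contains the new task's real-data estimate with probability at least 1 - alpha.

```python
import math
from typing import Sequence, Tuple


def conformal_interval(
    synthetic_hist: Sequence[float],
    real_hist: Sequence[float],
    synthetic_new: float,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Hypothetical sketch: split-conformal interval for a new task.

    Assumes the (synthetic, real) estimate pairs of the historical tasks and
    of the new task are exchangeable. Historical real-data estimates are
    treated as ground truth, so the interval targets the new task's
    real-data estimate, not necessarily the underlying population parameter.
    """
    if len(synthetic_hist) != len(real_hist):
        raise ValueError("historical inputs must have equal length")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(synthetic_hist)
    # Nonconformity score per historical task: |real - synthetic| gap.
    scores = sorted(abs(r - s) for s, r in zip(synthetic_hist, real_hist))
    # Conformal quantile rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few historical tasks for this alpha: no finite interval.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    # Widen the new task's synthetic estimate by the calibrated gap.
    return (synthetic_new - q, synthetic_new + q)
```

For example, with nine historical tasks and alpha = 0.25, the rank is ceil(10 × 0.75) = 8, so the interval is the new synthetic estimate plus or minus the 8th smallest historical gap. The unbounded return value reflects that, without enough exchangeable historical tasks, the guarantee cannot be met by any finite interval.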


Key Terms

synthetic data, task exchangeability, validity guarantees