You can now generate large, reproducible test collections with known relevance answers to diagnose IR system failures at scale—useful for stress-testing before building expensive human-judged benchmarks.
SPECTRA is a framework for generating synthetic text collections and test datasets for information retrieval systems. Instead of relying on expensive human-labeled data, it creates controllable document corpora with built-in relevance labels, letting researchers test how search systems handle scale, latency, and ranking challenges before investing in real human evaluation.