AI regulation now demands proof of safety but provides no technical method for producing it. This framework supplies that missing tool: a black-box statistical test that yields auditable, quantifiable evidence of system safety for regulatory compliance.
This paper proposes a statistical certification framework for AI systems in high-risk applications such as lending and autonomous vehicles. Adapting aviation safety standards, it defines a two-stage process: regulators set acceptable failure-rate thresholds, and developers then use statistical tools (RoMA and gRoMA) to verify that their systems meet those thresholds, without needing access to model internals.
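RoMA and gRoMA implement more sophisticated estimators, but the core black-box logic can be illustrated with a minimal sketch: sample inputs from the operational distribution, query the system purely as an oracle, and compare a high-confidence upper bound on the observed failure rate against the regulator's threshold. The Python below is a hypothetical illustration, not the paper's actual tooling; `model`, `sample_input`, and `is_failure` are assumed user-supplied callables, and the exact Clopper-Pearson binomial bound stands in for whatever estimator the framework prescribes.

```python
from scipy.stats import beta


def failure_rate_upper_bound(failures: int, trials: int,
                             confidence: float = 0.99) -> float:
    """One-sided Clopper-Pearson upper bound on the true failure probability."""
    if failures == trials:
        return 1.0
    return float(beta.ppf(confidence, failures + 1, trials - failures))


def certify(model, sample_input, is_failure, threshold: float,
            n_trials: int = 100_000, confidence: float = 0.99):
    """Black-box certification sketch: query the system on sampled inputs,
    count failures, and check whether a high-confidence upper bound on the
    failure rate stays below the regulator-set threshold.

    All arguments are hypothetical stand-ins: `model` maps an input to an
    output, `sample_input` draws from the deployment input distribution,
    and `is_failure` encodes the regulator's failure criterion.
    """
    failures = 0
    for _ in range(n_trials):
        x = sample_input()                 # draw an operational input
        failures += int(is_failure(x, model(x)))  # oracle access only
    bound = failure_rate_upper_bound(failures, n_trials, confidence)
    return bound <= threshold, bound


# Example: certify against a 1-in-10,000 failure threshold at 99% confidence.
# passed, bound = certify(model, sample_input, is_failure, threshold=1e-4)
```

Because the test only observes input-output pairs, the same procedure applies to any system regardless of architecture, which is what makes the black-box, two-stage split between regulator-set thresholds and developer-run verification workable in practice.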