A production test that alternates between two policies to measure their real-world performance differences.