You can generate better-coverage samples in parallel by using quasi-Monte Carlo sampling instead of independent random sampling. This reaches the same performance with significantly fewer inference calls, which makes scaling compute more efficient.
QuasiMoTTo improves inference efficiency by generating correlated rather than independent samples during test-time scaling. Independent samples often land on redundant solutions and waste compute; quasi-Monte Carlo sampling instead spreads the samples more evenly across the output space. This achieves the same accuracy with 25-47% fewer samples. Each sample still follows the correct marginal distribution, so the samples remain usable for training.
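The idea of "correlated samples with correct marginals" can be illustrated with a toy one-dimensional sketch. This is not QuasiMoTTo's actual procedure, which the text above does not specify. It assumes a single categorical choice sampled by inverse CDF, and it swaps in a randomly shifted lattice as a simple stand-in for a quasi-Monte Carlo point set. All function names here are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def inverse_cdf(probs, u):
    """Map a uniform draw u in [0, 1) to a category index via the inverse CDF."""
    cdf = list(accumulate(probs))
    # The min() guards against the cumulative sum ending slightly below 1.0.
    return min(bisect_right(cdf, u), len(probs) - 1)


def iid_uniforms(n, rng):
    """Baseline: n independent uniform draws."""
    return [rng.random() for _ in range(n)]


def shifted_lattice_uniforms(n, rng):
    """Toy QMC stand-in (an assumption, not the paper's construction):
    n evenly spaced points that all share one random shift.
    Each point is individually uniform on [0, 1), but the points are
    correlated so that together they cover the interval evenly."""
    shift = rng.random()
    return [(i / n + shift) % 1.0 for i in range(n)]


def sample(probs, uniforms):
    """Turn uniform draws into category samples."""
    return [inverse_cdf(probs, u) for u in uniforms]


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.25, 0.25, 0.25, 0.25]
    print("iid:    ", sample(probs, iid_uniforms(4, rng)))
    print("lattice:", sample(probs, shifted_lattice_uniforms(4, rng)))
```

With four equally likely categories and four samples, the shifted lattice places one point in each quarter of the unit interval, so each category appears once, while independent draws can repeat a category and miss others. Because the shared shift is uniform, each individual point is still uniformly distributed, which is what keeps the per-sample distribution unchanged.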