Efficiently Learning Drifting Halfspaces with Massart Noise

Mingchen Ma, Guyang Cao, Jelena Diakonikolas, Ilias Diakonikolas|June 9, 2026arXiv

Key Takeaway

There's a fundamental gap between what's theoretically possible and what's computationally efficient for learning drifting concepts with noise—polynomial-time algorithms need Δ^{1/3} error scaling while information theory only requires Δ^{1/2}.

Summary

This paper studies how to efficiently learn linear classifiers (halfspaces) when the target concept drifts over time and labels are corrupted by noise. The authors develop an algorithm achieving error rate η + Õ(Δ^{1/3}/γ) and prove this is essentially optimal for polynomial-time algorithms, even though information theory allows Δ^{1/2} scaling.

training evaluation

Key Terms

massart-noise concept-drift margin-separable information-computation-tradeoff low-degree-polynomial-tests