There's a fundamental gap between what's theoretically possible and what's computationally efficient for learning drifting concepts with noise—polynomial-time algorithms need Δ^{1/3} error scaling while information theory only requires Δ^{1/2}.
This paper studies how to efficiently learn linear classifiers (halfspaces) when the target concept drifts over time and labels are corrupted by noise. The authors develop an algorithm achieving error rate η + Õ(Δ^{1/3}/γ) and prove this is essentially optimal for polynomial-time algorithms, even though information theory allows Δ^{1/2} scaling.