Distillation defenses must be evaluated against adaptive attackers, who strategically choose which outputs to learn from, and not only against passive ones. Simple forward-pass defenses such as Product-of-Experts (PoE) can match more expensive defenses while preserving reasoning quality.
This paper studies a trade-off that AI model providers face: making a model more useful (by producing better outputs) also makes it easier to copy through distillation attacks. The authors develop a game-theoretic framework to analyze this trade-off and propose Product-of-Experts (PoE), a lightweight defense that combines the teacher model with a proxy student during generation.
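As a rough illustration of what "combining the teacher with a proxy student" can look like at a single decoding step, here is a minimal sketch of a generic product of experts over next-token distributions, p(x) ∝ p_T(x) · p_S(x)^weight, computed in log space. The summary does not say how the two models are weighted, so `weight` is a hypothetical parameter, and `poe_next_token_probs` and `log_softmax` are illustrative helpers rather than the authors' code. A negative `weight` steers sampling away from tokens the proxy student already predicts well; this is one plausible reading of a distillation defense, not necessarily the paper's exact rule. Only logits from one forward pass of each model are needed, with no gradients.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def poe_next_token_probs(
    teacher_logits: List[float],
    student_logits: List[float],
    weight: float,  # hypothetical exponent on the proxy student's distribution
) -> List[float]:
    """Product of experts: p(x) proportional to p_T(x) * p_S(x) ** weight.

    Assumes both models score the same vocabulary, in the same order.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must share a vocabulary")
    lt = log_softmax(teacher_logits)
    ls = log_softmax(student_logits)
    # Multiplying probabilities is adding log-probabilities.
    combined = [t + weight * s for t, s in zip(lt, ls)]
    # Renormalize so the combined scores form a probability distribution.
    return [math.exp(v) for v in log_softmax(combined)]


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.0]
    student = [0.0, 2.0, 0.0]  # the proxy student strongly prefers token 1
    print(poe_next_token_probs(teacher, student, 0.0))   # teacher alone
    print(poe_next_token_probs(teacher, student, -0.5))  # token 1 down-weighted
```

With `weight = 0` the output reduces to the teacher's own softmax, so the weight acts as a dial between unmodified teacher outputs and outputs that are less aligned with what the proxy student would learn easily.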