Adversarial training can make speaker embeddings invariant to language/script while preserving speaker identity—critical for multilingual voice cloning systems that need to recognize the same speaker across different languages.
Speaker encoders for voice cloning often fail when audio switches between languages or scripts—a problem especially acute for Indic languages. This paper introduces LASE, a small neural layer that makes speaker embeddings language-agnostic by combining speaker identity learning with adversarial training against language classification.