You can add lifelong learning to frozen TTS models by storing pronunciation fixes in a memory network instead of updating weights, which allows fast adaptation to new proper nouns without retraining.
FlowEdit lets text-to-speech systems learn and retain pronunciation corrections for proper nouns. It stores each correction as an edit in a memory network, then retrieves and applies the relevant edits at inference time, reducing pronunciation errors by 93% while the original model stays frozen.
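The store-then-retrieve idea can be illustrated with a toy sketch. This is not FlowEdit's implementation: the learned memory network is replaced here by a plain nearest-neighbour lookup over character-trigram counts, and every name in it (`PronunciationMemory`, `frozen_g2p`, `pronounce`, the `threshold` value, the phoneme strings) is a hypothetical stand-in chosen for illustration. It only shows the general pattern of writing a correction to an external memory and consulting that memory at inference time while the base model is left untouched.

```python
import math
from collections import Counter


def embed(word: str) -> Counter:
    """Toy key encoder: bag of character trigrams (stand-in for a learned encoder)."""
    padded = f"##{word.lower()}##"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse trigram-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def frozen_g2p(word: str) -> str:
    """Hypothetical frozen front end: naively spells the word letter by letter."""
    return " ".join(word.upper())


class PronunciationMemory:
    """External store of (key, phonemes) edits; the base model is never modified."""

    def __init__(self, threshold: float = 0.9):
        self.entries = []           # list of (trigram key, phoneme string)
        self.threshold = threshold  # assumed similarity cutoff for applying an edit

    def write(self, word: str, phonemes: str) -> None:
        """Store one pronunciation correction."""
        self.entries.append((embed(word), phonemes))

    def read(self, word: str):
        """Return the best-matching stored edit, or None if nothing is close enough."""
        query = embed(word)
        best_score, best_phonemes = 0.0, None
        for key, phonemes in self.entries:
            score = cosine(query, key)
            if score > best_score:
                best_score, best_phonemes = score, phonemes
        return best_phonemes if best_score >= self.threshold else None


def pronounce(text: str, memory: PronunciationMemory) -> list:
    """Apply a retrieved edit when one exists; otherwise fall back to the frozen model."""
    result = []
    for word in text.split():
        edit = memory.read(word)
        result.append(edit if edit is not None else frozen_g2p(word))
    return result


memory = PronunciationMemory()
memory.write("Nguyen", "W IH N")  # one user-supplied correction
print(pronounce("Nguyen says hi", memory))
# prints ['W IH N', 'S A Y S', 'H I']
```

Adding a correction in this sketch is a single `write` call, which is why this style of adaptation needs no gradient updates. A real system would presumably use learned keys and apply edits inside the model's representations rather than swapping phoneme strings.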