FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS

Harshit Singh, Ayush Pratap Singh, Nityanand Mathur | June 18, 2026 | arXiv

Key Takeaway

You can add lifelong learning to a frozen TTS model by storing pronunciation fixes in a memory network instead of updating weights, which lets the model adapt quickly to new proper nouns without retraining.

Summary

FlowEdit enables text-to-speech systems to learn and remember pronunciation corrections for proper nouns without retraining. It stores corrections as edits in a memory network, then retrieves and applies them at inference time, reducing pronunciation errors by 93% while keeping the original model frozen.
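
The store-then-retrieve pattern described above can be illustrated with a minimal content-addressable memory. This is only a sketch of the general idea, not the paper's architecture: the character-bigram `embed` function stands in for a learned encoder, the `PronunciationMemory` class, the similarity threshold, and the respelling "shiv-AWN" are all hypothetical, and the edit here is applied to text tokens, whereas the summary does not say where FlowEdit applies its edits inside the flow-matching model.

```python
import math
from collections import Counter


def embed(word: str) -> Counter:
    """Toy stand-in for a learned encoder: character-bigram counts (assumption)."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PronunciationMemory:
    """Hypothetical key-value memory: keys are embeddings, values are corrections."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # minimum similarity for a retrieval to count
        self.keys = []
        self.values = []

    def write(self, word: str, correction: str) -> None:
        # Storing a fix appends to the memory; no model weights are touched.
        self.keys.append(embed(word))
        self.values.append(correction)

    def read(self, word: str):
        # Return the correction of the most similar key, or None if nothing is close.
        if not self.keys:
            return None
        query = embed(word)
        scores = [cosine(query, key) for key in self.keys]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.values[best] if scores[best] >= self.threshold else None


def apply_corrections(text: str, memory: PronunciationMemory) -> str:
    """Replace each whitespace-separated token that has a stored correction."""
    fixed = []
    for token in text.split():
        correction = memory.read(token)
        fixed.append(correction if correction is not None else token)
    return " ".join(fixed)


memory = PronunciationMemory()
memory.write("Siobhan", "shiv-AWN")  # illustrative respelling, not from the paper
result = apply_corrections("call Siobhan today", memory)
```

Because a correction is written once and looked up by similarity at inference time, adding a new proper noun costs one memory write rather than a training run, which is the property the summary attributes to keeping the base model frozen.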

Key Terms

flow-matching, episodic-memory, content-addressable-memory, zero-shot-learning