AlignAtt can be applied to decoder-only LLMs through prompt engineering, selection of specific attention heads, and runtime attention capture, enabling fast simultaneous translation without the encoder-decoder architecture that earlier methods relied on.
This paper adapts AlignAtt, an attention-based policy that decides when a simultaneous translation model should emit output and when it should wait for more source input, to decoder-only language models such as Gemma-4. The system performs simultaneous speech translation by incrementally updating the transcript and deciding when to translate, achieving low latency while maintaining translation quality from English into German, Italian, and Chinese.
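To make the decision rule concrete, here is a minimal sketch of an AlignAtt-style emission check. It assumes the attention from the selected heads has already been captured and averaged into a matrix of draft target tokens by source positions. The function name `alignatt_commit_length`, the `frontier` parameter, and the toy weights are illustrative assumptions, not taken from the paper, and the use of transcript tokens as source positions is likewise assumed.

```python
from typing import Sequence


def alignatt_commit_length(
    attn: Sequence[Sequence[float]], frontier: int = 1
) -> int:
    """Return how many draft target tokens are safe to emit.

    attn[i][j] is the (assumed head-averaged) attention weight from draft
    target token i to source position j. `frontier` is a hypothetical knob:
    the number of most recent source positions treated as still unstable.
    """
    if not attn or not attn[0]:
        return 0
    src_len = len(attn[0])
    threshold = src_len - frontier  # positions >= threshold are "too recent"
    for i, row in enumerate(attn):
        # Source position this target token attends to most strongly.
        most_attended = max(range(src_len), key=row.__getitem__)
        if most_attended >= threshold:
            return i  # stop before the first token that leans on the frontier
    return len(attn)


# Toy example: 3 draft target tokens over 4 source positions.
attn = [
    [0.70, 0.20, 0.05, 0.05],  # aligned to source position 0
    [0.10, 0.80, 0.05, 0.05],  # aligned to source position 1
    [0.05, 0.05, 0.20, 0.70],  # aligned to the newest source position
]
print(alignatt_commit_length(attn, frontier=1))  # 2
```

With `frontier=1`, the first two tokens are emitted and the third is held back until more source arrives. A larger `frontier` makes the policy more conservative (higher latency, fewer premature commitments), while `frontier=0` emits the whole draft.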