Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI

David Joohun Kim, Daniyal Anjum, Bonny Banerjee, Omar Abbasi | April 9, 2026 | arXiv

Key Takeaway

Device-addressed speech detection works much better when it takes conversation context and history into account rather than analyzing each utterance in isolation, and this sequential approach can run efficiently on edge devices.

Summary

This paper tackles the problem of detecting whether spoken audio is addressed to a device (such as a smart speaker) before the audio is sent for transcription. Rather than treating each utterance independently, the authors model detection as a sequential decision problem that takes conversation history into account.
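To make the contrast between per-utterance and sequential detection concrete, here is a minimal sketch. It is not the authors' model: the `SequentialGate` class, its `decay`, `history_weight`, and `threshold` parameters, and the log-odds update rule are all assumptions chosen for illustration. The sketch assumes some upstream classifier already produces a per-utterance probability that the speech is device-addressed.

```python
import math
from dataclasses import dataclass


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _logit(p: float) -> float:
    # Clamp to avoid infinities at exactly 0 or 1.
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))


@dataclass
class SequentialGate:
    """Hypothetical gate: combines the current score with decayed history."""

    decay: float = 0.5           # assumed: how quickly past evidence fades
    history_weight: float = 1.0  # assumed: influence of history on the decision
    threshold: float = 0.5       # assumed: decision threshold on probability
    state: float = 0.0           # running log-odds evidence from earlier turns

    def step(self, utterance_score: float) -> bool:
        """Return True if the utterance should be routed to transcription."""
        evidence = _logit(utterance_score)
        combined = evidence + self.history_weight * self.state
        addressed = _sigmoid(combined) >= self.threshold
        # Update the history with an exponential moving average of the evidence.
        self.state = self.decay * self.state + (1.0 - self.decay) * evidence
        return addressed


gate = SequentialGate()
decisions = [gate.step(score) for score in [0.9, 0.45]]
isolated = SequentialGate().step(0.45)
```

In this toy setup, a borderline utterance (score 0.45) is rejected when judged in isolation but accepted when it follows a clearly device-addressed utterance, which is the kind of behavior a history-aware detector is meant to capture.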

agents efficiency multimodal

Key Terms

on-device-inference sequential-routing multimodal-fusion interaction-history