Sessa's hybrid architecture yields information retention that decays with distance as a power law, O(ℓ^-β), rather than exponentially or linearly, making it more effective for long-context language modeling while staying competitive on standard benchmarks.
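To see why the power-law tail matters at long range, here is the comparison in symbols. This is a schematic sketch only: I_0, τ, and β are generic placeholders for illustration, not quantities reported for Sessa.

```latex
% Schematic retention of a token's signal at distance \ell (placeholders, not fitted values):
\[
  I_{\text{recurrent}}(\ell) \approx I_0\, e^{-\ell/\tau}
  \qquad\text{vs.}\qquad
  I_{\text{hybrid}}(\ell) \approx I_0\, \ell^{-\beta}.
\]
% For large \ell, \ell^{-\beta} \gg e^{-\ell/\tau}, so under the power-law regime
% distant tokens keep proportionally more recoverable signal.
```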
Sessa combines attention mechanisms with state-space model feedback paths to improve how models retrieve information from long contexts.
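A minimal sketch of how an attention layer can be paired with a state-space path, assuming PyTorch. The class name, the diagonal recurrence, and the gated combination are hypothetical illustrations; Sessa's actual feedback wiring is not specified here, and this simplifies it to a learned gate that mixes the attention output with an SSM scan of the same input.

```python
import torch
import torch.nn as nn


class HybridBlock(nn.Module):
    """Self-attention mixed with a simple state-space path (illustrative only)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Per-channel decay for a diagonal SSM recurrence: h_t = a * h_{t-1} + x_t.
        self.log_decay = nn.Parameter(torch.zeros(d_model))
        self.gate = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def ssm_path(self, x: torch.Tensor) -> torch.Tensor:
        # Sequential scan of the diagonal recurrence over the sequence dimension.
        a = torch.sigmoid(self.log_decay)       # decay per channel, in (0, 1)
        h = torch.zeros_like(x[:, 0])           # hidden state: (batch, d_model)
        outs = []
        for t in range(x.size(1)):
            h = a * h + x[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)         # (batch, seq, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm(x)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        ssm_out = self.ssm_path(x)
        # Learned gate decides how much of each path enters the residual stream.
        g = torch.sigmoid(self.gate(torch.cat([attn_out, ssm_out], dim=-1)))
        return x + g * attn_out + (1 - g) * ssm_out


# Toy forward pass: batch of 2 sequences, length 128, model width 64.
block = HybridBlock(d_model=64, n_heads=4)
y = block(torch.randn(2, 128, 64))
print(y.shape)  # torch.Size([2, 128, 64])
```

The gated residual is one common way to let the model lean on the recurrent path for distant context and on attention for precise local retrieval; it is a design choice for this sketch, not a claim about Sessa's implementation.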