Sessa's hybrid architecture yields information retention that decays with distance as a power law, O(ℓ^-β), rather than exponentially or linearly, making it more effective for long-context language modeling while staying competitive on standard benchmarks.
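To see why the power-law tail matters at long range, here is the comparison in symbols. This is a schematic sketch only: I_0, τ, and β are generic placeholders for illustration, not quantities reported for Sessa.

```latex
% Schematic retention of a token's signal at distance \ell (placeholders, not fitted values):
\[
  I_{\text{recurrent}}(\ell) \approx I_0\, e^{-\ell/\tau}
  \qquad\text{vs.}\qquad
  I_{\text{hybrid}}(\ell) \approx I_0\, \ell^{-\beta}.
\]
% For large \ell, \ell^{-\beta} \gg e^{-\ell/\tau}, so under the power-law regime
% distant tokens keep proportionally more recoverable signal.
```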
Sessa combines attention mechanisms with state-space model feedback paths to improve how models retrieve information from long contexts.
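A minimal sketch of how an attention layer can be paired with a state-space path, assuming PyTorch. The class name, the diagonal recurrence, and the gated combination are hypothetical illustrations; Sessa's actual feedback wiring is not specified here, and this simplifies it to a learned gate that mixes the attention output with an SSM scan of the same input.

```python
import torch
import torch.nn as nn


class HybridBlock(nn.Module):
    """Self-attention mixed with a simple state-space path (illustrative only)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Per-channel decay for a diagonal SSM recurrence: h_t = a * h_{t-1} + x_t.
        self.log_decay = nn.Parameter(torch.zeros(d_model))
        self.gate = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def ssm_path(self, x: torch.Tensor) -> torch.Tensor:
        # Sequential scan of the diagonal recurrence over the sequence dimension.
        a = torch.sigmoid(self.log_decay)       # decay per channel, in (0, 1)
        h = torch.zeros_like(x[:, 0])           # hidden state: (batch, d_model)
        outs = []
        for t in range(x.size(1)):
            h = a * h + x[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)         # (batch, seq, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm(x)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        ssm_out = self.ssm_path(x)
        # Learned gate decides how much of each path enters the residual stream.
        g = torch.sigmoid(self.gate(torch.cat([attn_out, ssm_out], dim=-1)))
        return x + g * attn_out + (1 - g) * ssm_out


# Toy forward pass: batch of 2 sequences, length 128, model width 64.
block = HybridBlock(d_model=64, n_heads=4)
y = block(torch.randn(2, 128, 64))
print(y.shape)  # torch.Size([2, 128, 64])
```

The gated residual is one common way to let the model lean on the recurrent path for distant context and on attention for precise local retrieval; it is a design choice for this sketch, not a claim about Sessa's implementation.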