Screening attention replaces global competition among keys with absolute relevance thresholds, achieving a 40% parameter reduction and 3.2× faster inference than standard Transformers.
This paper introduces Multiscreen, a language model architecture that replaces standard softmax attention with a 'screening' mechanism. Instead of distributing attention weights across all keys, screening evaluates each key against a threshold to decide which ones are relevant, eliminating the need for keys to compete with each other.
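The abstract does not give the exact formulation, but the idea of threshold-based screening (as opposed to softmax's normalized competition) can be sketched as follows. This is a hypothetical illustration: the scaled dot-product scores, the sigmoid per-key weights, and the threshold `tau` are assumptions, not the paper's actual mechanism.

```python
import numpy as np

def screening_attention(q, K, V, tau=0.0):
    """Hypothetical sketch of threshold-based 'screening' attention.

    q: query vector, shape (d,); K: keys, shape (n, d); V: values, shape (n, dv).
    Each key is evaluated independently against an absolute threshold,
    so keys do not compete through a softmax normalization.
    """
    # Per-key relevance scores (scaled dot product, as in standard attention).
    scores = K @ q / np.sqrt(q.shape[-1])
    # Screen: keep only keys whose score clears the absolute threshold tau.
    keep = scores > tau
    if not keep.any():
        # No key is relevant: return a zero context vector.
        return np.zeros(V.shape[-1])
    # Independent sigmoid weights for the surviving keys (no global softmax).
    w = 1.0 / (1.0 + np.exp(-scores[keep]))
    # Weighted average over the screened-in values only.
    return (w[:, None] * V[keep]).sum(axis=0) / w.sum()
```

Because each key is scored against a fixed threshold rather than against the other keys, adding or removing an irrelevant key leaves the output unchanged, which is the property the screening mechanism targets.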