Attention mechanism in which each token attends only to a bounded window of preceding tokens, rather than to all previous tokens.
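
As a rough illustration, the window restriction can be expressed as a boolean mask applied to ordinary causal attention. The sketch below uses NumPy and rests on assumptions not stated in the definition: the function names `sliding_window_mask` and `sliding_window_attention` are hypothetical, the attention is single-head, and the window is assumed to count the token itself, so `window=3` means the current token plus the two before it.

```python
import numpy as np


def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: entry [i, j] is True when query i may attend to key j."""
    if window < 1:
        raise ValueError("window must be at least 1")
    i = np.arange(seq_len)[:, None]  # query positions, as a column
    j = np.arange(seq_len)[None, :]  # key positions, as a row
    # Causal (j <= i) and within the last `window` positions (i - j < window).
    # Assumption: the window includes the current token itself.
    return (j <= i) & (i - j < window)


def sliding_window_attention(q, k, v, window):
    """Single-head scaled dot-product attention restricted by the mask above."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = sliding_window_mask(seq_len, window)
    # Positions outside the window get -inf, so their softmax weight is zero.
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

This version still computes the full score matrix and only masks it afterwards, which keeps the example short but gives none of the cost savings. An efficient implementation would avoid computing the masked entries at all.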