An architecture where only a subset of the model's specialized sub-networks (experts) activate for each input, reducing computation while maintaining capability.