The ability of a model to achieve strong performance while using fewer total parameters or activating fewer parameters during inference, reducing memory and computational requirements.