Techniques and design choices that make a model faster and more efficient to run on hardware, prioritizing execution speed and low resource usage over training flexibility.