Techniques used to make models smaller and faster to run, allowing them to work on devices with limited memory or processing power.
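One common technique of this kind is quantization, which stores weights as low-precision integers instead of floating-point numbers. The following is a minimal sketch, not a production implementation: it assumes symmetric 8-bit quantization with a single shared scale, and the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen only for illustration.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in quantized]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Largest difference between an original weight and its restored value.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each quantized value fits in one byte, compared with four bytes for a 32-bit float, at the cost of a rounding error of at most about half the scale per weight.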