A compression technique that reduces a model's size and memory usage by storing weights as 4-bit integers instead of higher-precision numbers, making it faster and cheaper to run with minimal accuracy loss.