A quantization method that represents model weights as 4-bit integers instead of full-precision floating-point numbers, shrinking the model's memory footprint by roughly 8× relative to 32-bit floats (two 4-bit codes fit in a single byte).
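
The mapping from floats to 4-bit integers can be sketched as below. This is a minimal illustration of symmetric per-tensor quantization, one of several schemes; the helper names, the single per-tensor scale, and the round-to-nearest choice are assumptions for the example, not any particular library's API.

```python
import numpy as np

def quantize_4bit(weights: np.ndarray):
    """Symmetric 4-bit quantization: map floats onto the integers [-8, 7]."""
    # Illustrative per-tensor scale; real schemes often use per-channel or
    # per-group scales for better accuracy.
    scale = np.abs(weights).max() / 7.0  # 7 is the largest positive 4-bit value
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the 4-bit codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)

q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale)

# Each code needs only 4 bits, so two weights pack into one byte,
# versus 4 bytes per float32 weight.
print(q.min(), q.max())         # codes stay within the 4-bit range
print(np.abs(w - w_hat).max())  # rounding error is bounded by scale / 2
```

In practice the int8 codes would be bit-packed two per byte for storage, and the scale (plus an optional zero point for asymmetric schemes) is kept alongside the weights so they can be dequantized on the fly during inference.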