A compression technique that represents model weights using only 4-bit floating-point numbers instead of larger formats, reducing memory usage and speeding up inference.