A compression technique that reduces model size and speeds up inference by representing weights and activations using 8-bit floating-point numbers, with dynamic scaling adjusted per batch to maintain accuracy.