ThinkLLM

FP8 Dynamic Quantization

deployment

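A compression technique that reduces model size and speeds up inference by representing weights and activations as 8-bit floating-point (FP8) numbers. Weight scales are typically fixed ahead of time, while activation scales are computed at runtime from each incoming input (commonly per tensor or per token) rather than from a calibration dataset, which helps maintain accuracy as input ranges vary.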
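
The idea can be illustrated with a minimal NumPy sketch. It assumes the E4M3 variant of FP8 (4 exponent bits, 3 mantissa bits, largest finite value 448) and a single per-tensor scale. The helper names `round_to_e4m3`, `quantize_dynamic`, and `dequantize` are hypothetical and only simulate FP8 rounding in software; they are not taken from any particular library, and production kernels do this in hardware.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the assumed E4M3 format


def round_to_e4m3(x):
    """Simulate rounding to the nearest E4M3 value (hypothetical helper)."""
    x = np.clip(np.asarray(x, dtype=np.float32), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    mag = np.abs(x)
    # frexp gives mag = m * 2**e with m in [0.5, 1), so floor(log2(mag)) = e - 1.
    # Clamp at -6, the smallest normal exponent, so smaller values use the
    # fixed subnormal spacing.
    exp = np.maximum(np.frexp(mag)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits: 8 steps per power of two
    return np.sign(x) * np.round(mag / step) * step


def quantize_dynamic(x):
    """Per-tensor dynamic quantization: the scale comes from this input alone."""
    amax = float(np.max(np.abs(x)))
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return round_to_e4m3(x / scale), scale


def dequantize(q, scale):
    """Map simulated FP8 values back to the original range."""
    return q * scale


# The scale is recomputed for every input, so no calibration data is needed.
activations = np.array([0.1, -2.5, 30.0, 1000.0])
q, scale = quantize_dynamic(activations)
restored = dequantize(q, scale)
```

Because the scale maps the largest magnitude in the input onto the top of the FP8 range, an input with a different range simply gets a different scale on the next call. A per-token variant would compute `amax` along the last axis instead of over the whole tensor.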
