GLM 5.1 NVFP4 is a text-in, text-out model that NVIDIA has quantized to FP4 (4-bit floating-point) precision, which reduces its memory footprint and can improve inference speed on compatible hardware. The trade-off is that FP4 quantization may cause a minor loss of accuracy compared to the higher-precision original. Its context window of roughly 200K tokens allows it to process long documents in a single pass.
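
To make the memory-footprint claim concrete, the following is a rough back-of-the-envelope sketch, not a measurement of this model. The parameter count is a hypothetical placeholder, the helper `weight_memory_gib` is invented for illustration, and the 4.5 bits-per-weight figure assumes 4-bit values plus one 8-bit scale factor per 16-weight block. It counts weight storage only and ignores activations and the KV cache.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB (weights only, no activations or KV cache)."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical parameter count, chosen only for illustration;
# it is NOT the actual size of GLM 5.1.
NUM_PARAMS = 100e9

# Baseline: 16 bits per weight (e.g. a 16-bit floating-point checkpoint).
bf16_gib = weight_memory_gib(NUM_PARAMS, 16)

# Assumption: 4-bit values plus one 8-bit scale per 16-weight block,
# giving an effective 4 + 8/16 = 4.5 bits per weight.
fp4_gib = weight_memory_gib(NUM_PARAMS, 4 + 8 / 16)

print(f"16-bit weights: {bf16_gib:.1f} GiB")
print(f"4-bit weights:  {fp4_gib:.1f} GiB")
print(f"reduction:      {bf16_gib / fp4_gib:.2f}x")
```

Under these assumptions the reduction factor depends only on the ratio of bits per weight, so it stays the same whatever parameter count is substituted for the placeholder.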