GLM 5.2 mxfp4

GLM

Open Weight: model weights are publicly available and can be downloaded and self-hosted

Released June 2026. 1049K context (≈ 786,432 words).

GLM 5.2 mxfp4 is a quantized text model whose weights are stored in the MXFP4 format, trading a small amount of numerical precision for a significantly smaller memory footprint and faster inference on compatible hardware. Its context window of over one million tokens is unusually large and lets it process very long documents or conversations in a single pass. The trade-off is that MXFP4 quantization can introduce subtle quality degradation compared to full-precision variants.
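The figures above can be sanity-checked with some rough arithmetic. The sketch below is illustrative only and rests on several assumptions: that the listed 1049K context is 2^20 = 1,048,576 tokens, that the word estimate uses 0.75 words per token (the ratio implied by the listed 786,432 words), and that every weight is stored in MXFP4 (real checkpoints often keep some tensors in higher precision). The parameter count is hypothetical, since the page does not state the model's size.

```python
# Back-of-the-envelope estimates only; none of these numbers are official.

CONTEXT_TOKENS = 1_048_576   # assumed: 2**20, shown on the page as "1049K"
WORDS_PER_TOKEN = 0.75       # assumed: ratio implied by "≈ 786,432 words"

# MXFP4 stores each weight as a 4-bit element, and every block of 32
# elements shares one 8-bit scale: 4 + 8/32 = 4.25 bits per weight.
BITS_PER_WEIGHT = {
    "fp32": 32.0,
    "fp16": 16.0,
    "mxfp4": 4 + 8 / 32,
}


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Rough word count that fits in a context of `tokens` tokens."""
    return round(tokens * words_per_token)


def weight_gib(num_params: int, fmt: str) -> float:
    """GiB needed for the weights alone (no KV cache or activations)."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 2**30


if __name__ == "__main__":
    print(approx_words(CONTEXT_TOKENS))  # 786432

    hypothetical_params = 100_000_000_000  # hypothetical, NOT the real size
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt}: {weight_gib(hypothetical_params, fmt):.1f} GiB")
```

Under these assumptions MXFP4 weights take roughly a quarter of the space of 16-bit weights. Long-context inference also needs memory for the KV cache, which this estimate deliberately leaves out.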

Capabilities

Capability scores are AI-generated based on model documentation, benchmarks, and technical specifications.

Long Context

Exceptional

Instruction Following

Glossary

Context Window: The maximum number of tokens a model can process in a single conversation or prompt.

Full-Precision: A model using standard 32-bit floating-point numbers to represent weights, providing maximum accuracy but requiring more memory.

Inference: The process of running a trained model to generate predictions or outputs from new inputs.

MXFP4: A low-precision (4-bit) floating-point format designed for efficient neural network computation while maintaining reasonable accuracy.

Memory Footprint: The amount of RAM or storage space a model requires to run, which is critical for deployment on resource-constrained devices.

Precision: The level of numerical detail a model uses to represent its internal values; higher precision means more accurate calculations but requires more memory.

Quantization: Reducing a model's numerical precision (e.g., from 16-bit to 4-bit) to shrink memory usage and speed up inference.

Quantized: Describes a model whose weights are stored with lower precision (fewer bits), which reduces its size and memory usage at some cost in accuracy.

Text Model: A language model that processes and generates only text, without support for images, audio, or other media types.

Tokens: The basic units of text that a language model processes, typically representing words or word fragments.
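To make the MXFP4 and Quantization entries concrete, here is a simplified sketch of block quantization in the style of the OCP Microscaling (MX) formats: 4-bit E2M1 elements in blocks of 32 that share one power-of-two scale. It is an illustration based on that general scheme, not a description of how GLM 5.2 mxfp4 was actually produced. The function names are hypothetical, and the tie-breaking rule and the missing exponent-range limits are simplifications.

```python
import math

# Magnitudes a 4-bit E2M1 element can represent (1 sign, 2 exponent, 1 mantissa bit).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2     # exponent of the largest power of two on the grid (4.0 = 2**2)
BLOCK_SIZE = 32   # number of elements that share one scale


def quantize_block(block):
    """Return (shared_exponent, elements) for a block of up to 32 floats."""
    if len(block) > BLOCK_SIZE:
        raise ValueError(f"block holds at most {BLOCK_SIZE} values")
    max_abs = max((abs(x) for x in block), default=0.0)
    if max_abs == 0.0:
        return 0, [0.0] * len(block)

    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so floor(log2) = e - 1.
    _, e = math.frexp(max_abs)
    shared_exp = (e - 1) - E2M1_EMAX
    scale = 2.0 ** shared_exp

    elements = []
    for x in block:
        # Scale down, saturate at the largest grid value, snap to the grid.
        target = min(abs(x) / scale, E2M1_VALUES[-1])
        # Simplification: ties resolve to the smaller magnitude.
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - target))
        elements.append(math.copysign(nearest, x))
    return shared_exp, elements


def dequantize_block(shared_exp, elements):
    """Reconstruct approximate floats from a quantized block."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in elements]


if __name__ == "__main__":
    exp, q = quantize_block([0.1, -0.4, 1.0, 3.0])
    print(exp, q)                    # -1 [0.0, -1.0, 2.0, 6.0]
    print(dequantize_block(exp, q))  # [0.0, -0.5, 1.0, 3.0]
```

In the usage example the small values 0.1 and -0.4 come back as 0.0 and -0.5, while 1.0 and 3.0 survive exactly. This shows the kind of rounding error behind the quality trade-off that quantization introduces.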
