A large mixture-of-experts model that activates roughly 17 billion parameters per token despite housing nearly 400 billion in total, keeping inference costs manageable while drawing on a vast pool of specialized knowledge. It handles both text and images, moving between the two modalities with little friction. The sheer parameter count gives it broad coverage of technical and general topics, though 8-bit quantization trades some precision for practicality on local hardware.
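To make the sparse-activation idea concrete, below is a minimal sketch of top-k expert routing in PyTorch. The dimensions, expert count, and `k=2` are illustrative toy values, not this model's actual configuration; the point is only that each token's forward pass touches the parameters of its `k` chosen experts rather than all of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Toy sparse mixture-of-experts feed-forward layer.

    A router scores each token against every expert, and only the top-k
    experts run for that token, so the parameters actually exercised per
    token are a small fraction of the layer's total.
    """

    def __init__(self, d_model=64, d_ff=256, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)  # per-token expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, d_model)
        logits = self.router(x)                         # (tokens, experts)
        weights, indices = logits.topk(self.k, dim=-1)  # keep the k best experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e            # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(5, 64)
layer = TopKMoELayer()
print(layer(tokens).shape)  # torch.Size([5, 64]); each token used 2 of 8 experts
```

With 8 experts and `k=2` in this sketch, each token exercises only a quarter of the expert parameters; the same mechanism, at much larger scale, is what lets the full model activate roughly 17 billion of its nearly 400 billion parameters per token.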