Qwen3.5 397B A17B NVFP4

Qwen 3

Open Weight: Model weights are publicly available and can be downloaded and self-hosted.

Released February 2026 | 262K context (≈ 196,608 words) | 397B params

A massive mixture-of-experts model that activates only 17 billion of its 397 billion parameters per forward pass, giving it the computational footprint of a mid-sized model while drawing on a vast pool of specialized knowledge. It handles complex reasoning, coding, and multilingual tasks with notable depth, and the NVFP4 quantization means it runs efficiently on NVIDIA hardware without dramatic quality loss.
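The figures quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: the parameter counts come from the description, while the exact token count (262,144) and the 0.75 words-per-token ratio are assumptions inferred from the "262K context ≈ 196,608 words" line, and the function name is made up for this example.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Return the share of parameters used in one forward pass."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


TOTAL_PARAMS = 397e9   # total parameters, from the model card
ACTIVE_PARAMS = 17e9   # parameters activated per forward pass, from the model card

# Assumptions (not stated on the card): 262K means 262,144 tokens,
# and one token corresponds to roughly 0.75 English words.
CONTEXT_TOKENS = 262_144
WORDS_PER_TOKEN = 0.75

fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
approx_words = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)

print(f"Active per forward pass: {fraction:.1%}")   # about 4.3%
print(f"Approximate context in words: {approx_words:,}")
```

Under these assumptions, roughly one parameter in twenty-three is used for any given token, which is why per-token compute resembles that of a much smaller dense model even though all 397 billion parameters still have to be stored.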

Capabilities

Capability scores are AI-generated based on model documentation, benchmarks, and technical specifications.

Coding

Exceptional

Reasoning & Logic

Glossary

Complex Reasoning: The ability to work through multi-step problems, analyze nuanced information, and draw logical conclusions.

Computational Footprint: The amount of memory, processing power, and time required to run a model; a smaller footprint means the model can run on less powerful hardware.

Forward Pass: A single computation cycle where input data flows through the model's layers to produce an output prediction.

Multilingual: A model trained to understand and generate text in multiple languages, not just English.

Parameters: The learned numerical values in a model; more parameters generally means more capacity but higher compute cost.

Quantization: Reducing a model's numerical precision (e.g., from 16-bit to 4-bit) to shrink memory usage and speed up inference.

Reasoning: The model's ability to work through multi-step logical problems and provide justified answers rather than just pattern-matching.
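To make the Quantization entry concrete, the sketch below estimates how much storage the weights alone would need at 16-bit versus 4-bit precision for a 397-billion-parameter model. It is a back-of-the-envelope calculation, not a description of how NVFP4 is actually laid out: it assumes every parameter is stored at the stated bit width and ignores the scale factors that 4-bit formats keep alongside the weights, any layers left at higher precision, and runtime memory such as activations and the KV cache. The helper name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    if num_params < 0 or bits_per_param <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_param > 0")
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 397e9  # from the model card

# Simplifying assumption: uniform precision across all parameters,
# with no per-block scale factors or other metadata.
mem_16bit = weight_memory_gb(TOTAL_PARAMS, 16)
mem_4bit = weight_memory_gb(TOTAL_PARAMS, 4)

print(f"16-bit weights: ~{mem_16bit:.1f} GB")
print(f" 4-bit weights: ~{mem_4bit:.1f} GB")
print(f"Reduction: {mem_16bit / mem_4bit:.0f}x")
```

Real deployments will need more memory than the 4-bit figure suggests, so treat the result as a lower bound on weight storage rather than a hardware requirement.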
