A large mixture-of-experts model that activates only 17 billion of its 397 billion total parameters per forward pass, making inference far cheaper than the total size suggests. It handles both text and images, switching between extended reasoning chains and direct responses depending on the task. The large total parameter count gives it broad knowledge coverage, though deploying it still demands serious hardware, since all weights must be held in memory even when only a fraction is active per token.
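The sparse-activation idea behind this efficiency can be sketched as top-k expert routing: a router scores all experts per token, but only the k highest-scoring experts actually run. The sketch below is illustrative only; the expert count, hidden size, and k are hypothetical toy values, not the model's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # hypothetical toy value, not the real expert count
TOP_K = 2         # experts activated per token
D_MODEL = 16      # hypothetical hidden size

# Each "expert" here is just a feed-forward weight matrix.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def moe_forward(x):
    """Route token x through only TOP_K of NUM_EXPERTS experts."""
    logits = x @ router_w
    chosen = np.argsort(logits)[-TOP_K:]          # indices of the top-k experts
    # Softmax over the selected logits only, for stable mixing weights.
    w = np.exp(logits[chosen] - logits[chosen].max())
    w /= w.sum()
    # Weighted sum of the chosen experts' outputs; the other experts never run.
    out = sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))
    return out, chosen

x = rng.standard_normal(D_MODEL)
y, chosen = moe_forward(x)

# Fraction of expert parameters touched for this token:
active_frac = TOP_K / NUM_EXPERTS   # 0.25 in this toy setup
```

This is why the active-parameter count (17B) rather than the total (397B) governs per-token compute, while the total still determines memory footprint.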