A large mixture-of-experts model that activates only 17 billion of its 397 billion parameters per forward pass, keeping inference costs manageable while drawing on a vast pool of specialized knowledge. It handles both text and images, reasoning across visual and written content with reasonable coherence. Because only a small subset of experts runs per token, it can deliver quality closer to that of a much larger dense model at a fraction of the compute cost, though behavior can feel uneven across domains depending on which experts engage.
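The sparse activation pattern described above can be sketched with a toy top-k router: a gating network scores all experts, but only the top-scoring few actually run for each token, so the compute per token tracks the *active* parameter count rather than the total. This is a minimal illustration, not the model's actual architecture; the dimensions, expert count, and top-k value here are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16         # toy hidden size (assumption, for illustration only)
N_EXPERTS = 8  # total experts in the layer
TOP_K = 2      # experts actually activated per token

# Each expert is a small feed-forward block; only the routed ones run.
expert_weights = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router_weights = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_forward(x):
    """Route token vector x to its top-k experts and mix their outputs."""
    logits = x @ router_weights
    top = np.argsort(logits)[-TOP_K:]       # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                     # softmax over the selected experts only
    # Only TOP_K of N_EXPERTS expert matmuls execute: compute scales with
    # active parameters, while total capacity scales with all experts.
    return sum(g * (x @ expert_weights[i]) for g, i in zip(gates, top))

token = rng.standard_normal(D)
out = moe_forward(token)
print(out.shape)
```

With these toy numbers, only 2 of 8 expert blocks run per token (25% of expert parameters), analogous to how activating 17B of 397B parameters keeps per-token cost far below what the total parameter count would suggest.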