A large mixture-of-experts model that activates only 10B of its 122B total parameters per forward pass, making it far cheaper to run than its headline size suggests. It handles both text and images, and switches between a deliberate thinking mode and a faster direct-response mode depending on the task. FP8 quantization keeps the memory footprint manageable without a dramatic loss of capability.
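The sparsity and quantization claims above can be made concrete with some back-of-the-envelope arithmetic. The sketch below (illustrative estimates only, not measured figures for any specific deployment) computes the active-parameter fraction and compares approximate weight storage in FP8 (one byte per parameter) versus BF16 (two bytes per parameter); real-world usage adds KV cache, activations, and runtime overhead on top of the weights.

```python
# Back-of-the-envelope memory math for a 122B-total / 10B-active MoE model.
# Illustrative estimates only: weights-only storage, no KV cache or overhead.

def weight_memory_gib(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB for a given count and precision."""
    return params_billions * 1e9 * bytes_per_param / 2**30

TOTAL_B, ACTIVE_B = 122, 10

# FP8 stores one byte per weight; BF16 stores two.
fp8_gib = weight_memory_gib(TOTAL_B, 1.0)
bf16_gib = weight_memory_gib(TOTAL_B, 2.0)

print(f"Active fraction per token: {ACTIVE_B / TOTAL_B:.1%}")   # ~8.2%
print(f"Weights in BF16: ~{bf16_gib:.0f} GiB")                  # ~227 GiB
print(f"Weights in FP8:  ~{fp8_gib:.0f} GiB")                   # ~114 GiB
```

The point of the sparse design is visible in the first line: each token touches under a tenth of the weights, so per-token compute tracks the 10B active figure, while FP8 roughly halves the storage needed to hold the full 122B on device.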