A mid-sized multimodal reasoning model that balances image and text understanding with efficient sparse activation — only 3B parameters are active per token despite a 35B total footprint. Quantized to NVFP4 format by AxionML, it trades some precision for GPU memory efficiency, making it practical for hardware-constrained deployments. Handles visual and language tasks in a single model without the bulk of dense alternatives.
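
The "3B active of 35B total" pattern is easiest to see in code. The sketch below is a minimal, hypothetical mixture-of-experts layer with top-k routing — the standard mechanism behind sparse activation — and is *not* this model's actual architecture: all class names, dimensions, and expert counts are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy mixture-of-experts layer: each token is routed to its
    top-k experts, so only a small slice of the layer's total
    parameters is used per token. Sizes are hypothetical."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        # Router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (n_tokens, d_model)
        logits = self.router(x)                         # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep k best experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens sent to expert e in this slot
                if mask.any():
                    # Only these experts' weights participate for these tokens.
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```

With 8 experts and top-2 routing, each token touches roughly a quarter of the expert parameters (plus the small shared router) — the same principle, at toy scale, as activating 3B of 35B parameters.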