Qwopus3.6 35B A3B v1 PrismaSCOUT Blackwell NVFP4 BF16 vllm 4.75bits

Qwopus3.6

Open Weight: model weights are publicly available and can be downloaded and self-hosted.

Released May 2026 | Context: N/A | 35B params

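A quantized multimodal model that accepts both text and image inputs. Its weights use 4.75-bit average mixed precision (NVFP4 combined with BF16) and are packaged for vLLM inference on NVIDIA Blackwell GPUs. It has 35B total parameters; in Qwen-style naming, the "A3B" suffix denotes a mixture-of-experts design with about 3B parameters active per token, which keeps per-token compute low while quantization shrinks the memory footprint. Beyond this technical configuration, little is documented about its reasoning or task-specific strengths.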
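The memory trade-off above can be checked with back-of-the-envelope arithmetic. The sketch below is an estimate under stated assumptions, not a measurement: it takes the 35B parameter count and the 4.75-bit average from this card and counts weights only (no KV cache, activations, or vLLM runtime overhead). To show how a 4.75-bit average could arise, it also assumes NVFP4 costs about 4.5 bits per weight (4-bit values plus one 8-bit scale per 16 values). The card does not say which tensors stay in BF16, so the resulting BF16 share is illustrative only. The function names are hypothetical.

```python
def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def high_precision_fraction(avg_bits: float, low_bits: float, high_bits: float = 16.0) -> float:
    """Fraction of weights kept at `high_bits` (BF16 by default) so that a
    mix with `low_bits` weights averages `avg_bits` per weight."""
    return (avg_bits - low_bits) / (high_bits - low_bits)


PARAMS = 35e9  # total parameter count stated on this card

bf16_gb = weight_gb(PARAMS, 16.0)
mixed_gb = weight_gb(PARAMS, 4.75)
print(f"BF16 weights:      {bf16_gb:.1f} GB")
print(f"4.75-bit weights:  {mixed_gb:.1f} GB")

# Assumption: NVFP4 stores 4-bit values plus one 8-bit scale per 16 values,
# i.e. about 4.5 bits per weight before any BF16 tensors are counted.
nvfp4_bits = 4 + 8 / 16
print(f"BF16 share needed: {high_precision_fraction(4.75, nvfp4_bits):.1%}")
```

Under these assumptions the weights shrink from about 70 GB in BF16 to about 21 GB. If "A3B" does denote roughly 3B active parameters per token, all 35B weights must still be resident in GPU memory; a mixture-of-experts design reduces per-token compute, not stored weight size.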

Capabilities

Capability scores are AI-generated based on model documentation, benchmarks, and technical specifications.

Long Context: Moderate

Reasoning & Logic

Use Case Fit

Fit scores are AI-generated based on model capabilities, intended use, and technical specifications.

Glossary

Architecture: The underlying structural design of a neural network that defines how data flows through layers and components.
BF16: A 16-bit floating-point format that balances precision and memory efficiency, commonly used for training and deploying large language models.
Inference: The process of running a trained model to generate predictions or outputs from new inputs.
Memory Efficiency: How well a model uses available RAM or GPU memory, allowing it to run on smaller or less expensive hardware.
Mixed Precision: Using different numerical precisions for different parts of a computation or different parts of a model.
Multimodal: A model that can process and understand multiple types of input, such as both text and images.
Multimodal Model: An AI model that can process and understand multiple types of input data, such as video, images, and text together.
Parameters: The learned numerical values in a model; more parameters generally mean more capacity but higher compute cost.
Precision: The level of numerical detail a model uses to represent its internal values; higher precision means more accurate calculations but requires more memory.
Quantized: Stored with lower-precision (fewer-bit) weights, a technique that reduces a model's size and memory usage while trading some accuracy for efficiency.
Reasoning: The model's ability to work through multi-step logical problems and provide justified answers rather than just pattern-matching.
vLLM: An inference engine that runs large language models efficiently by continuously batching requests and managing key-value cache memory in fixed-size pages (PagedAttention).
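The Quantized, Precision, and Mixed Precision entries can be made concrete with a toy example. The sketch below is a minimal, pure-Python illustration of block-scaled 4-bit floating-point quantization in the spirit of NVFP4; it is not the code used to produce this checkpoint. Assumptions: 16-value blocks, the FP4 E2M1 value grid, and a per-block scale kept at full precision (NVFP4 as documented by NVIDIA stores the block scale in FP8 and adds a per-tensor scale). All function names are hypothetical.

```python
# Magnitudes representable by a 4-bit E2M1 float (a separate bit holds the sign).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumption: 16-element blocks, as in NVFP4


def quantize_block(block):
    """Return (scale, codes) with block[i] ~= scale * codes[i].

    Simplification: the scale stays a Python float; real NVFP4 rounds it
    to FP8 (E4M3) and adds a second, per-tensor FP32 scale.
    """
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the largest |x| onto 6.0
    codes = []
    for x in block:
        # Round |x| / scale to the nearest representable E2M1 magnitude.
        mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def quantize(values):
    """Quantize each BLOCK_SIZE chunk of `values` with its own scale."""
    return [quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]


def dequantize(blocks):
    """Rebuild approximate values from (scale, codes) pairs."""
    return [scale * c for scale, codes in blocks for c in codes]


# Usage: a toy "weight" vector spanning two blocks.
weights = [0.02 * ((i * 7) % 13 - 6) for i in range(32)]
restored = dequantize(quantize(weights))
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"max abs error: {max_err:.4f}")
```

Giving each small block its own scale is what lets 4-bit codes track weights of very different magnitudes: within a block, the rounding error is at most one scale unit (half of the widest E2M1 gap, between 4 and 6), so blocks of small values keep proportionally small errors.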
