A multimodal open-weight model that accepts both text and image inputs, with weights quantized to 4 bits for efficient local deployment. The W4A16 quantization scheme stores weights at 4-bit precision while keeping activations at 16-bit precision, so quantized weights take roughly a quarter of their 16-bit size (plus a small overhead for scaling metadata) while the arithmetic on activations stays at 16 bits. This lets the model run on consumer hardware that would struggle with the full-precision version, though rounding weights to 4 bits means outputs can deviate somewhat from the original model's.
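To make the W4A16 idea concrete, here is a minimal sketch of weight-only 4-bit quantization in plain Python. It is illustrative only: the symmetric rounding, the per-group scales, the group size, and every function name (`quantize_group`, `dequantize_group`, `weight_bytes`) are assumptions made for this example, not the scheme or tooling actually used to produce this model.

```python
import math

# Illustrative sketch of weight-only quantization (the "W4" in W4A16).
# Symmetric rounding, per-group scales, and all names are assumptions
# for this example, not the method used for any particular checkpoint.

def quantize_group(weights):
    """Map a group of floats to signed 4-bit integers in [-8, 7] plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights; activations are never quantized."""
    return [v * scale for v in q]

def weight_bytes(num_params, bits_per_weight, group_size=None, scale_bits=16):
    """Estimate weight storage, optionally counting one scale per group."""
    total_bits = num_params * bits_per_weight
    if group_size:
        total_bits += math.ceil(num_params / group_size) * scale_bits
    return total_bits / 8

if __name__ == "__main__":
    group = [0.70, -0.31, 0.12, 0.0]
    q, scale = quantize_group(group)
    restored = dequantize_group(q, scale)
    worst = max(abs(a - b) for a, b in zip(group, restored))
    print("4-bit codes:", q)
    print("worst-case rounding error:", worst)
    print("16-bit bytes for 1024 weights:", weight_bytes(1024, 16))
    print("4-bit bytes (group size 128):", weight_bytes(1024, 4, group_size=128))
```

With symmetric scaling, each weight's rounding error is bounded by half the group's scale, which is the source of the fidelity trade-off. Under these assumptions, 1024 weights take 2048 bytes at 16 bits and 528 bytes at 4 bits with one 16-bit scale per 128 weights, or slightly more than a quarter of the original size.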