A mid-sized multimodal reasoning model that balances image and text understanding with efficient sparse activation — only 3B parameters are active per token despite a 35B total footprint. Quantized to NVFP4 format by AxionML, it trades some precision for GPU memory efficiency, making it practical for hardware-constrained deployments. Handles visual and language tasks in a single model without the bulk of dense alternatives.
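
The "3B active of 35B total" pattern is easiest to see in code. The sketch below is a minimal, hypothetical mixture-of-experts layer with top-k routing — the standard mechanism behind sparse activation — and is *not* this model's actual architecture: all class names, dimensions, and expert counts are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy mixture-of-experts layer: each token is routed to its
    top-k experts, so only a small slice of the layer's total
    parameters is used per token. Sizes are hypothetical."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        # Router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (n_tokens, d_model)
        logits = self.router(x)                         # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep k best experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens sent to expert e in this slot
                if mask.any():
                    # Only these experts' weights participate for these tokens.
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```

With 8 experts and top-2 routing, each token touches roughly a quarter of the expert parameters (plus the small shared router) — the same principle, at toy scale, as activating 3B of 35B parameters.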