A compact multimodal model that accepts text and image inputs and produces text outputs. AxionML publishes it as an open-weight release under the Apache 2.0 license, so it can be freely self-hosted and modified. The NVFP4 designation suggests the weights are quantized to NVFP4, NVIDIA's 4-bit floating-point format. This trades some precision for a smaller memory footprint and faster inference on NVIDIA hardware that supports the format natively.
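The precision-for-memory trade-off can be made concrete with a simplified sketch. It assumes the publicly documented NVFP4 layout: FP4 (E2M1) elements grouped into 16-element blocks, with each block sharing one FP8 (E4M3) scale. That works out to roughly 4.5 bits per weight, ignoring a small per-tensor FP32 scale. This is not AxionML's or NVIDIA's quantization code. The scale is kept as a Python float instead of being rounded to E4M3, and ties round toward the smaller magnitude. The 4-billion parameter count in the usage section is hypothetical, because the model's size is not stated here.

```python
"""Illustrative NVFP4-style block quantization (simplified sketch)."""

# Non-negative magnitudes representable by the FP4 E2M1 element format.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0
BLOCK_SIZE = 16   # elements sharing one scale factor
SCALE_BITS = 8    # per-block scale is stored as FP8 (E4M3) in NVFP4


def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Quantize one block to signed E2M1 values plus a shared scale.

    Simplifications (assumptions, not the real kernel): the scale stays a
    Python float instead of being rounded to FP8 E4M3, ties round toward
    the smaller magnitude rather than to nearest-even, and the per-tensor
    FP32 scale is omitted.
    """
    amax = max((abs(x) for x in block), default=0.0)
    # Map the block's largest magnitude onto the largest E2M1 value.
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Round to the nearest representable E2M1 magnitude.
        nearest = min(E2M1_MAGNITUDES, key=lambda v: abs(v - mag))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def quantize(values: list[float]) -> list[tuple[float, list[float]]]:
    """Split a flat weight list into 16-element blocks and quantize each."""
    return [quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]


def dequantize(blocks: list[tuple[float, list[float]]]) -> list[float]:
    """Reconstruct approximate weights: each code times its block scale."""
    return [code * scale for scale, codes in blocks for code in codes]


def bits_per_weight() -> float:
    """4-bit element plus the block's 8-bit scale amortized over 16 elements."""
    return 4 + SCALE_BITS / BLOCK_SIZE


def footprint_gb(num_params: float, bits: float) -> float:
    """Weight storage in gigabytes (1e9 bytes) at a given bits-per-weight."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    weights = [0.9, -0.31, 0.05, 0.7, -1.2, 0.0, 0.44, 0.18,
               -0.6, 0.25, 1.05, -0.02, 0.33, -0.8, 0.12, 0.5]
    restored = dequantize(quantize(weights))
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"max abs error in block: {worst:.4f}")

    n = 4e9  # hypothetical parameter count, not the model's actual size
    print(f"BF16:  {footprint_gb(n, 16):.2f} GB")
    print(f"NVFP4: {footprint_gb(n, bits_per_weight()):.2f} GB")
```

At roughly 4.5 bits per weight, NVFP4 weights take a little over a quarter of the space of 16-bit weights. The reconstruction error printed for the sample block is the precision given up in exchange.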