A compact vision-language model fine-tuned by ADSKAILab on Qwen3's 2B architecture. Its input modalities are listed as text-only despite the VL (vision-language) designation, a potential discrepancy worth noting. At 2B parameters it sits on the smaller end of the scale, trading raw capacity for efficiency and deployability.