A compact multimodal model that combines visual understanding with extended reasoning. It accepts both text and images and works through a deliberate, step-by-step thinking phase before answering, trading response speed for more carefully reasoned outputs. As an open-weight model from the MiniCPM family, it prioritizes accessibility and efficiency at small parameter counts.