Gemma 4 31B brings Google's multimodal design to an open-weight package, handling both text and images with the steady competence of a mid-sized model that punches above its weight. It takes visual inputs alongside text in a single model, so mixed-media tasks don't need a separate image-processing pipeline. Because it is released under the permissive Apache 2.0 license, you can run and modify it on your own hardware with few obligations beyond preserving the license and attribution notices, though at 31B parameters it still needs substantial GPU memory.
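To put the GPU requirement in perspective, here is a back-of-the-envelope sketch of how much memory the weights alone occupy at common precisions. It assumes exactly 31 billion parameters and counts only weight storage; the KV cache, activations, and framework overhead all add to this in practice, and the precision labels are generic formats rather than specific checkpoints that Google ships.

```python
# Rough estimate of GPU memory needed just to hold the model weights.
# Assumption: exactly 31e9 parameters. Real checkpoints may differ slightly,
# and runtime overhead (KV cache, activations, framework buffers) is ignored.

PARAMS = 31e9

# Bytes stored per parameter for common weight formats.
BYTES_PER_PARAM = {
    "bf16/fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, size in BYTES_PER_PARAM.items():
        print(f"{fmt:>9}: ~{weight_memory_gb(PARAMS, size):.1f} GB")
```

At 16-bit precision the weights alone come to roughly 62 GB, more than a single 24 GB consumer card holds, while 8-bit and 4-bit quantization bring that to about 31 GB and 15.5 GB respectively, at some possible cost in output quality.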