A mid-sized mixture-of-experts model: 35B parameters in total, of which only about 3B are active per token, making inference markedly cheaper than the total size suggests. It handles both text and images, reasoning over visual content with reasonable competence. Sparse activation lets it punch above its active-parameter count on quality per unit of compute, though it can occasionally lag dense models of comparable total size.
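The efficiency claim comes from top-k routing: a router scores all experts for each token, but only the few highest-scoring experts actually run, so the active compute tracks the small per-token parameter count rather than the full model. A minimal sketch of that mechanism (the shapes, expert count, and NumPy-based experts here are illustrative assumptions, not this model's actual configuration):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through the top-k experts of a sparse MoE layer.

    x: (d,) token activation; gate_w: (d, n_experts) router weights;
    experts: list of callables mapping (d,) -> (d,).
    Only k experts execute per token, so active compute stays small
    even when the total expert (parameter) count is large.
    """
    logits = x @ gate_w                       # one router score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over the selected k only
    # Weighted sum of the k expert outputs; the other experts never run.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 8 experts available, only 2 execute per token.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
gate_w = rng.normal(size=(d, n_experts))
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d)) * 0.1)
           for _ in range(n_experts)]
y = moe_forward(rng.normal(size=d), gate_w, experts, k=2)
print(y.shape)  # (16,)
```

Real implementations batch this routing, add load-balancing losses, and normalize gates differently, but the core trade-off is the same: total parameters scale with the expert count while per-token FLOPs scale only with k.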