A mid-sized mixture-of-experts model, it activates only 3 billion of its 35 billion total parameters on each forward pass, keeping per-token inference costs low while retaining a broad parameter pool for diverse tasks. It handles both text and images, and switches between quick responses and extended reasoning chains depending on the task. Sparse activation lets it punch above its compute weight, though it may not match dense models of comparable total parameter count on the most demanding tasks.
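The mechanics behind that sparse activation can be sketched with a toy top-k routed layer: a router scores every expert, but only the top-k experts actually run for each token, so only their parameters contribute to that token's compute. The sizes here (8 experts, top-2, 16-dim) are illustrative placeholders, not the real model's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2

# Each expert is a single dense matrix for simplicity; a real MoE
# layer would use full feed-forward blocks here.
experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_forward(x):
    """x: (d_model,) one token. Returns the output and chosen expert ids."""
    logits = x @ router                    # score every expert
    chosen = np.argsort(logits)[-top_k:]   # keep only the top-k experts
    # Softmax over the selected logits gives the mixing weights.
    w = np.exp(logits[chosen] - logits[chosen].max())
    w /= w.sum()
    # Only the chosen experts' parameters touch this token.
    out = sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))
    return out, chosen

x = rng.standard_normal(d_model)
y, chosen = moe_forward(x)

total_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model
print(f"active fraction per token: {active_params / total_params:.2f}")
```

With 2 of 8 experts selected, only a quarter of the expert parameters are exercised per token; the 3B-active-of-35B-total figure above reflects the same idea at scale (roughly a 1:12 ratio, with shared non-expert weights always active).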