A mid-sized multimodal reasoning model that accepts both text and images as input, with weights quantized to 8-bit to reduce its memory footprint. The A3B designation suggests a mixture-of-experts architecture in which only a subset of parameters (likely around 3B) is active per token, balancing capability against compute cost. As an mlx-community release, it has been converted to the MLX format for running on Apple Silicon.
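
The trade-off described above can be made concrete with some rough arithmetic. The sketch below is illustrative only: the total parameter count is a hypothetical placeholder (the description does not state it), the active count of 3B is an assumption read off the A3B designation, and the estimate ignores quantization scales, activations, and the KV cache.

```python
# Rough weight-memory estimate for a quantized mixture-of-experts model.
# TOTAL_PARAMS is a hypothetical placeholder, not a figure from the model card.
# ACTIVE_PARAMS is an assumption based on the "A3B" designation.

TOTAL_PARAMS = 30_000_000_000   # hypothetical total parameter count
ACTIVE_PARAMS = 3_000_000_000   # assumed parameters active per token


def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store the weights alone at a given precision."""
    return num_params * bits_per_weight / 8


def to_gib(num_bytes: float) -> float:
    """Convert bytes to gibibytes."""
    return num_bytes / 2**30


# All experts must be resident in memory, so memory scales with TOTAL_PARAMS.
for bits in (16, 8):
    gib = to_gib(weight_bytes(TOTAL_PARAMS, bits))
    print(f"{bits}-bit weights: about {gib:.1f} GiB")

# Per-token compute scales with the active parameters instead.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active share of parameters per token: {active_fraction:.0%}")
```

Under these assumptions, quantization is what shrinks the memory requirement, while the mixture-of-experts routing is what lowers per-token compute: every expert still has to be loaded, even though only a few are used for any given token.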