A multimodal open-weight model from InternLM that handles both text and image inputs. It runs in FP8 precision (8 bits per value), which needs about half the weight memory of a 16-bit (FP16/BF16) variant and about a quarter of a 32-bit one, at the cost of some numerical precision. That trade-off makes it more practical to run on consumer or mid-range hardware. As a preview release, it may have rough edges, and its behavior may change before a finalized version.
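
To put the memory claim in rough numbers, here is a back-of-the-envelope sketch in Python. It assumes a hypothetical parameter count (the model's actual size is not stated here) and counts weight storage only, ignoring activations, the KV cache, and any framework overhead, so treat the result as a lower bound rather than a hardware requirement.

```python
# Bytes needed to store a single parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    # Hypothetical size chosen for illustration; not this model's real count.
    num_params = 8_000_000_000
    for dtype in ("fp32", "bf16", "fp8"):
        print(f"{dtype}: {weight_memory_gib(num_params, dtype):.1f} GiB")
```

Whatever the real parameter count is, the ratio between formats stays the same: FP8 weights take half the space of BF16/FP16 weights and a quarter of FP32 weights.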