A compact, efficient model from NVIDIA's Nemotron family, quantized to FP8 to keep its memory footprint small. Its 262K-token context window is unusually large for a 4B-parameter model, which lets it process long documents despite its size. The FP8 quantization trades some numerical precision for speed and resource efficiency, at a modest cost in capability.
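
To give a rough sense of the memory saving, the back-of-the-envelope sketch below compares weight storage at one byte per parameter (FP8) with two bytes per parameter (a 16-bit format such as BF16). It is an estimate under stated assumptions, not a measurement: "4B" is taken as exactly 4 × 10^9 parameters, every weight is assumed to be stored in FP8, and the KV cache, activations, and runtime overhead are ignored. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: "4B" is treated as exactly 4e9 parameters.
NUM_PARAMS = 4_000_000_000

fp8_gib = weight_memory_gib(NUM_PARAMS, 1)   # FP8: 1 byte per parameter
bf16_gib = weight_memory_gib(NUM_PARAMS, 2)  # 16-bit baseline: 2 bytes per parameter

print(f"FP8 weights:  ~{fp8_gib:.2f} GiB")   # ~3.73 GiB
print(f"BF16 weights: ~{bf16_gib:.2f} GiB")  # ~7.45 GiB
```

Under these assumptions the weights take roughly half the space of a 16-bit copy. Actual usage will be higher, particularly with long prompts, because the KV cache grows with the number of tokens held in context.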