A compact, efficiency-focused model optimized for NVIDIA's NVFP4 numerical format, which reduces memory footprint and speeds up inference on compatible hardware. It trades some precision for significantly lower resource consumption, making it practical for deployment in constrained environments. Its behavior is shaped by quantization trade-offs, so outputs may differ subtly from full-precision counterparts.