A compact, quantized text model that trades some precision for a significantly smaller memory footprint through W4A16 quantization: weights are stored in 4-bit and activations are computed in 16-bit. At 4 bits per weight, the weights take roughly a quarter of the memory they would need at 16-bit, which makes the model practical to run on hardware that could not otherwise host a 12B-parameter model. Channel-wise quantization, with a separate scale per channel of each weight tensor, helps preserve output quality relative to more aggressive compression schemes.
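
To make the trade-off concrete, here is a minimal sketch in plain Python. It is not this model's actual quantization code. The function names (`weight_memory_gb`, `quantize_channelwise`, `dequantize`) are hypothetical, and the scheme shown (symmetric, signed 4-bit, one scale per row treated as an output channel) is an assumption, since the description does not specify those details. The memory estimate counts weights only and ignores scales, activations, the KV cache, and any layers that may be left unquantized.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_channelwise(weights, bits=4):
    """Symmetric per-channel quantization (assumed scheme).

    Each row is treated as one output channel and gets its own scale,
    chosen so the largest magnitude in the row maps to the top integer level.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        # Round to the nearest integer level and clamp to [-qmax - 1, qmax].
        quantized.append([max(-qmax - 1, min(qmax, round(w / scale))) for w in row])
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Recover approximate weights by multiplying each row by its scale."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


# Weight-only footprint of a 12B-parameter model at 16-bit vs 4-bit.
print(weight_memory_gb(12e9, 16))  # 24.0
print(weight_memory_gb(12e9, 4))   # 6.0

# Two channels with very different magnitudes (illustrative values only).
weights = [[0.7, -0.35, 0.1], [0.014, -0.007, 0.002]]
q, scales = quantize_channelwise(weights)
restored = dequantize(q, scales)
print(q, scales, restored)
```

The second row shows why a per-channel scale matters. If both rows shared the first row's scale of about 0.1, every value in the second row would round to zero. With its own scale, that row keeps its structure, and each restored value stays within half a quantization step of the original.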