Power-of-two quantization with orthogonal residual projection lets you run large models on edge hardware with minimal accuracy loss and no multiplier circuits; calibration takes about 15 minutes instead of hours.
OrpQuant enables efficient deployment of large AI models on edge devices. It constrains quantized values to powers of two, so expensive multiply operations are replaced with simple bit-shifts. An orthogonal residual projection maintains accuracy even at ultra-low bit widths (3-4 bits), and calibration runs about 10x faster than existing methods.
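
To make the multiply-to-shift idea concrete, here is a minimal sketch of generic power-of-two quantization in plain Python. It is not OrpQuant's implementation: the function names (`quantize_pow2`, `shift_multiply`, `dot_pow2`), the exponent range, and the nearest-exponent rounding rule are all assumptions chosen for illustration, and the residual projection step is not shown.

```python
import math

# Assumed exponent range: 7 exponents x 2 signs + zero = 15 codes, which fits in 4 bits.
MIN_EXP, MAX_EXP = -6, 0

def quantize_pow2(w, min_exp=MIN_EXP, max_exp=MAX_EXP):
    """Round a real weight to sign * 2**exp, with exp clamped to [min_exp, max_exp]."""
    if w == 0.0:
        return 0, min_exp  # sign 0 encodes an exact zero
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))         # nearest exponent in the log domain
    exp = max(min_exp, min(max_exp, exp))  # clamp to the representable range
    return sign, exp

def shift_multiply(x, sign, exp):
    """Compute x * (sign * 2**exp) for an integer activation x using only shifts."""
    if sign == 0:
        return 0
    # Python's >> on ints is an arithmetic shift, so it floors toward -infinity.
    shifted = x << exp if exp >= 0 else x >> -exp
    return -shifted if sign < 0 else shifted

def dot_pow2(xs, weights):
    """Dot product of integer activations with power-of-two-quantized weights."""
    total = 0
    for x, w in zip(xs, weights):
        sign, exp = quantize_pow2(w)
        total += shift_multiply(x, sign, exp)
    return total

if __name__ == "__main__":
    # 0.3 -> +2**-2, -0.5 -> -2**-1, 1.0 -> +2**0
    print(dot_pow2([64, 64, 64], [0.3, -0.5, 1.0]))  # 16 - 32 + 64 = 48
```

In this sketch each weight is stored as a sign and a small integer exponent, so the inner loop needs only shifts, negations, and additions. The rounding error is visible in the example: 0.3 becomes 0.25, which is the kind of residual a scheme like the one described above would need to compensate for.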