OrpQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization

Maoyang Xiang, Bo Wang, Tao Luo | May 25, 2026 | arXiv

Key Takeaway

Power-of-two quantization with orthogonal residual projection lets large models run on edge hardware with minimal accuracy loss and no multiplier circuits. Calibration takes about 15 minutes instead of hours.

Summary

OrpQuant enables efficient deployment of large Transformer models on edge devices through a power-of-two quantization method that replaces expensive multiply operations with simple bit-shifts. It uses geometric orthogonal residual projection to maintain accuracy even at ultra-low bit widths (3 to 4 bits), and it calibrates models 10x faster than existing methods.
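To make the "multiplies become bit-shifts" idea concrete, here is a minimal sketch of generic power-of-two quantization in plain Python. It is not OrpQuant's algorithm: the orthogonal residual projection step is not described in this summary and is not modeled here. The function names `pot_quantize` and `shift_dot`, the exponent range, and the fixed-point width `frac_bits` are all assumptions chosen for illustration.

```python
import math

def pot_quantize(w, min_exp=-8, max_exp=0):
    """Round a real weight to sign * 2**exp and return (sign, exp).

    Rounding is done in the log2 domain and the exponent is clamped
    to [min_exp, max_exp]. A zero weight is returned with sign 0.
    (Hypothetical helper; the exponent range is an assumption.)
    """
    if w == 0:
        return 0, min_exp
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))
    return sign, exp

def shift_dot(x_int, q_weights, frac_bits=8):
    """Dot product of integer activations with power-of-two weights.

    Uses only shifts, adds, and subtracts. The result is a fixed-point
    integer with frac_bits fractional bits, so divide by 2**frac_bits
    to read it as a real number. Requires exp >= -frac_bits for every
    weight so that each shift amount is non-negative.
    """
    acc = 0
    for x, (sign, exp) in zip(x_int, q_weights):
        term = x << (frac_bits + exp)  # x * 2**exp, scaled by 2**frac_bits
        if sign > 0:
            acc += term
        elif sign < 0:
            acc -= term
    return acc

weights = [0.5, -0.25, 0.125]
activations = [4, 8, 16]
q = [pot_quantize(w) for w in weights]
result = shift_dot(activations, q)
print(q, result / 2**8)
```

In this toy case the weights are already exact powers of two, so the shift-based result matches the ordinary dot product. For a weight such as 0.3 the quantizer returns 2**-2, and the difference between 0.3 and 0.25 is the kind of residual error that an accuracy-preserving scheme has to manage.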

efficiency architecture

Key Terms

power-of-two quantization, orthogonal residual projection, multiply-accumulate, angular resolution, calibration time