PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training

Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang, Kunxiang Zhao et al. | June 4, 2026 | arXiv

Key Takeaway

You can improve LLM training stability and speed by controlling the conditioning of weight matrices during training. The mechanism is then discarded at inference, with no performance trade-off.
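Why can the mechanism be dropped at inference? During training the layer computes its effective weight as a fixed function of a trainable matrix. Once training ends, that function can be evaluated a single time and only the result stored. The sketch below shows this "fold" step under stated assumptions. The cubic map `p(W) = 1.5*W - 0.5*W W^T W` and the names `precondition`, `PCLinear`, and `fold` are hypothetical illustrations, not the paper's actual polynomial or code.

```python
import numpy as np

# Hypothetical cubic preconditioner p(W) = A*W + B*W W^T W.
# The coefficients are illustrative assumptions, not the paper's values.
A, B = 1.5, -0.5

def precondition(w_raw: np.ndarray) -> np.ndarray:
    """Map the trainable matrix to the effective weight used in the forward pass."""
    return A * w_raw + B * (w_raw @ w_raw.T @ w_raw)

class PCLinear:
    """Toy linear layer whose effective weight is p(w_raw) during training."""

    def __init__(self, w_raw: np.ndarray):
        self.w_raw = w_raw  # shape (out_features, in_features)

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Training-time path: recompute the preconditioned weight on every call.
        return x @ precondition(self.w_raw).T

    def fold(self) -> np.ndarray:
        # After training: evaluate p once and keep only the resulting plain matrix.
        return precondition(self.w_raw)

rng = np.random.default_rng(0)
layer = PCLinear(rng.standard_normal((4, 3)) / 3.0)
x = rng.standard_normal((2, 3))
w_inference = layer.fold()
print(np.allclose(layer.forward(x), x @ w_inference.T))  # True
```

After folding, inference uses an ordinary dense matrix of the original shape. That is why a training-only reparameterization of this kind adds no inference cost.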

Summary

This paper introduces a PC layer that reshapes weight matrices during training using polynomial preconditioning, which keeps them well-conditioned. The layer is removed after training, so it adds no inference cost. Experiments on Llama-1B show faster convergence with both the AdamW and Muon optimizers. A theoretical analysis shows that the approach ensures stable gradient descent in deep networks.
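How can a polynomial make a weight matrix better conditioned? For an odd polynomial such as `p(X) = a*X + b*X X^T X`, writing `X = U diag(s) V^T` gives `p(X) = U diag(a*s + b*s**3) V^T`. The polynomial therefore acts on each singular value independently and leaves the singular vectors unchanged. The sketch below checks this effect numerically. It assumes a normalization by the spectral norm and the Newton-Schulz-style coefficients `a = 1.5`, `b = -0.5`. The paper's actual polynomial degree, coefficients, and normalization are not given in this summary, and `polynomial_precondition` is a hypothetical name.

```python
import numpy as np

# Assumed preconditioner: normalize W by its spectral norm, then apply the
# odd cubic p(X) = a*X + b*X X^T X. For X = U diag(s) V^T this equals
# U diag(a*s + b*s**3) V^T, so p acts on each singular value independently.
def polynomial_precondition(w: np.ndarray, a: float = 1.5, b: float = -0.5) -> np.ndarray:
    x = w / np.linalg.norm(w, 2)  # largest singular value becomes 1
    return a * x + b * (x @ x.T @ x)

def condition_number(w: np.ndarray) -> float:
    s = np.linalg.svd(w, compute_uv=False)
    return float(s.max() / s.min())

# Build a 6x4 matrix with known singular values 1.0, 0.6, 0.3, 0.1.
rng = np.random.default_rng(0)
u, _ = np.linalg.qr(rng.standard_normal((6, 4)))  # 6x4, orthonormal columns
v, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # 4x4 orthogonal
sigma = np.array([1.0, 0.6, 0.3, 0.1])
w = u @ np.diag(sigma) @ v.T

w_pc = polynomial_precondition(w)
print(round(condition_number(w), 2))     # 10.0
print(round(condition_number(w_pc), 2))  # 6.69
```

On `[0, 1]`, `p(s) = 1.5*s - 0.5*s**3` is increasing and boosts small singular values more than large ones in relative terms. The ratio between the largest and smallest singular values shrinks as a result (here from 10 to about 6.69), and applying the map repeatedly drives every singular value toward 1. A training-time layer built on this idea would learn the raw matrix while the forward pass uses the better-conditioned one.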

training efficiency

Key Terms

weight-conditioning singular-value-spectrum polynomial-preconditioner weight-parameterization