If you're training neural networks on tabular data, consider the Muon optimizer instead of AdamW: it achieves better results, though at a higher computational cost.
This paper compares optimizers (such as AdamW and Muon) for training neural networks on tabular data. The researchers tested 13 optimizers across 40 datasets and found that Muon consistently outperforms the standard AdamW optimizer, though it requires more compute. They also show that averaging model weights over the course of training improves AdamW's performance.
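To illustrate the weight-averaging idea, here is a minimal sketch using an exponential moving average (EMA) of the weights, one common form of averaging over training. The `train_step` function and its learning rate are placeholders standing in for a real optimizer update such as AdamW; the paper's actual averaging scheme may differ.

```python
import numpy as np

def train_step(w, grad, lr=0.1):
    # Placeholder update; a real run would use AdamW (or Muon).
    return w - lr * grad

def train_with_ema(w0, grads, decay=0.9):
    """Run training steps while keeping a running average of the weights."""
    w = w0.copy()
    ema = w0.copy()
    for g in grads:
        w = train_step(w, g)
        # Exponential moving average of the weight trajectory.
        ema = decay * ema + (1 - decay) * w
    return w, ema

rng = np.random.default_rng(0)
w0 = np.zeros(3)
grads = [rng.normal(size=3) for _ in range(100)]
final_w, averaged_w = train_with_ema(w0, grads)
```

At evaluation time one would use `averaged_w` rather than `final_w`: the average smooths out step-to-step noise in the weight trajectory, which is the mechanism behind the reported improvement for AdamW.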