If you're training neural networks on tabular data, consider the Muon optimizer instead of AdamW: it achieves better results, though at a higher computational cost.
This paper compares optimizers (such as AdamW and Muon) for training neural networks on tabular data. The researchers tested 13 optimizers across 40 datasets and found that Muon consistently outperforms the standard AdamW optimizer, though it requires more compute. They also show that averaging model weights over the course of training improves AdamW's performance.
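To illustrate the weight-averaging idea, here is a minimal sketch using an exponential moving average (EMA) of the weights, one common form of averaging over training. The `train_step` function and its learning rate are placeholders standing in for a real optimizer update such as AdamW; the paper's actual averaging scheme may differ.

```python
import numpy as np

def train_step(w, grad, lr=0.1):
    # Placeholder update; a real run would use AdamW (or Muon).
    return w - lr * grad

def train_with_ema(w0, grads, decay=0.9):
    """Run training steps while keeping a running average of the weights."""
    w = w0.copy()
    ema = w0.copy()
    for g in grads:
        w = train_step(w, g)
        # Exponential moving average of the weight trajectory.
        ema = decay * ema + (1 - decay) * w
    return w, ema

rng = np.random.default_rng(0)
w0 = np.zeros(3)
grads = [rng.normal(size=3) for _ in range(100)]
final_w, averaged_w = train_with_ema(w0, grads)
```

At evaluation time one would use `averaged_w` rather than `final_w`: the average smooths out step-to-step noise in the weight trajectory, which is the mechanism behind the reported improvement for AdamW.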