You can use your existing multi-GPU setup to search for better learning rates during training: each GPU replica trains with a slightly different rate, and the replicas are periodically synchronized, so the search adds no extra compute.
This paper proposes HDET, a method in which multiple GPU replicas, instead of computing identical updates, each train independently with a different learning rate and then synchronize periodically.
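To make the mechanism concrete, here is a minimal single-process sketch in PyTorch that mimics the idea with in-process model copies rather than actual GPU replicas. The learning-rate multipliers, the synchronization interval, and the use of simple parameter averaging as the synchronization step are illustrative assumptions, not details taken from the HDET paper.

```python
import copy
import torch
import torch.nn as nn

# Sketch: several copies of the same model train with slightly different
# learning rates and periodically synchronize by averaging their parameters.
# The averaging rule and LR spread below are assumptions for illustration.

torch.manual_seed(0)

# Toy regression data
X = torch.randn(256, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(256, 1)

base_model = nn.Linear(10, 1)
num_replicas = 4
sync_every = 10                      # steps between synchronizations (assumed)
base_lr = 1e-2
lr_spread = [0.5, 0.8, 1.25, 2.0]    # per-replica LR multipliers (assumed)

# Each replica gets its own copy of the model and its own optimizer/LR.
replicas = [copy.deepcopy(base_model) for _ in range(num_replicas)]
optims = [torch.optim.SGD(m.parameters(), lr=base_lr * s)
          for m, s in zip(replicas, lr_spread)]
loss_fn = nn.MSELoss()

for step in range(100):
    # Independent training steps on each replica
    for model, opt in zip(replicas, optims):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()

    # Periodic synchronization: average parameters across replicas
    if (step + 1) % sync_every == 0:
        with torch.no_grad():
            avg_state = {
                k: torch.stack([m.state_dict()[k] for m in replicas]).mean(0)
                for k in replicas[0].state_dict()
            }
            for m in replicas:
                m.load_state_dict(avg_state)

print("final loss:", loss_fn(replicas[0](X), y).item())
```

In a real data-parallel job the replica loop would be replaced by one process per GPU, with the periodic averaging done via an all-reduce over parameters instead of explicit copies.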