Distributing training data across multiple GPUs, each holding a full copy of the model; replicas compute gradients independently on their own shard and then synchronize (typically by averaging) so every copy applies the same update.
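A minimal sketch of the idea, simulated on CPU in plain Python rather than on real GPUs: two hypothetical "replicas" of a one-parameter linear model each compute a gradient on their own shard of the batch, the gradients are averaged (standing in for an all-reduce), and every replica applies the same update.

```python
# Data parallelism simulated on CPU: model y = w * x, mean-squared-error loss.
# Each "replica" holds a copy of w, computes gradients on its own shard of
# the batch, then the gradients are averaged (the all-reduce step) before
# every replica applies the identical update.

def grad(w, xs, ys):
    # d/dw of mean((w*x - y)^2) over this shard
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

# Full batch, split evenly across two replicas
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]            # generated by true w = 2
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]

w = 0.0                               # identical initial weight on each replica
lr = 0.05
for step in range(100):
    local_grads = [grad(w, sx, sy) for sx, sy in shards]  # independent compute
    g = sum(local_grads) / len(local_grads)               # synchronize: average
    w -= lr * g                                           # same update everywhere

print(round(w, 3))  # converges to the true weight, 2.0
```

With equal-sized shards, the averaged gradient is exactly the full-batch gradient, so the replicated model follows the same trajectory as single-device training; real systems (e.g. PyTorch's `DistributedDataParallel`) implement the averaging with an all-reduce collective over the network.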