Amortizing gradient computation across multiple training steps by reusing cached gradients for repeated examples.