Gradient noise whose extreme values occur more often than a normal (Gaussian) distribution would predict; commonly observed in real LLM training.
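As a rough illustration of the definition, the sketch below compares how often extreme values occur in Gaussian noise and in a heavy-tailed sample. The Student's t distribution with 3 degrees of freedom is only an assumed stand-in for heavy-tailed gradient noise, and `tail_fraction` is a hypothetical helper, not part of any library.

```python
import numpy as np

def tail_fraction(samples, k=4.0):
    """Fraction of samples more than k robust scale units from the median."""
    samples = np.asarray(samples, dtype=float)
    median = np.median(samples)
    # Median absolute deviation, rescaled so it matches the standard
    # deviation when the data are Gaussian. Unlike the sample std, it is
    # not inflated by the outliers we are trying to count.
    scale = 1.4826 * np.median(np.abs(samples - median))
    return float(np.mean(np.abs(samples - median) > k * scale))

rng = np.random.default_rng(0)
n = 200_000

# Baseline: noise drawn from a normal distribution.
gaussian_noise = rng.normal(size=n)
# Assumption: Student's t with df=3 stands in for heavy-tailed gradient noise.
heavy_noise = rng.standard_t(df=3, size=n)

gaussian_tail = tail_fraction(gaussian_noise)
heavy_tail = tail_fraction(heavy_noise)

print(f"Gaussian tail fraction:     {gaussian_tail:.6f}")
print(f"Heavy-tailed tail fraction: {heavy_tail:.6f}")
```

Under these assumptions the heavy-tailed sample produces values beyond four scale units far more often than the Gaussian one. The same check could be applied to per-step gradient noise estimates, if they are available, to judge whether they are heavy-tailed.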