An optimization technique that uses gradient information and randomness to explore a reward landscape.
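
As a rough illustration, one possible instance of such a technique is gradient ascent with Gaussian noise added to each step. The sketch below is an assumption made for illustration, not the specific method the definition refers to: the toy reward function, its peak at x = 3, and the names `reward`, `reward_gradient`, and `noisy_gradient_ascent` are all hypothetical.

```python
import random


def reward(x):
    # Hypothetical toy reward landscape with a single peak at x = 3.
    return -(x - 3.0) ** 2


def reward_gradient(x):
    # Analytic gradient of the toy reward above.
    return -2.0 * (x - 3.0)


def noisy_gradient_ascent(x0, steps=200, learning_rate=0.05,
                          noise_scale=0.1, seed=0):
    """Follow the gradient uphill while adding Gaussian noise to each step.

    All parameter values are illustrative assumptions, not tuned settings.
    """
    rng = random.Random(seed)  # Seeded so that runs are reproducible.
    x = x0
    best_x, best_reward = x, reward(x)
    for _ in range(steps):
        # Gradient information: move in the direction of increasing reward.
        step = learning_rate * reward_gradient(x)
        # Randomness: perturb the step to explore nearby points.
        x = x + step + noise_scale * rng.gauss(0.0, 1.0)
        current = reward(x)
        # Remember the best point visited so far.
        if current > best_reward:
            best_x, best_reward = x, current
    return best_x, best_reward


if __name__ == "__main__":
    best_x, best_reward = noisy_gradient_ascent(x0=-5.0)
    print(f"best x: {best_x:.3f}, best reward: {best_reward:.5f}")
```

On this single-peak landscape the noise is not strictly needed. It is included only to show where the random component enters the update; on a landscape with several peaks, the same perturbation is what gives the search a chance to leave a poor local optimum.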