A sequential decision problem in which an agent repeatedly chooses among several options whose payoffs are initially unknown, aiming to maximize cumulative reward while learning which option is best.
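
To make the definition concrete, here is a minimal sketch of one common strategy for this kind of problem, epsilon-greedy: most of the time the agent picks the option with the best estimated payoff so far, and occasionally it picks at random to keep learning. The function name `run_epsilon_greedy`, the Bernoulli (0 or 1) reward model, and the values of `reward_probs`, `steps`, and `epsilon` are all illustrative assumptions, not part of the definition above.

```python
import random


def run_epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate an agent choosing among options with 0/1 rewards.

    reward_probs: hypothetical success probability of each option,
        hidden from the agent (assumed values for illustration).
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n        # how often each option was chosen
    estimates = [0.0] * n   # running mean reward per option
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random option to improve the estimates.
            choice = rng.randrange(n)
        else:
            # Exploit: pick the option that currently looks best.
            choice = max(range(n), key=lambda i: estimates[i])

        # The environment pays 1 with the option's hidden probability.
        reward = 1 if rng.random() < reward_probs[choice] else 0

        # Incremental update of the running mean for the chosen option.
        counts[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("estimated payoffs:", [round(e, 2) for e in estimates])
    print("times chosen:", counts)
    print("total reward:", total)
```

The `epsilon` parameter controls the trade-off the definition describes: a larger value spends more choices on learning, a smaller one spends more on collecting reward from the option that currently looks best.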