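A reinforcement learning (RL) method that updates value estimates from the error between the current prediction and a target built from the observed reward plus the discounted estimate for the next state, combining the sampling of Monte Carlo methods with the bootstrapping of dynamic programming.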
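
To make the update rule concrete, here is a minimal sketch of a single tabular update, assuming the entry refers to one-step temporal-difference learning, TD(0). The function name `td0_update`, the dictionary-based value table, the state labels, and the step size `alpha` and discount `gamma` are all illustrative assumptions, not part of the original definition.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular TD(0) update in place and return the TD error.

    All names and default hyperparameters are hypothetical, chosen for
    illustration only.
    """
    # Bootstrapped part of the target: discounted estimate of the next state.
    # A terminal next state contributes no future value.
    bootstrap = 0.0 if terminal else gamma * values[next_state]

    # TD error: (observed reward + bootstrapped estimate) minus current prediction.
    td_error = reward + bootstrap - values[state]

    # Move the current estimate a fraction alpha toward the target.
    values[state] += alpha * td_error
    return td_error


# Hypothetical two-state example: the agent moves from "A" to "B" and
# observes a reward of 1.0.
values = {"A": 0.0, "B": 2.0}
delta = td0_update(values, "A", reward=1.0, next_state="B",
                   alpha=0.5, gamma=0.9)
```

In this example the target is 1.0 + 0.9 × 2.0 = 2.8, so the TD error is about 2.8 and the estimate for "A" moves halfway there, to about 1.4. The update uses a sampled transition, as Monte Carlo methods do, but relies on the existing estimate for "B" rather than waiting for the full return, as dynamic programming does.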