Improving agent behavior by analyzing and learning from complete sequences of actions and states across multiple episodes.