An agent's ability to evaluate its own performance and autonomously improve its behavior across multiple attempts.