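An optimization technique that alternates between two steps: evaluating the current policy (estimating the value of following it) and improving the policy based on that evaluation. The two steps repeat until the policy stops changing.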
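As a rough illustration, the sketch below assumes the technique meant here is policy iteration on a small, finite Markov decision process. Everything in it is hypothetical and chosen only for the example: the four-state corridor, the `step` transition function, the discount `GAMMA`, and the helper names `evaluate`, `improve`, and `policy_iteration`.

```python
from typing import Dict, List, Tuple

# Hypothetical toy problem: a corridor of 4 states; reaching state 3 ends the episode.
N_STATES = 4
TERMINAL = 3
ACTIONS = (-1, 1)   # move left or move right
GAMMA = 0.9         # assumed discount factor


def step(state: int, action: int) -> Tuple[int, float]:
    """Deterministic transition: returns (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == TERMINAL else 0.0
    return nxt, reward


def evaluate(policy: Dict[int, int], tol: float = 1e-10) -> List[float]:
    """Policy evaluation: sweep the states until the value estimates settle."""
    values = [0.0] * N_STATES  # the terminal state's value stays 0
    while True:
        delta = 0.0
        for s in policy:
            nxt, r = step(s, policy[s])
            new_v = r + GAMMA * values[nxt]
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < tol:
            return values


def improve(policy: Dict[int, int], values: List[float]) -> Dict[int, int]:
    """Policy improvement: in each state, pick the action with the best one-step lookahead."""
    def q(s: int, a: int) -> float:
        nxt, r = step(s, a)
        return r + GAMMA * values[nxt]

    # max() returns the first best action, so ties are broken deterministically.
    return {s: max(ACTIONS, key=lambda a: q(s, a)) for s in policy}


def policy_iteration() -> Tuple[Dict[int, int], List[float]]:
    """Alternate evaluation and improvement until the policy no longer changes."""
    policy = {s: -1 for s in range(TERMINAL)}  # start by always moving left
    while True:
        values = evaluate(policy)
        new_policy = improve(policy, values)
        if new_policy == policy:
            return policy, values
        policy = new_policy


policy, values = policy_iteration()
print(policy)  # prints {0: 1, 1: 1, 2: 1}
```

In this toy setting the loop ends with a policy that moves right in every non-terminal state. Stopping when the improved policy equals the current one is one common termination test; other settings may stop on a value-change threshold instead.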