Combining actions from multiple policies (e.g., cloned and learned) based on their estimated quality or confidence.