A penalty that prevents a model from changing its behavior too drastically by measuring the statistical distance between old and new policies.