A protocol that measures how much of a policy's safety comes from the learned policy versus protective layers like filters.