Measure of how much a model's output quality changes as the politeness level of the input prompt varies.
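The definition does not fix a formula, so the following is only one possible way to turn it into a number. The sketch assumes that each prompt has been rewritten at several politeness levels and that every response has already been given a quality score between 0 and 1 by some external scorer. The function name `politeness_sensitivity`, the level labels, and the scores are all hypothetical, and the choice of "largest gap between per-level mean scores" is an assumption, not an established standard.

```python
from statistics import mean


def politeness_sensitivity(scores_by_level):
    """Return the spread (max minus min) of mean quality across politeness levels.

    scores_by_level maps a politeness label to a list of quality scores for
    responses to prompts written at that level. This is an assumed
    formulation: 0.0 means quality did not change with politeness, and
    larger values mean a stronger dependence on it.
    """
    if len(scores_by_level) < 2:
        raise ValueError("need at least two politeness levels to compare")
    # Average the quality scores within each politeness level.
    level_means = {level: mean(scores) for level, scores in scores_by_level.items()}
    # The gap between the best and worst level is the sensitivity.
    return max(level_means.values()) - min(level_means.values())


# Hypothetical quality scores in [0, 1]; illustrative only, not measured results.
example_scores = {
    "rude": [0.5, 0.75],
    "neutral": [0.75, 0.75],
    "polite": [0.75, 1.0],
}
sensitivity = politeness_sensitivity(example_scores)
print(sensitivity)
```

The range is used here because it is easy to read: it is the quality difference between the most and least favorable politeness level. A variance or a pairwise comparison against a neutral baseline would be equally reasonable under the same definition.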