Identifying which tokens in a model's output are critical to its safety decisions, so that training effort can be focused on those tokens.
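The goal can be made concrete with one possible heuristic. This is an assumption for illustration, not necessarily the approach intended here. Score the same response under a safety-tuned model and a base model, and treat the tokens where their per-token log-probabilities diverge most as safety-critical. Then upweight those tokens in a token-weighted training loss. The function names (`token_criticality`, `critical_token_indices`, `loss_weights`, `weighted_nll`) and the `boost` parameter are hypothetical. The log-probabilities are toy values, not output from a real model.

```python
from typing import List, Sequence


def token_criticality(logp_safe: Sequence[float], logp_base: Sequence[float]) -> List[float]:
    """Hypothetical heuristic: score each token by the absolute gap between its
    log-probability under a safety-tuned model and under a base model."""
    if len(logp_safe) != len(logp_base):
        raise ValueError("log-prob sequences must align token by token")
    return [abs(s - b) for s, b in zip(logp_safe, logp_base)]


def critical_token_indices(scores: Sequence[float], k: int) -> List[int]:
    """Positions of the k highest-scoring tokens, returned in sequence order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def loss_weights(n_tokens: int, critical: Sequence[int], boost: float = 5.0) -> List[float]:
    """Per-token loss weights: 1.0 by default, `boost` on critical tokens,
    rescaled so the weights average to 1.0 (keeps the loss scale comparable)."""
    if n_tokens == 0:
        return []
    weights = [1.0] * n_tokens
    for i in critical:
        weights[i] = boost
    mean = sum(weights) / n_tokens
    return [w / mean for w in weights]


def weighted_nll(logp_policy: Sequence[float], weights: Sequence[float]) -> float:
    """Token-weighted negative log-likelihood, averaged over tokens."""
    if not weights:
        return 0.0
    return -sum(w * lp for w, lp in zip(weights, logp_policy)) / len(weights)


if __name__ == "__main__":
    # Toy refusal and made-up per-token log-probabilities (illustrative only).
    tokens = ["I", "cannot", "help", "with", "that"]
    logp_safe = [-0.1, -0.2, -0.3, -0.1, -0.2]
    logp_base = [-0.2, -3.0, -1.5, -0.1, -0.3]

    scores = token_criticality(logp_safe, logp_base)
    idx = critical_token_indices(scores, k=2)
    print([tokens[i] for i in idx])  # ['cannot', 'help']

    weights = loss_weights(len(tokens), idx, boost=5.0)
    print(weighted_nll(logp_safe, weights))
```

Renormalizing the weights to a mean of 1.0 is a design choice. It lets the weighted loss sit on roughly the same scale as an unweighted one, so the effect is a redistribution of training signal toward the flagged tokens rather than a change in the overall learning rate. Other attribution signals, such as gradient-based saliency, could replace `token_criticality` without changing the weighting step.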