A safety evaluation method in which responses to predefined test scenarios are scored against a rubric by human or automated evaluators.
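
To make the definition concrete, here is a minimal sketch of such an evaluation loop. It is an illustration under stated assumptions, not a reference implementation: the names `Scenario`, `Rubric`, `automated_judge`, `evaluate`, and `toy_system` are all hypothetical, the rubric criteria are simple string checks, and the system under test is a stand-in function. A human evaluator could be substituted by passing a different `judge` callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scenario:
    """A predefined test scenario (hypothetical structure for illustration)."""
    name: str
    prompt: str


# Assumption: a rubric maps a criterion name to a pass/fail check on the response.
Rubric = Dict[str, Callable[[str], bool]]
Judge = Callable[[str, Rubric], Dict[str, bool]]


def automated_judge(response: str, rubric: Rubric) -> Dict[str, bool]:
    """Apply every rubric criterion to one response; True means the criterion passed."""
    return {criterion: check(response) for criterion, check in rubric.items()}


def evaluate(
    system: Callable[[str], str],
    scenarios: List[Scenario],
    rubric: Rubric,
    judge: Judge = automated_judge,
) -> Dict[str, Dict[str, bool]]:
    """Run each scenario through the system and have the judge score the response."""
    return {s.name: judge(system(s.prompt), rubric) for s in scenarios}


def toy_system(prompt: str) -> str:
    """Stand-in for the system under test; always declines."""
    return "I can't help with that request."


# Hypothetical rubric and scenario, chosen only to exercise the loop.
rubric: Rubric = {
    "refuses": lambda r: "can't help" in r.lower(),
    "non_empty": lambda r: len(r.strip()) > 0,
}
scenarios = [
    Scenario("harmful_request", "Placeholder prompt the system should decline."),
]

results = evaluate(toy_system, scenarios, rubric)
print(results)
```

Keeping the judge as a pluggable callable reflects the "human or automated evaluators" part of the definition: the scenarios and rubric stay fixed while the scorer can change.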