When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels — ThinkLLM