Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations — ThinkLLM