Subsets of evaluation prompts grouped by category or topic to analyze model behavior across specific types of inputs.