Evaluates instruction following using diverse, complex instructions that test how precisely a model adheres to specified constraints.
Tests models on complex, multi-constraint instructions across diverse task types. Evaluation is automatic, combining programmatic checks with LLM-based verification. It is more challenging than IFEval because its constraints are more complex and varied.
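To make the programmatic side of this kind of verification concrete, here is a minimal Python sketch. It is not the benchmark's actual harness: the checker names (`max_words`, `contains_keyword`, `bullet_count`), the `verify` helper, and the example constraints are all hypothetical, chosen only to illustrate how several constraints on one response can be checked independently. The LLM-based part of the evaluation is not shown.

```python
import re
from typing import Callable, Dict

# Hypothetical constraint checkers. Each factory returns a function that
# takes a model response and reports whether one constraint is satisfied.
Check = Callable[[str], bool]


def max_words(limit: int) -> Check:
    """Response must contain at most `limit` whitespace-separated tokens."""
    return lambda text: len(text.split()) <= limit


def contains_keyword(keyword: str) -> Check:
    """Response must mention `keyword` (case-insensitive substring match)."""
    return lambda text: keyword.lower() in text.lower()


def bullet_count(expected: int) -> Check:
    """Response must contain exactly `expected` lines starting with '- ' or '* '."""
    pattern = re.compile(r"^\s*[-*] ", flags=re.MULTILINE)
    return lambda text: len(pattern.findall(text)) == expected


def verify(response: str, checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run every named check and return a per-constraint pass/fail map."""
    return {name: check(response) for name, check in checks.items()}


def all_satisfied(results: Dict[str, bool]) -> bool:
    """Strict scoring: the instruction counts as followed only if every check passes."""
    return all(results.values())


if __name__ == "__main__":
    response = "- Pack light\n- Bring water\n- Check the weather"
    checks = {
        "max_words": max_words(20),
        "keyword": contains_keyword("water"),
        "bullets": bullet_count(3),
    }
    results = verify(response, checks)
    print(results, all_satisfied(results))
```

Reporting a per-constraint result alongside the strict all-or-nothing outcome is one possible design choice: it lets a multi-constraint instruction be scored either way without rerunning the checks.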