Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies

Ekaterina Grishina, Stepan Kuznetsov, Askar Tsyganov, Ilya Ivanov, Daria Korovaitceva et al. | June 5, 2026 | arXiv

Key Takeaway

Rank recommender systems with a Bradley-Terry model rather than naive metric averaging: it accounts for differences between datasets and can even predict algorithm performance on unseen datasets.

Summary

This paper addresses how to rank recommendation algorithms fairly when their performance varies from dataset to dataset. Averaging scores across benchmarks can be misleading, so the authors instead fit a Bradley-Terry model, a statistical model of pairwise comparisons, to produce more reliable rankings that account for dataset characteristics such as sparsity and size.
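
To make the idea concrete, here is a minimal sketch of a plain Bradley-Terry fit in Python. It is not the authors' pipeline: the scores table, the function names (pairwise_wins, fit_bradley_terry, win_probability), and the choice to treat each dataset as one "match" between every pair of algorithms are all assumptions made for illustration, and the sketch ignores dataset characteristics such as sparsity and size. It uses the standard iterative (minorization-maximization) update for Bradley-Terry strengths, where the probability that algorithm i beats algorithm j is p_i / (p_i + p_j).

```python
def pairwise_wins(scores):
    """Count pairwise wins. scores[d][a] is the metric of algorithm a on
    dataset d (higher is better); ties count for neither side."""
    n = len(scores[0])
    wins = [[0] * n for _ in range(n)]
    for row in scores:
        for i in range(n):
            for j in range(n):
                if i != j and row[i] > row[j]:
                    wins[i][j] += 1
    return wins


def fit_bradley_terry(wins, iters=500, tol=1e-12):
    """Estimate Bradley-Terry strengths (normalized to sum to 1) from a
    win-count matrix, where wins[i][j] is how often i beat j."""
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            # Keep the old value if algorithm i was never compared.
            new_p.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(new_p)
        new_p = [x / norm for x in new_p]
        converged = max(abs(a - b) for a, b in zip(p, new_p)) < tol
        p = new_p
        if converged:
            break
    return p


def win_probability(p, i, j):
    """Model probability that algorithm i outperforms algorithm j."""
    return p[i] / (p[i] + p[j])


# Hypothetical metric values: 4 datasets (rows) x 3 algorithms (columns).
scores = [
    [0.30, 0.25, 0.10],
    [0.32, 0.35, 0.12],
    [0.05, 0.04, 0.06],
    [0.40, 0.38, 0.20],
]
strengths = fit_bradley_terry(pairwise_wins(scores))
ranking = sorted(range(len(strengths)), key=lambda a: -strengths[a])
```

In this toy table the third dataset has much smaller metric values than the others, so a plain average is dominated by the high-scoring datasets, whereas the pairwise formulation gives each dataset one vote per algorithm pair. The fitted strengths also yield win_probability(strengths, i, j), an estimate of how likely one algorithm is to beat another on a further dataset under the model's assumptions.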

evaluation

Key Terms

bradley-terry-model ranking-consistency dataset-taxonomy