OpenDeepThink: Parallel Reasoning via Bradley–Terry Aggregation

Shang Zhou, Wenhao Chai, Kaiyuan Liu, Huanzhi Mao, Qiuyang Mang et al. | May 14, 2026 | arXiv

Key Takeaway

Instead of judging multiple reasoning attempts individually (which is noisy), compare them pairwise and aggregate the votes to find the best solution. Scaling test-time compute in breadth this way is more reliable than scaling the depth of a single trace.

Summary

OpenDeepThink improves LLM reasoning by running multiple solution attempts in parallel and selecting the best one using pairwise comparisons between candidates, rather than trying to judge each solution independently. The method uses Bradley-Terry aggregation to rank candidates based on LLM pairwise judgments, then evolves the top solutions using critiques from comparisons.
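The aggregation step can be illustrated with a minimal sketch. It assumes the LLM judge's verdicts are already available as (winner, loser) index pairs and fits Bradley-Terry strengths with the standard minorization-maximization update. The function names, the `prior` pseudo-count, and the iteration count are hypothetical choices for illustration; the summary does not specify the paper's fitting procedure, judge prompts, or evolution step, so none of that is reproduced here.

```python
from collections import defaultdict


def bradley_terry_strengths(n_items, comparisons, iters=200, prior=0.5):
    """Fit Bradley-Terry strengths from (winner, loser) index pairs.

    Assumption (not from the paper): each candidate also gets `prior` virtual
    wins and `prior` virtual losses against a dummy opponent of strength 1.
    This keeps strengths finite for candidates that never lose or never win.
    """
    wins = [0.0] * n_items
    pair_counts = defaultdict(float)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[(min(winner, loser), max(winner, loser))] += 1.0

    strengths = [1.0] * n_items
    for _ in range(iters):
        # Contribution of the virtual games against the dummy opponent.
        denom = [2.0 * prior / (s + 1.0) for s in strengths]
        # Contribution of the real pairwise comparisons.
        for (i, j), n in pair_counts.items():
            share = n / (strengths[i] + strengths[j])
            denom[i] += share
            denom[j] += share
        strengths = [(wins[i] + prior) / denom[i] for i in range(n_items)]
    return strengths


def rank_candidates(n_items, comparisons):
    """Return candidate indices ordered from strongest to weakest."""
    strengths = bradley_terry_strengths(n_items, comparisons)
    return sorted(range(n_items), key=lambda i: strengths[i], reverse=True)


# Hypothetical judge verdicts over three candidate solutions.
judgments = [(0, 1), (0, 1), (0, 2), (0, 2), (1, 2), (1, 2), (2, 1)]
best_first = rank_candidates(3, judgments)
```

Here `best_first[0]` is the candidate that the aggregated votes favor. Fitting one global strength per candidate lets a single inconsistent verdict, such as `(2, 1)` above, be outweighed by the rest of the comparisons.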

reasoning evaluation

Key Terms

bradley-terry-model pairwise-comparison test-time-compute population-based-search