Process reward models need to account for the full context of a reasoning path and penalize risky intermediate steps rather than only rewarding final correctness. This matters most in domains where a wrong reasoning path is costly.
This paper addresses a key problem in evaluating AI reasoning: process reward models often assign high scores to flawed reasoning paths because later correct steps mask earlier mistakes. The authors propose SCPRM, which scores each reasoning step conditioned on the preceding steps and on its distance to the target, and they combine it with tree search to answer questions over knowledge graphs.
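Since the paper's implementation details are not reproduced here, the following is a minimal Python sketch of the idea under stated assumptions: the names (`KG`, `context_score`, `distance_score`, `step_reward`, `tree_search`), the toy graph, the mixing weight `alpha`, and the simple heuristics standing in for learned models are all hypothetical, not taken from the paper. The sketch only illustrates the two-part step score (context-conditioned quality plus distance to the target) guiding a beam-style tree search.

```python
# Hypothetical knowledge graph: entity -> list of (relation, neighbor) edges.
KG = {
    "Paris": [("capital_of", "France"), ("located_in", "Europe")],
    "France": [("member_of", "EU"), ("capital", "Paris")],
    "EU": [("headquartered_in", "Brussels")],
    "Europe": [("contains", "France")],
}

def context_score(path, step):
    """Toy stand-in for a learned scorer: rate a candidate step given the
    full prefix of earlier steps, so a risky prefix drags the score down."""
    penalty = sum(0.2 for (rel, _ent) in path if rel == "located_in")  # assume vague hops are risky
    return max(0.0, 1.0 - penalty)

def distance_score(entity, target, max_hops=3):
    """Toy stand-in for the distance-to-target term: BFS hop count in the KG,
    mapped to [0, 1] so entities closer to the target score higher."""
    frontier, seen = [entity], {entity}
    for hops in range(max_hops + 1):
        if target in frontier:
            return 1.0 - hops / (max_hops + 1)
        nxt = [n for e in frontier for (_r, n) in KG.get(e, []) if n not in seen]
        seen.update(nxt)
        frontier = nxt
    return 0.0

def step_reward(path, step, target, alpha=0.5):
    """Combine the context-conditioned score with distance to the target;
    alpha is an assumed mixing weight, not a value from the paper."""
    _rel, entity = step
    return alpha * context_score(path, step) + (1 - alpha) * distance_score(entity, target)

def tree_search(start, target, beam_width=2, max_depth=3):
    """Beam-style tree search guided by the step reward: at each depth,
    expand every partial path and keep the highest-scoring candidates."""
    beams = [([], start, 0.0)]  # (path, current entity, cumulative reward)
    for _ in range(max_depth):
        candidates = []
        for path, entity, score in beams:
            if entity == target:  # keep completed paths in the beam
                candidates.append((path, entity, score))
                continue
            for step in KG.get(entity, []):
                r = step_reward(path, step, target)
                candidates.append((path + [step], step[1], score + r))
        beams = sorted(candidates, key=lambda b: -b[2])[:beam_width]
    return beams[0] if beams else None

print(tree_search("Paris", "EU"))  # prefers capital_of -> member_of over the riskier located_in hop
```

Because the reward is computed per step over the whole prefix, a path whose early hops are penalized cannot recover a high cumulative score from later correct steps, which is the masking failure the summary describes.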