Simple prompting baselines outperform recent dense supervision methods. You can also evaluate a supervision signal's quality before training by checking whether its scores align with reference Q-values, which saves the compute otherwise spent on training.
QVal is a training-free evaluation framework for comparing dense supervision signals used in long-horizon LLM agents.
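To make the alignment check concrete, here is a minimal sketch of how a candidate signal's scores could be compared with reference Q-values without any training. It assumes Spearman rank correlation as the alignment measure. That choice, the function names `average_ranks` and `spearman_alignment`, and the toy numbers are illustrative assumptions, not QVal's actual metric or API.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every entry tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_alignment(scores: Sequence[float], reference_q: Sequence[float]) -> float:
    """Spearman rank correlation between signal scores and reference Q-values.

    Hypothetical helper: the alignment measure is an assumption for this sketch.
    """
    if len(scores) != len(reference_q) or len(scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rank_s = average_ranks(scores)
    rank_q = average_ranks(reference_q)
    mean_s = sum(rank_s) / len(rank_s)
    mean_q = sum(rank_q) / len(rank_q)
    cov = sum((a - mean_s) * (b - mean_q) for a, b in zip(rank_s, rank_q))
    var_s = sum((a - mean_s) ** 2 for a in rank_s)
    var_q = sum((b - mean_q) ** 2 for b in rank_q)
    if var_s == 0 or var_q == 0:
        # A constant signal carries no ranking information.
        return 0.0
    return cov / (var_s * var_q) ** 0.5


if __name__ == "__main__":
    # Toy data, invented for illustration: one score per candidate action.
    reference_q = [0.2, 0.5, 0.1, 0.9]
    signal_a = [0.3, 0.6, 0.2, 0.8]  # ranks actions in the same order as reference_q
    signal_b = [0.9, 0.4, 0.7, 0.1]  # ranks actions in a different order
    print("signal_a alignment:", spearman_alignment(signal_a, reference_q))
    print("signal_b alignment:", spearman_alignment(signal_b, reference_q))
```

Under this assumption, a signal whose alignment is close to 1 orders actions the way the reference Q-values do, while a value near 0 or below suggests the signal would be a poor training target. A rank-based measure is used here because dense supervision signals may sit on different scales than Q-values, so only their ordering is compared.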