Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Tongyan Fang, Siyuan Huang, Naiyu Fang, Ganlong Zhao, Zhongjin Luo et al. | June 15, 2026 | arXiv

Key Takeaway

Splitting sparse episode outcomes into separate success and efficiency signals with state-adaptive weighting, plus intervention-aware credit assignment, enables effective online RL fine-tuning of robot policies from minimal supervision.

Summary

This paper addresses a key problem in robot learning: when pretrained vision-language-action (VLA) models are fine-tuned through trial and error, each episode yields only a binary success/failure signal, yet the policy update needs per-step feedback.
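To make the gap between episode-level outcomes and per-step feedback concrete, here is a minimal sketch of the simplest generic baseline: broadcasting the terminal outcome to every step as a discounted return, then turning it into exponentiated-advantage weights for a weighted behavior-cloning loss. This is not the paper's hierarchical method. The function names, `gamma`, `baseline`, and `temperature` are all illustrative assumptions.

```python
import math


def per_step_returns(episode_length, success, gamma=0.99):
    """Spread one binary episode outcome over all time steps.

    Assumes the only reward is 1.0 at the final step of a successful
    episode (0.0 otherwise), so the discounted return at step t is
    gamma ** (T - 1 - t) times that terminal reward.
    """
    terminal_reward = 1.0 if success else 0.0
    last = episode_length - 1
    return [terminal_reward * gamma ** (last - t) for t in range(episode_length)]


def advantage_weights(returns, baseline, temperature=1.0):
    """Exponentiated-advantage weights for a weighted imitation loss.

    `baseline` stands in for a learned value estimate (hypothetical here);
    steps whose return exceeds it get a weight above 1.
    """
    return [math.exp((g - baseline) / temperature) for g in returns]


# A 3-step successful episode versus a 3-step failed one.
good = per_step_returns(3, success=True, gamma=0.5)
bad = per_step_returns(3, success=False, gamma=0.5)
weights = advantage_weights(good, baseline=0.5)
```

Under this baseline every step of a failed episode receives the same return of zero, and steps in a successful episode differ only by their distance from the end. That is the coarse credit assignment the summary describes as the problem.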

agents training

Key Terms

credit-assignment, behavior-cloning, vision-language-action-model, advantage-weighting, sparse-reward