Automating Potential-based Reward Shaping with Vision Language Model Guidance

Henrik Müller, Daniel Kudenko | June 25, 2026 | arXiv

Key Takeaway

Smaller, cheaper VLMs can automatically design potential-based reward shaping functions that guide RL agents. Because the shaping is potential-based, the optimal policy is preserved, which avoids reward hacking and removes the need for manual reward engineering.

Summary

This paper automates reward shaping for reinforcement learning by using vision language models to learn a potential function that guides exploration without causing reward hacking. The method queries lightweight VLMs to compare image pairs, trains a model of the potential function from these preferences, and preserves optimal policies while improving sample efficiency in robotic tasks.
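The two ingredients named in the summary can be illustrated with a minimal sketch. It assumes the standard potential-based shaping form r' = r + gamma * Phi(s') - Phi(s) and a Bradley-Terry style loss for pairwise preferences; the summary does not state the paper's exact loss or model, so the loss choice and every function name below are hypothetical and for illustration only.

```python
import math


def shaped_reward(env_reward, phi_s, phi_next, gamma=0.99, done=False):
    """Standard potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    The potential of a terminal state is treated as 0, a common convention
    that keeps the shaping terms telescoping over an episode.
    """
    next_potential = 0.0 if done else phi_next
    return env_reward + gamma * next_potential - phi_s


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(phi_a, phi_b, prefer_b):
    """Bradley-Terry negative log-likelihood for one image pair (assumed loss).

    Models P(b is closer to the goal than a) = sigmoid(phi_b - phi_a), where
    prefer_b stands in for the answer a VLM would give when shown both images.
    """
    logit = phi_b - phi_a
    return _softplus(-logit) if prefer_b else _softplus(logit)


def discounted_shaping_sum(potentials, gamma):
    """Discounted sum of the shaping terms alone along one episode.

    potentials[t] is Phi(s_t); the state after the last entry is terminal.
    """
    total = 0.0
    for t, phi_s in enumerate(potentials):
        last = t == len(potentials) - 1
        phi_next = 0.0 if last else potentials[t + 1]
        total += gamma**t * shaped_reward(0.0, phi_s, phi_next, gamma, done=last)
    return total
```

Under these assumptions, the discounted shaping terms along any episode sum to -Phi(s_0), a quantity that depends only on the start state and not on the actions taken. That is the usual argument for why potential-based shaping leaves the ranking of policies, and hence the optimal policy, unchanged.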

multimodal

Key Terms

potential-based-reward-shaping reward-hacking vision-language-model preference-based-learning sparse-rewards