You can use smaller, cheaper vision-language models (VLMs) to automatically design reward-shaping functions that guide RL agents without the risk of reward hacking, removing the need for manual reward engineering.
This paper automates reward shaping for reinforcement learning by using vision-language models to learn a potential function that guides exploration without causing reward hacking. The method queries lightweight VLMs to compare image pairs, trains a model of the potential function on the resulting preferences, and shapes rewards with that potential, which preserves optimal policies while improving sample efficiency in robotic tasks.
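
The pipeline summarized above can be illustrated with a minimal sketch. It rests on several assumptions that the summary does not state: a Bradley-Terry (logistic) preference model, a linear potential over 2-D state features standing in for image embeddings, and a hypothetical `simulated_vlm_prefers` function in place of a real VLM query on an image pair. The shaping term uses the standard potential-based form r + γΦ(s') − Φ(s), which is the form known to leave optimal policies unchanged.

```python
import math
import random


def potential(w, s):
    """Linear potential Phi(s) = w . s (an assumed, illustrative model)."""
    return sum(wi * si for wi, si in zip(w, s))


def train_potential(prefs, dim, lr=0.5, epochs=200):
    """Fit w by gradient descent on a Bradley-Terry logistic loss.

    Each item in prefs is (a, b, y): y = 1.0 if a was preferred to b, else 0.0.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for a, b, y in prefs:
            diff = potential(w, a) - potential(w, b)
            p = 1.0 / (1.0 + math.exp(-diff))  # modeled P(a preferred to b)
            for i in range(dim):
                grad[i] += (p - y) * (a[i] - b[i])
        w = [wi - lr * g / len(prefs) for wi, g in zip(w, grad)]
    return w


def shaped_reward(r, s, s_next, w, gamma=0.99):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return r + gamma * potential(w, s_next) - potential(w, s)


# Hypothetical stand-in for a VLM comparing two images: here, "states" are
# 2-D positions and the preferred one is simply the one nearer the goal.
goal = (1.0, 1.0)


def simulated_vlm_prefers(a, b):
    return math.dist(a, goal) < math.dist(b, goal)


rng = random.Random(0)
prefs = []
for _ in range(200):
    a = (rng.random(), rng.random())
    b = (rng.random(), rng.random())
    prefs.append((a, b, 1.0 if simulated_vlm_prefers(a, b) else 0.0))

w = train_potential(prefs, dim=2)

# A step toward the goal with zero environment reward still gets a bonus.
bonus = shaped_reward(0.0, (0.1, 0.1), (0.2, 0.2), w)
print("learned weights:", w)
print("shaping bonus for a step toward the goal:", bonus)
```

Because the shaping terms telescope along a trajectory, the discounted shaped return differs from the original return only by a quantity that depends on the first and last states, which is why this form of shaping does not change which policy is optimal.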