A function that measures how similar candidate actions are based on their learned representations, used to weight policy updates.