A training method that stabilizes reinforcement learning by anchoring functional tokens with a weighted auxiliary objective for stronger gradient updates.