Visual Preference Optimization with Rubric Rewards — ThinkLLM