Reasoning in embedding models works best when applied selectively: use counterfactual analysis to identify which query-target pairs actually benefit from reasoning, then use reinforcement learning to train the model to invoke it only in those cases.
This paper improves multimodal embedding models by invoking reasoning only when it is needed. Instead of forcing every input through an expensive reasoning step, the model learns when reasoning helps align a query with its target, cutting computation while improving performance on benchmark tasks.
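The selective-reasoning idea can be sketched in two pieces: a counterfactual check that labels a pair as reasoning-worthy only if the reasoning-augmented embedding aligns better with the target than the direct embedding, and a reward that pays off for invoking reasoning exactly on those pairs while charging a compute cost. This is a minimal illustrative sketch, not the paper's actual method; the embeddings, the binary gate, and the cost value are all hypothetical placeholders.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def counterfactual_label(q_direct, q_reason, target):
    """1 if the reasoning-augmented query embedding aligns better
    with the target than the direct embedding, else 0.
    (Hypothetical stand-in for the paper's counterfactual analysis.)"""
    return int(cosine(q_reason, target) > cosine(q_direct, target))

def gate_reward(invoked, label, cost=0.1):
    """RL-style reward for the reasoning gate (assumed form):
    +1 for matching the counterfactual label, minus a compute
    penalty whenever reasoning is invoked."""
    if invoked:
        return (1.0 if label == 1 else 0.0) - cost
    return 1.0 if label == 0 else 0.0

# Toy example: reasoning rotates the query toward the target.
target   = np.array([0.0, 1.0])
q_direct = np.array([1.0, 0.0])   # orthogonal to target
q_reason = np.array([1.0, 1.0])   # partially aligned with target

label = counterfactual_label(q_direct, q_reason, target)  # reasoning helps here
print(label, gate_reward(True, label), gate_reward(False, label))
```

Under this reward, the gate is pushed to invoke reasoning only on pairs where the counterfactual check says it improves alignment, which mirrors the paper's goal of trading a small compute cost for better query-target matching.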