Embedding norms in contrastive models are not wasted information: they come to encode semantic properties as a by-product of training and can serve as calibration signals at no extra training cost.
This paper explains why embedding norms (magnitudes) in contrastive models encode semantic information like concept specificity, even though these models use scale-invariant losses that should ignore norms.
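The tension described above can be made concrete with a minimal sketch. It assumes cosine similarity as the scale-invariant scoring function, which is a common choice in contrastive setups but not necessarily the exact loss the paper analyzes. The vectors and helper names (`l2_norm`, `cosine_similarity`) are made up for illustration. The sketch shows only that rescaling an embedding leaves the similarity unchanged while the norm stays available as a separate scalar; it does not reproduce the paper's explanation of why that scalar ends up carrying semantic information.

```python
import math


def l2_norm(v):
    """Euclidean length (magnitude) of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; unchanged by positive rescaling."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))


# Toy embeddings, invented for illustration only.
query = [0.6, 0.8, 0.0]
doc = [3.0, 4.0, 1.0]
doc_scaled = [10.0 * x for x in doc]  # same direction, ten times the magnitude

sim = cosine_similarity(query, doc)
sim_scaled = cosine_similarity(query, doc_scaled)

# The similarity score cannot distinguish doc from doc_scaled...
print(math.isclose(sim, sim_scaled))

# ...but the norm can, so it remains available as an extra per-embedding signal.
norm_ratio = l2_norm(doc_scaled) / l2_norm(doc)
print(norm_ratio)
```

In practice the norm would be read off embeddings produced by a trained encoder before any normalization step, rather than from hand-written vectors as here.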