Exchanging information between modalities (text and vision) to guide compression decisions.