Detecting objects in images using arbitrary text descriptions rather than a fixed set of predefined categories.
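
As a rough illustration of the idea, the sketch below assumes a setup in which each candidate image region and each text description has already been mapped into a shared embedding space, and a region is labeled with whichever description it matches best. This is one common way such systems are built, not necessarily the only one. The function names, the threshold, and the hand-made 3-dimensional vectors are hypothetical stand-ins for the outputs of real image and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def label_regions(region_embeddings, prompt_embeddings, threshold=0.5):
    """Give each region its best-matching text prompt, or None if no prompt
    scores at least `threshold` (hypothetical cutoff)."""
    labels = []
    for region in region_embeddings:
        scores = {text: cosine(region, emb) for text, emb in prompt_embeddings.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= threshold else None)
    return labels

# Toy embeddings (assumed); a real system would compute these with learned encoders.
prompts = {
    "a red bicycle": [1.0, 0.0, 0.0],
    "a sleeping cat": [0.0, 1.0, 0.0],
}
regions = [
    [0.9, 0.1, 0.0],  # close to the "bicycle" direction
    [0.1, 0.8, 0.1],  # close to the "cat" direction
    [0.0, 0.0, 1.0],  # matches neither prompt
]

labels = label_regions(regions, prompts)
print(labels)
```

Because the labels are ordinary strings supplied at query time, changing what the detector looks for only means changing the entries in `prompts`; no fixed category list is built into the matching step.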