By generating diverse prompts rather than diverse images from one prompt, you can create a navigable design space in which each variation is semantically meaningful and understandable to the user, not just a random visual difference.
This paper addresses the diversity problem in text-to-image generation by moving the source of variation from the model's random sampling to the text prompt itself.
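
To make the idea of prompt-level variation concrete, here is a minimal sketch of how a labeled grid of prompts could be built. It is not the paper's method: the function name `build_prompt_grid`, the axis names, and the comma-joined prompt template are all assumptions made for illustration, and the image-generation step is left out entirely.

```python
from itertools import product


def build_prompt_grid(base_prompt, axes):
    """Return one prompt per combination of axis values.

    `axes` maps a human-readable axis name (hypothetical, e.g. "style")
    to the list of values to explore along that axis. Each returned cell
    keeps its labels, so a variation can be described in words.
    """
    names = list(axes)
    grid = []
    for combo in product(*(axes[name] for name in names)):
        labels = dict(zip(names, combo))
        # Assumed template: append each axis value to the base prompt.
        prompt = ", ".join([base_prompt, *combo])
        grid.append({"labels": labels, "prompt": prompt})
    return grid


# Illustrative axes; these values are made up for the example.
axes = {
    "style": ["watercolor", "line art"],
    "mood": ["calm", "dramatic"],
}

grid = build_prompt_grid("a lighthouse on a cliff", axes)
for cell in grid:
    print(cell["labels"], "->", cell["prompt"])
```

Because every cell carries its own labels, a user could move through the results along a named axis such as "style" or "mood" instead of comparing unlabeled samples drawn from a single prompt.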