LLMs show promise for drug discovery, but RL-based post-training on domain-specific tasks is critical: a smaller model trained this way outperformed much larger models that lacked this domain-specific post-training, suggesting a practical path forward for real-world drug design applications.
This paper introduces a benchmark of chemistry tasks to test how well large language models can help design new drugs. The researchers evaluate three model families on tasks such as molecular property prediction and molecule design, then show that reinforcement learning post-training can significantly boost performance, even making smaller models competitive with frontier models.