Combining reasoning-based and learning-based simulation through a shared policy layer reduces errors by roughly 45%, indicating that the hybrid approach predicts real-world user behavior better than either method alone.
This paper presents a system that simulates how groups of users behave on a food delivery platform (Meituan), so that merchant strategies can be tested without running live experiments. It combines two approaches, one that reasons through decisions logically and one that learns statistical patterns, and uses shared decision policies as the bridge between them.
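To make the idea of a shared policy layer concrete, here is a minimal sketch in which a single policy object is read by both a rule-based estimator and a statistical estimator. It is an illustration only, not the paper's implementation: the `Policy` fields, the function names, the discount-threshold rule, and the fixed blending `weight` are all assumptions introduced for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical shared decision policy for one user group."""
    name: str
    discount_threshold: float  # assumed: smallest discount that triggers the rule
    base_order_rate: float     # assumed: order probability when the rule fires


def reasoning_estimate(policy: Policy, discount: float) -> float:
    """Rule-based branch: apply the policy as an explicit condition."""
    return policy.base_order_rate if discount >= policy.discount_threshold else 0.0


def learned_estimate(policy: Policy, discount: float,
                     history: list[tuple[float, bool]]) -> float:
    """Statistical branch: empirical order rate among past observations that
    fall on the same side of the policy threshold as the current discount."""
    fires = discount >= policy.discount_threshold
    matching = [ordered for past_discount, ordered in history
                if (past_discount >= policy.discount_threshold) == fires]
    if not matching:
        # No data for this case: fall back to the rule-based value.
        return reasoning_estimate(policy, discount)
    return sum(matching) / len(matching)


def hybrid_estimate(policy: Policy, discount: float,
                    history: list[tuple[float, bool]],
                    weight: float = 0.5) -> float:
    """Blend both branches; the fixed weight is an illustrative assumption."""
    rule_value = reasoning_estimate(policy, discount)
    data_value = learned_estimate(policy, discount, history)
    return weight * rule_value + (1.0 - weight) * data_value


# Made-up observations: (discount offered, whether the user ordered).
policy = Policy("discount_sensitive", discount_threshold=0.2, base_order_rate=0.6)
history = [(0.3, True), (0.25, True), (0.3, False), (0.25, True), (0.1, False)]
print(hybrid_estimate(policy, 0.3, history))
```

Both branches consult the same `Policy` instance, so they answer the same question about the same user group and their estimates can be compared or blended directly. That is the sense in which the policy acts as a bridge in this sketch.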