Current AI agents struggle most with identifying gaps in their knowledge and formulating the right questions, not just with answering them. This shift in the bottleneck suggests that we need better ways to help AI systems recognize what they don't know.
This paper introduces SciCrafter, a Minecraft-based benchmark that tests whether AI agents can discover causal rules and apply them to solve increasingly complex problems.