Current embodied systems struggle with the full loop: vision models that perform well on isolated tasks (67% accuracy) still fail to recover the complete game state needed for decision-making (34% accuracy), and execution errors cascade during real deployment.
DexHoldem is a real-world benchmark that tests embodied AI systems playing Texas Hold'em with a dexterous robot hand. It combines three challenges: executing 14 card-manipulation skills precisely, perceiving game state from images, and making decisions based on that perception. Running all three together in closed-loop control reveals how errors compound.
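
To make the compounding concrete, here is a minimal sketch under a simplifying assumption that is not part of the benchmark: each stage (execution, perception, decision) succeeds independently, so the chance that one loop iteration goes right is the product of the per-stage rates, and the chance that several consecutive iterations all go right shrinks geometrically. The rates and the function names `chain_success` and `episode_success` are hypothetical and purely illustrative; they are not DexHoldem results.

```python
from math import prod


def chain_success(stage_rates):
    """Probability that every stage succeeds, assuming the stages are independent."""
    return prod(stage_rates)


def episode_success(step_rate, n_steps):
    """Probability that n_steps consecutive loop iterations all succeed."""
    return step_rate ** n_steps


# Hypothetical per-stage success rates (illustrative only, not measured values).
rates = {"execution": 0.9, "perception": 0.9, "decision": 0.9}

per_step = chain_success(rates.values())
print(f"one loop iteration: {per_step:.3f}")                       # 0.729
print(f"five iterations:    {episode_success(per_step, 5):.3f}")   # 0.206
```

Even with each stage at 90% in this toy setting, a five-step episode succeeds only about one time in five. Real stages are unlikely to be independent, so this should be read as intuition for why closed-loop evaluation is harsher than isolated-task evaluation, not as a model of the reported numbers.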