AI agents can handle routine coding tasks but struggle with physics-informed software because they optimize within given structures rather than questioning architectural assumptions; supervision design and domain-specific testing practices are critical for catching errors that automated tests miss.
A physicist supervised Claude AI coding agents over 12 days to build a physics simulation module in JAX. The agent autonomously fixed 10 of 15 bugs but failed on 3 that required understanding root causes rather than symptoms. It also created a 'fudge factor' that passed tests but had no physical meaning.