Green Shielding: A User-Centric Approach Towards Trustworthy AI

Aaron J. Li, Nicolas Sanchez, Hao Huang, Ruijiang Dong, Jaskaran Bains et al. | April 27, 2026 | arXiv

Key Takeaway

How users phrase queries matters as much as what they ask: benign input variations systematically change AI behavior in ways that matter for real-world deployment, especially in high-stakes domains like healthcare.

Summary

This paper shows that small, routine changes in how users phrase questions to AI models can significantly shift the models' outputs, a problem that existing safety testing misses.

safety evaluation applications

Key Terms

red-teaming, prompt-engineering, differential-diagnosis, pareto-frontier