LLMs can model human moral reasoning, but they don't apply that understanding to their own decisions: they default to abstract rules rather than adapting to social context, creating a dangerous gap between what they understand about human morality and how they behave.
This study tests whether large language models understand how human moral judgments shift with relationships and context. Using a whistleblower dilemma, the researchers found that LLMs can predict how humans actually behave (favoring loyalty to a friend over impartial reporting), yet the models' own decisions follow rigid fairness rules instead.