LLMs can model human moral reasoning, but they don't apply that understanding to their own decisions: they default to abstract rules rather than adapting to social context, creating a dangerous gap between what they understand about human morality and how they behave.
This study tests whether large language models understand how human moral judgments shift with relationships and context. Using a whistleblower dilemma, the researchers found that LLMs can predict how humans actually behave (favoring loyalty to a friend over impartial reporting), yet the models' own decisions follow rigid fairness rules instead.