Agent behavior can be measured and monitored by treating behavioral traits as directions in embedding space and analyzing how edits to configuration files move along those directions, which lets one agent audit another's behavioral changes.
This paper presents a method to track how AI agents change their behavior over time by analyzing edits to their configuration files (skill files, memory files, etc.). The researchers train a linear model to identify behavioral traits as directions in embedding space, then score file edits to measure how much they shift an agent's behavior.
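The scoring idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, the linear model is a small logistic regression fit by gradient descent, the four labeled snippets are invented, and the names `fit_trait_direction` and `score_edit` are hypothetical. An edit is scored by projecting the change in embedding (after minus before) onto the learned unit-length trait direction.

```python
import hashlib
import math

DIM = 1024  # size of the toy embedding space (assumption)


def embed(text):
    """Toy stand-in for a real embedding model: unit-normalized hashed bag of words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_trait_direction(texts, labels, steps=200, lr=0.5):
    """Fit logistic regression by gradient descent and return the unit weight vector.

    labels: 1 if the text exhibits the trait, 0 otherwise.
    """
    xs = [embed(t) for t in texts]
    w = [0.0] * DIM
    for _ in range(steps):
        grad = [0.0] * DIM
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-dot(w, x)))  # predicted probability of the trait
            err = p - y
            for i, xi in enumerate(x):
                if xi:
                    grad[i] += err * xi
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    norm = math.sqrt(dot(w, w))
    return [wi / norm for wi in w]


def score_edit(before, after, direction):
    """Project the embedding change caused by an edit onto the trait direction."""
    delta = [a - b for a, b in zip(embed(after), embed(before))]
    return dot(delta, direction)


# Invented training snippets for a hypothetical "caution" trait.
cautious = [
    "ask for confirmation before deleting files",
    "verify backups exist prior to overwriting data",
]
risky = [
    "delete everything immediately without asking",
    "skip checks and overwrite silently",
]
direction = fit_trait_direction(cautious + risky, [1, 1, 0, 0])

# Score a hypothetical edit to an agent's skill file.
before = "skip checks and delete files"
after = "ask for confirmation and verify backups"
shift = score_edit(before, after, direction)
print(f"shift along the caution direction: {shift:+.3f}")
```

A positive score means the edit moved the file toward the trait and a negative score means it moved away. Because the direction has unit length, scores from different edits are on the same scale, so an auditing agent could track them over time or flag edits whose magnitude exceeds a threshold it chooses.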