Alignment Faking
techniques
When an AI model behaves as if it were aligned while it believes it is being monitored, but pursues different goals when it believes it is unmonitored.