LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning

Matteo Boglioni, Thibault Rousset, Siva Reddy, Marius Mosbach, Verna Dankers | July 2, 2026 | arXiv

Key Takeaway

Current unlearning methods are imprecise at targeting the specific parameters where knowledge is stored, which leaves them vulnerable to attacks that resurface the data. Precise localization matters more than output-level performance.

Summary

LACUNA is a new benchmark for testing whether LLM unlearning methods actually erase sensitive data from model parameters or just hide it. The researchers inject fake personal information into specific weights of language models, then check if unlearning methods successfully target those exact parameters.
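As a rough illustration of what checking "those exact parameters" could look like, the sketch below compares the set of parameters where information was injected with the set an unlearning method actually modified. It is a minimal sketch under stated assumptions, not the benchmark's own metric: the function name `localization_scores`, the (module name, index) identifiers, and the choice of precision, recall, and Jaccard overlap are all hypothetical and chosen only for illustration.

```python
def localization_scores(injected, modified):
    """Compare two sets of parameter identifiers.

    injected: parameters where the fake information was placed (assumed known).
    modified: parameters an unlearning method changed (assumed measurable).
    Returns (precision, recall, jaccard). These are illustrative overlap
    scores, not necessarily the ones LACUNA reports.
    """
    injected, modified = set(injected), set(modified)
    overlap = len(injected & modified)
    # Fraction of modified parameters that were actually injection sites.
    precision = overlap / len(modified) if modified else 0.0
    # Fraction of injection sites that the method reached.
    recall = overlap / len(injected) if injected else 0.0
    union = len(injected | modified)
    jaccard = overlap / union if union else 0.0
    return precision, recall, jaccard


# Hypothetical toy data: (module name, weight index) pairs.
injected = {("layer3.mlp", 10), ("layer3.mlp", 11),
            ("layer3.mlp", 12), ("layer5.mlp", 7)}
modified = {("layer3.mlp", 10), ("layer3.mlp", 11),
            ("layer8.attn", 2), ("layer8.attn", 3), ("layer9.mlp", 0)}

precision, recall, jaccard = localization_scores(injected, modified)
print(f"precision={precision:.2f} recall={recall:.2f} jaccard={jaccard:.2f}")
```

In this toy case the method touches five parameters, only two of which are injection sites, and misses two of the four injected ones. A method like this could still score well on output-level tests while leaving the injected information recoverable.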

safety evaluation training

Key Terms

machine-unlearning pii-detection mechanistic-interpretability resurfacing-attacks