Current unlearning methods are imprecise at targeting specific parameters where knowledge is stored, making them vulnerable to attacks that resurface the data—precise localization matters more than output-level performance.
LACUNA is a new benchmark for testing whether LLM unlearning methods actually erase sensitive data from model parameters or just hide it. The researchers inject fake personal information into specific weights of language models, then check if unlearning methods successfully target those exact parameters.