RFM-AGOP rapidly identifies multi-dimensional safety subspaces in LLMs. It is a computationally efficient alternative to existing methods and could help scale safety monitoring to larger models.
This paper presents a fast method for identifying multi-dimensional refusal subspaces in large language models using an adapted Recursive Feature Machine (RFM) algorithm.
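
To make the core quantity concrete, the following is a minimal sketch of an average gradient outer product (AGOP) computation, not the paper's implementation. It assumes a toy scalar function whose gradient is known in closed form. That function stands in for a predictor fitted on model activations, and the names `agop`, `top_subspace`, `grad_f`, and the planted directions `U` are hypothetical. The top eigenvectors of the AGOP matrix recover the subspace on which the function depends.

```python
import numpy as np

def agop(grad_fn, X):
    """Average gradient outer product: mean over rows x of g(x) g(x)^T."""
    G = np.stack([grad_fn(x) for x in X])  # shape (n, d), one gradient per row
    return G.T @ G / len(X)                # shape (d, d), symmetric PSD

def top_subspace(M, k):
    """Orthonormal basis (d, k) spanned by the top-k eigenvectors of symmetric M."""
    _, eigvecs = np.linalg.eigh(M)         # eigenvalues come back in ascending order
    return eigvecs[:, ::-1][:, :k]         # reverse columns, keep the k largest

rng = np.random.default_rng(0)
d, n, k = 16, 500, 2

# Assumption: a planted k-dimensional subspace with orthonormal columns u1, u2.
U, _ = np.linalg.qr(rng.standard_normal((d, k)))

def grad_f(x):
    # Toy function f(x) = tanh(u1.x) + 0.5 * (u2.x)^2, which depends on x
    # only through its projection onto span(U).
    a, b = U[:, 0] @ x, U[:, 1] @ x
    return (1.0 - np.tanh(a) ** 2) * U[:, 0] + b * U[:, 1]

X = rng.standard_normal((n, d))            # stand-in for activation vectors
M = agop(grad_f, X)
B = top_subspace(M, k)

# Mean squared cosine between the two subspaces; 1.0 means they coincide.
overlap = np.linalg.norm(U.T @ B) ** 2 / k
print(f"subspace overlap: {overlap:.6f}")
```

In this toy setting every gradient lies in the planted subspace, so the AGOP matrix has rank at most `k` and its leading eigenvectors span that subspace. With a learned predictor the gradients would be estimated rather than exact, and choosing `k` would require inspecting the eigenvalue spectrum.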