AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning

Zhen-Hao Xie, Yu-Cheng Shi, Da-Wei Zhou | May 27, 2026 | arXiv

Key Takeaway

When fine-tuning CLIP on new classes, stabilize how the model extracts features (what it looks for) separately from how it combines them (how it makes decisions), rather than updating the whole model at once.

Summary

AREA addresses catastrophic forgetting in CLIP-based class-incremental learning by decomposing classification into two stages: extracting visual and textual attributes, then aggregating them. It stabilizes attribute extraction using geometric analysis of the embedding space and learns task-specific experts for aggregation, so that learning new classes does not overwrite knowledge of old ones.
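
The two-stage split described above can be illustrated with a toy sketch. This is not the authors' implementation: the names `extract_attributes`, `TaskExpert`, and `IncrementalClassifier` are hypothetical, the embeddings are random stand-ins for CLIP outputs, cosine similarity stands in for the attribute extraction step, and the experts are untrained linear layers. It only shows why keeping extraction fixed and adding one aggregation expert per task leaves the scores for old classes untouched.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def extract_attributes(image_emb, attribute_embs):
    """Stage 1 (assumed form): cosine similarity of the image to each attribute."""
    img = normalize(image_emb)
    return [
        sum(a * b for a, b in zip(img, normalize(attr)))
        for attr in attribute_embs
    ]


class TaskExpert:
    """Stage 2 (assumed form): a linear map from attribute scores to one task's class logits."""

    def __init__(self, num_attributes, num_classes, rng):
        # Random weights stand in for weights that would be learned on this task.
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(num_attributes)]
            for _ in range(num_classes)
        ]

    def logits(self, attr_scores):
        return [sum(w * s for w, s in zip(row, attr_scores)) for row in self.weights]


class IncrementalClassifier:
    """Keeps attribute extraction fixed and appends one expert per task."""

    def __init__(self, attribute_embs):
        self.attribute_embs = attribute_embs  # shared and never updated here
        self.experts = []

    def add_task(self, num_classes, rng):
        # Earlier experts are left as they are; only a new one is appended.
        self.experts.append(TaskExpert(len(self.attribute_embs), num_classes, rng))

    def predict_logits(self, image_emb):
        scores = extract_attributes(image_emb, self.attribute_embs)
        out = []
        for expert in self.experts:
            out.extend(expert.logits(scores))
        return out


rng = random.Random(0)
dim, num_attributes = 8, 5
# Random vectors stand in for CLIP text (attribute) and image embeddings.
attribute_embs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_attributes)]
image_emb = [rng.gauss(0, 1) for _ in range(dim)]

model = IncrementalClassifier(attribute_embs)
model.add_task(num_classes=3, rng=rng)
old_logits = model.predict_logits(image_emb)

model.add_task(num_classes=2, rng=rng)
new_logits = model.predict_logits(image_emb)
```

After the second task is added, the first three entries of `new_logits` equal `old_logits`, because neither the attribute scores nor the first expert changed. A real system would still need the stabilization and training steps that the summary attributes to AREA, which this sketch omits.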

training efficiency

Key Terms

class-incremental learning, catastrophic forgetting, principal geodesic analysis, optimal transport, variational information bottleneck