Targeted Unlearning with Single Layer Unlearning Gradient

1University of California, Riverside
2University of Maryland, College Park

Abstract

The unauthorized generation of privacy-related and copyright-infringing content using generative AI is becoming a significant concern for society, raising ethical, legal, and privacy issues that demand urgent attention. Recently, machine unlearning techniques have arisen that attempt to eliminate the influence of sensitive content used during model training, but they often require extensive updates to the model, reduce the utility of the model for unrelated content, and/or incur substantial computational costs. In this work, we propose a novel and efficient method called Single Layer Unlearning Gradient (SLUG) that can unlearn targeted information by updating a single targeted layer of a model using a one-time gradient computation. We introduce two metrics, layer importance and gradient alignment, to identify the appropriate layers for unlearning targeted information. Our method is highly modular and enables selective removal of multiple concepts from the generated outputs of widely used foundation models (e.g., CLIP), generative models (e.g., Stable Diffusion), and Vision-Language models. Our method is effective on a broad spectrum of concepts ranging from concrete (e.g., celebrity name, intellectual property figure, and object) to abstract (e.g., novel concept and artistic style).

Background

Modern Generative AI raises significant concerns, including privacy violations of celebrities, copyright-infringing content generation, artistic style plagiarism, and the creation of unsafe-for-work content. Machine Unlearning emerges as a promising solution, facing three core challenges: 1) removing unwanted concepts from models effectively, 2) retaining the model's utility to preserve functionality, and 3) ensuring computational efficiency to minimize resource demands.

Existing methods struggle to meet all three challenges. Retraining the model from scratch on a scrutinized dataset achieves exact unlearning but is computationally expensive and inflexible for new unlearning requests. Gradient ascent updates model weights in a reverse direction relative to the target concept, which can unlearn effectively but risks over-unlearning and utility degradation. Saliency-based methods identify and update only critical model weights, using thresholds informed by forget-loss gradients. While balancing unlearning and utility retention, these methods are computationally intensive due to iterative gradient calculations and require extensive hyperparameter tuning.
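The difference between plain gradient ascent and a saliency-masked update can be illustrated with a minimal sketch. This is a generic toy illustration, not the implementation of any specific prior method: the function names, the flat list representation of weights, and the numeric values are all assumptions made for the example.

```python
from typing import List, Sequence


def gradient_ascent_step(
    weights: Sequence[float], forget_grad: Sequence[float], lr: float
) -> List[float]:
    """Move every weight along the forget-loss gradient (the loss increases)."""
    return [w + lr * g for w, g in zip(weights, forget_grad)]


def saliency_masked_step(
    weights: Sequence[float],
    forget_grad: Sequence[float],
    lr: float,
    threshold: float,
) -> List[float]:
    """Apply the same step only where |forget gradient| exceeds the threshold."""
    return [
        w + lr * g if abs(g) > threshold else w
        for w, g in zip(weights, forget_grad)
    ]


# Toy values (hypothetical): three weights and their forget-loss gradients.
weights = [1.0, 1.0, 1.0]
forget_grad = [0.5, -0.01, 2.0]

dense = gradient_ascent_step(weights, forget_grad, lr=0.5)
sparse = saliency_masked_step(weights, forget_grad, lr=0.5, threshold=0.1)
```

In this toy case the dense step changes all three weights, while the masked step leaves the low-saliency second weight untouched. Iterating either step many times is where the cost and tuning burden described above comes from.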

In contrast, SLUG pushes the saliency-based approach to new levels of efficiency. It requires only a single gradient calculation and a one-step update to a single layer to achieve effective unlearning, dramatically reducing computational costs while maintaining performance.

Our Method: Single Layer Unlearning Gradient

Fig. 1: The unlearning framework of our proposed method, Single Layer Unlearning Gradient (SLUG).


Given an unlearning query, such as removing an identity like "Elon Musk", we first curate or generate a forget set containing relevant data and a retain set with data points we want to preserve. Using these datasets, we calculate and store the model gradients. Based on these gradients, we identify the important layers to update for unlearning. We then take a step along the forget gradients of a single layer and evaluate the model's unlearning performance. To determine a suitable step size λ, we employ a binary search. After unlearning, the specified concepts are effectively erased while retaining the model's overall utility.
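The one-step update and the binary search over λ can be sketched as follows. This is a minimal sketch under stated assumptions, not the released implementation: the layer is a flat list of floats, the sign convention (adding λ times the stored forget gradient) is assumed, and `concept_score` is a hypothetical stand-in for whatever forget metric is evaluated. The search also assumes the metric does not increase as λ grows.

```python
from typing import Callable, List, Sequence


def step_layer(
    weights: Sequence[float], forget_grad: Sequence[float], lam: float
) -> List[float]:
    """One-step update of a single layer along the stored forget gradient."""
    return [w + lam * g for w, g in zip(weights, forget_grad)]


def search_step_size(
    weights: Sequence[float],
    forget_grad: Sequence[float],
    forget_metric: Callable[[Sequence[float]], float],
    threshold: float,
    lam_max: float = 1.0,
    iters: int = 30,
) -> float:
    """Binary-search the smallest lam in [0, lam_max] with metric <= threshold.

    Assumes forget_metric is non-increasing in lam along this direction.
    """
    lo, hi = 0.0, lam_max
    if forget_metric(step_layer(weights, forget_grad, hi)) > threshold:
        raise ValueError("lam_max is too small to reach the threshold")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if forget_metric(step_layer(weights, forget_grad, mid)) <= threshold:
            hi = mid  # mid already forgets enough; try a smaller step
        else:
            lo = mid  # not enough forgetting; need a larger step
    return hi


# Toy setup (hypothetical): the "concept score" is a dot product with a
# fixed direction, and the forget gradient points against that direction.
def concept_score(w: Sequence[float]) -> float:
    return 0.5 * w[0] + 1.0 * w[1]


weights = [1.0, 2.0]
forget_grad = [-0.5, -1.0]
lam = search_step_size(weights, forget_grad, concept_score, 0.0, lam_max=4.0)
```

Because the gradients are stored once, each probe of the search only needs a cheap weight update and an evaluation, with no further backward passes. A fuller version would also check a retain or utility metric at each candidate λ.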

How Does a Single-Layer Update Work?

Our extensive analysis of CLIP zero-shot classification demonstrates that a single unlearning update on one layer, identified by our layer importance and gradient alignment metrics, is sufficient to make the model forget a targeted concept while preserving zero-shot classification accuracy close to that of the original CLIP model.

Fig. 2: Pareto fronts and step-size analysis of one-step updates to the vision and language parts of CLIP.


Main takeaway: with a properly selected unlearning step size, a one-step update on one of the Pareto-optimal layers (those with high concept importance and low forget-retain gradient alignment) can achieve good unlearning and utility retention.
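Layer selection by Pareto optimality can be sketched as below. The exact metric definitions are not given on this page, so the formulas here are assumptions for illustration: importance is taken as the forget-gradient norm divided by the weight norm, and alignment as the cosine similarity between forget and retain gradients. The layer names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def layer_importance(forget_grad: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed definition: forget-gradient norm relative to the weight norm."""
    return l2_norm(forget_grad) / l2_norm(weights)


def gradient_alignment(forget_grad: Sequence[float], retain_grad: Sequence[float]) -> float:
    """Assumed definition: cosine similarity of forget and retain gradients."""
    dot = sum(f * r for f, r in zip(forget_grad, retain_grad))
    return dot / (l2_norm(forget_grad) * l2_norm(retain_grad))


def pareto_layers(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Keep layers not dominated in (higher importance, lower alignment)."""
    front = []
    for name, (imp, ali) in scores.items():
        dominated = any(
            o_imp >= imp and o_ali <= ali and (o_imp > imp or o_ali < ali)
            for other, (o_imp, o_ali) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Toy layers (hypothetical): name -> (weights, forget gradient, retain gradient).
layers = {
    "layer_a": ([1.0, 0.0], [3.0, 0.0], [0.0, 1.0]),
    "layer_b": ([2.0, 0.0], [2.0, 0.0], [2.0, 0.0]),
    "layer_c": ([1.0, 0.0], [4.0, 0.0], [1.0, 1.0]),
}
scores = {
    name: (layer_importance(fg, w), gradient_alignment(fg, rg))
    for name, (w, fg, rg) in layers.items()
}
front = pareto_layers(scores)
```

In this toy example `layer_b` is dominated by `layer_a` (lower importance and higher alignment), so only `layer_a` and `layer_c` remain as candidates for the single-layer update.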

Results

Despite being extremely efficient, SLUG achieves effective unlearning while maintaining a good balance with model utility retention. SLUG also scales across different foundation models and tasks (e.g., CLIP, Stable Diffusion, VLMs) and is flexible enough to jointly unlearn multiple identities. Below, we present representative results achieved by SLUG.

Examples on CLIP zero-shot classification

Fig. 3: Image-text cosine similarity matrix of the original CLIP model.

Fig. 4: Unlearning "Elon Musk".

Fig. 5: Unlearning "Elon Musk" and "Mark Zuckerberg".

Main takeaway: SLUG can effectively unlearn multiple targeted identities from CLIP. By updating a selected layer with a single gradient for each distinct identity, it introduces modularity into the unlearning process.
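The modularity described above can be pictured as a set of independent per-identity updates, each touching one layer. The following is a minimal sketch under assumptions: the `ConceptUpdate` record, the layer names, the additive sign convention, and all numeric values are hypothetical and chosen only to show how updates compose.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ConceptUpdate:
    """One stored unlearning update: a single layer, one gradient, one step size."""
    concept: str
    layer: str
    forget_grad: Tuple[float, ...]
    step_size: float


def apply_updates(
    model: Dict[str, List[float]], updates: Sequence[ConceptUpdate]
) -> Dict[str, List[float]]:
    """Return a copy of the model with each concept's one-step update applied."""
    new_model = {name: list(w) for name, w in model.items()}
    for u in updates:
        layer = new_model[u.layer]
        for i, g in enumerate(u.forget_grad):
            layer[i] += u.step_size * g
    return new_model


# Toy model (hypothetical layer names and values).
model = {"vision.proj": [1.0, 1.0], "text.proj": [2.0, 2.0]}
update_a = ConceptUpdate("identity A", "vision.proj", (-1.0, 0.0), 0.5)
update_b = ConceptUpdate("identity B", "vision.proj", (0.0, -1.0), 0.25)

only_a = apply_updates(model, [update_a])
both = apply_updates(model, [update_a, update_b])
```

Because each update is a separate record, identities can be added to or dropped from the unlearning set without recomputing the others, and layers that no update references stay exactly as they were.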

Examples on Stable Diffusion Image Generation

Fig. 6: Unlearning copyright-protected intellectual property ("Mickey Mouse" and "Iron Man") from Stable Diffusion-v2.1 model.

After unlearning with SLUG, Stable Diffusion fails to generate images associated with the targeted copyright-protected figures, while the overall image generation utility of the original model is largely preserved.

Examples on Vision-Language Models

Fig. 7: Unlearning the celebrity example "Elon Musk" from LLaVA-v1.5-7B model.


Fig. 8: Unlearning the celebrity example "Taylor Swift" from LLaVA-v1.5-7B model.

After unlearning, the targeted identities are mapped to a wrong name or gender, while identification of the other celebrities remains unaffected. The model's robustness to style distribution shift is also preserved. SLUG thus unlearns targeted identities while preserving the model's utility on vision-language tasks, with minimal impact on accuracy and functionality across a range of tasks.

More Results

Comprehensive results on quantitative evaluations, multi-concept unlearning, unlearning of different concepts, and additional qualitative samples for Stable Diffusion and VLMs can be found in the main text and the supplementary material of our paper.

BibTeX

If you find our work helpful for your research, please consider citing us!


      @article{cai2024unlearning,
        title={Unlearning Targeted Information via Single Layer Unlearning Gradient},
        author={Cai, Zikui and Tan, Yaoteng and Asif, M Salman},
        journal={arXiv preprint arXiv:2407.11867},
        year={2024}
      }