Data Attribution for Diffusion Models

Machine Learning for Simulation Science

Master Thesis, Ms. Tanja Bien

Abstract

Diffusion models have demonstrated a remarkable ability to generate photorealistic images. However, it is difficult to explain why a model produces a particular image. Tracing an output back to the training data and identifying the most influential training examples is necessary to debug the model, uncover biases, or fairly compensate data contributors. While data attribution methods have been studied extensively in the supervised setting, data attribution for generative models such as diffusion models remains a challenge. The aim of this thesis is to provide an overview of existing data attribution methods. Because no commonly used benchmark exists, a range of experiments is used to compare the different methods.
