Transferability to spectrogram-based anomaly detection: Enhancing audio anomaly detection through vision derived methods



Completed Master's Thesis

Anomaly detection in industrial audio data is crucial for keeping manufacturing processes running smoothly, since it enables predictive maintenance and quality control. Despite its importance, audio anomaly detection has received less attention than its vision-based counterpart. This thesis explores whether state-of-the-art vision-based anomaly detection methods, originally designed for image data, can be applied effectively to industrial audio data represented as spectrograms. The research aims to bridge the gap between the two domains by investigating how vision-based approaches can be adapted to improve audio anomaly detection in industrial settings.

The study focuses on three key questions: (1) whether vision-based anomaly detection methods are applicable to industrial audio data, (2) how performance changes when the image-based feature extractor is replaced with a spectrogram-specific one, the Audio Spectrogram Transformer (AST), and (3) what effect fine-tuning the AST on industrial spectrograms has. The research employs four state-of-the-art anomaly detection models, PatchCore, FastFlow, EfficientAD, and Reverse Distillation, and evaluates them on the DCASE2020 dataset and a real-world industrial dataset from BMW.
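To make the general idea concrete, the following is a minimal sketch of how an audio clip can be turned into a spectrogram and scored against a memory of normal examples. It is an illustration under simplifying assumptions, not the pipeline used in the thesis: it uses a plain log-magnitude STFT instead of a tuned spectrogram representation, raw spectrogram frames instead of features from a pretrained extractor, and a brute-force nearest-neighbour distance that only loosely mirrors the memory-bank idea behind PatchCore. The function names, parameters, and synthetic signals are all hypothetical.

```python
import numpy as np


def log_spectrogram(signal, n_fft=256, hop=128):
    """Return a log-magnitude STFT spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range; transpose to (frequency, time).
    return np.log1p(magnitude).T


def anomaly_score(spec, memory_bank):
    """Largest distance from any frame of `spec` to its nearest normal frame."""
    feats = spec.T  # one feature vector per time frame
    dists = np.linalg.norm(feats[:, None, :] - memory_bank[None, :, :], axis=2)
    return float(dists.min(axis=1).max())


# Synthetic stand-ins for machine sounds (assumption: 8 kHz, 1 second).
sr = 8000
t = np.arange(sr) / sr
normal_train = np.sin(2 * np.pi * 440 * t)
normal_test = np.sin(2 * np.pi * 440 * t + 0.3)
anomalous_test = normal_test.copy()
anomalous_test[4000:4500] += np.sin(2 * np.pi * 2000 * t[4000:4500])  # short burst

# The "memory bank" holds frames taken from normal recordings only.
memory_bank = log_spectrogram(normal_train).T

score_normal = anomaly_score(log_spectrogram(normal_test), memory_bank)
score_anomalous = anomaly_score(log_spectrogram(anomalous_test), memory_bank)
```

In this toy setting the clip containing the extra burst should receive the higher score, because its affected frames have no close match among the normal frames. The methods studied in the thesis replace the raw frames with learned features, which is where the choice of feature extractor comes into play.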

The findings show that vision-based anomaly detection methods can be successfully applied to industrial audio data, with performance varying by dataset, model architecture, and spectrogram representation. The study identifies key factors that influence the performance of spectrogram anomaly detection and presents several ways to adapt vision-based approaches for use on spectrograms. One such adaptation, replacing the image-based feature extractor with a spectrogram-specific feature extractor (the AST), showed promising results in improving audio anomaly detection performance. Furthermore, the successful application of these approaches to the BMW dataset demonstrates their potential in real-world production environments, particularly when recordings are made under controlled conditions with minimal variance.

