Loughborough University

Spectrogram transformers for audio classification

conference contribution
posted on 2022-09-23, 08:52 authored by Yixiao Zhang, Baihua Li, Hui Fang, Qinggang Meng
Audio classification is an important machine learning task with a wide range of applications. Over the last decade, deep learning based methods have been widely used, and transformer-based models are becoming a new paradigm for audio classification. In this paper, we present Spectrogram Transformers, a group of transformer-based models for audio classification. Based on the fundamental semantics of the audio spectrogram, we design two mechanisms to extract temporal and frequency features from it, named time-dimension sampling and frequency-dimension sampling. These discriminative representations are then enhanced by various combinations of attention block architectures, including Temporal Only (TO) attention, Temporal-Frequency Sequential (TFS) attention, Temporal-Frequency Parallel (TFP) attention, and Two-stream Temporal-Frequency (TSTF) attention, to extract the sound record signatures that serve the classification task. Our experiments demonstrate that these Transformer models outperform the state-of-the-art methods on the ESC-50 dataset without a pre-training stage. Furthermore, our method also shows great efficiency compared with other leading methods.
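
The abstract names the two sampling mechanisms but does not spell out how they work, so the following is only an illustrative sketch of one plausible reading: a spectrogram of shape (frequency bins, time frames) is cut into tokens along the time axis for temporal attention, and along the frequency axis for frequency attention. The function names, the parameters `frames_per_token` and `bins_per_token`, and the flattening of each slice into a vector are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def time_dimension_tokens(spec: np.ndarray, frames_per_token: int) -> np.ndarray:
    """Hypothetical time-dimension sampling.

    Cuts a (n_freq, n_time) spectrogram into consecutive strips along the
    time axis and flattens each strip into one token vector.
    Trailing frames that do not fill a whole strip are dropped.
    """
    n_freq, n_time = spec.shape
    n_tokens = n_time // frames_per_token
    trimmed = spec[:, : n_tokens * frames_per_token]
    # (n_freq, n_tokens, frames) -> (n_tokens, n_freq, frames) -> (n_tokens, n_freq * frames)
    strips = trimmed.reshape(n_freq, n_tokens, frames_per_token).transpose(1, 0, 2)
    return strips.reshape(n_tokens, n_freq * frames_per_token)


def frequency_dimension_tokens(spec: np.ndarray, bins_per_token: int) -> np.ndarray:
    """Hypothetical frequency-dimension sampling.

    Cuts a (n_freq, n_time) spectrogram into consecutive bands along the
    frequency axis and flattens each band into one token vector.
    Trailing bins that do not fill a whole band are dropped.
    """
    n_freq, n_time = spec.shape
    n_tokens = n_freq // bins_per_token
    trimmed = spec[: n_tokens * bins_per_token, :]
    # Rows are contiguous in memory, so each band flattens directly.
    return trimmed.reshape(n_tokens, bins_per_token * n_time)
```

Under these assumptions, a 128-bin by 500-frame spectrogram sampled with `frames_per_token=10` gives 50 temporal tokens, and with `bins_per_token=8` gives 16 frequency tokens. Each token sequence could then be passed to its own attention blocks, which is one way to picture the temporal and frequency branches the abstract lists.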

Funding

China Scholarship Council

Loughborough University

History

School

  • Science

Department

  • Computer Science

Published in

2022 IEEE International Conference on Imaging Systems and Techniques (IST)

Source

2022 IEEE International Conference on Imaging Systems and Techniques (IST)

Publisher

IEEE

Version

  • AM (Accepted Manuscript)

Rights holder

© IEEE

Publisher statement

© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Publication date

2022-07-20

Copyright date

2022

ISBN

9781665481021

Language

  • en

Location

Kaohsiung, Taiwan

Event dates

21st June 2022 - 23rd June 2022

Depositor

Dr Hui Fang. Deposit date: 22 September 2022
