Loughborough University

Machine learning based approaches to automatic stuttering event detection

thesis
posted on 2023-11-16, 15:39 authored by Abedal-karim Al-Banna

Stuttering is a speech fluency disorder affecting 1% of the global population. To provide an automatic and objective stuttering assessment tool, Stuttering Event Detection (SED) is under extensive investigation in advanced speech research and applications. Despite significant progress achieved by various Machine Learning (ML) and Deep Learning (DL) models, SED directly from speech signals still needs improvement due to the heterogeneous and overlapping nature of stuttered speech. With a focus on advancing the state of the art in SED, the primary goal of this thesis is to investigate different ML/DL and feature engineering techniques for robust SED based on acoustic features that directly detect stuttering events from speech signals.

The first part of this thesis evaluates different DL approaches, such as Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architectures, and hybrid approaches such as ConvLSTM, against experimental data. Moreover, this work proposes and evaluates a novel SED model architecture that detects stuttering events directly from speech signals, using a log mel spectrogram as the sole acoustic feature and a 2D atrous convolutional network to learn spectral and temporal feature representations. While the proposed DL approach has shown promising results in SED, it remains vital to improve model generalisation and robustness using cross-dataset evaluation and domain adaptation, and to evaluate performance against traditional ML approaches to SED.
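To illustrate the term "atrous" (dilated) convolution used above, the following is a minimal pure-Python sketch. It is not the thesis's network: the function name `dilated_conv2d`, the toy input, and the kernel are hypothetical, and the layout of the input (one axis for time frames, the other for mel bands) is an assumption. It only shows how dilation widens the region a fixed-size kernel covers without adding weights.

```python
def dilated_conv2d(x, kernel, dilation=(1, 1)):
    """'Valid' 2D cross-correlation with dilation over a grid given as nested lists.

    Deep learning frameworks call this operation convolution; the kernel is
    not flipped. dilation=(1, 1) gives the ordinary, non-dilated case.
    """
    dh, dw = dilation
    kh, kw = len(kernel), len(kernel[0])
    # Extent of the input covered by the dilated kernel along each axis.
    span_h = (kh - 1) * dh + 1
    span_w = (kw - 1) * dw + 1
    out_h = len(x) - span_h + 1
    out_w = len(x[0]) - span_w + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("input is smaller than the dilated kernel")
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    # Kernel taps are spaced dh rows and dw columns apart.
                    acc += kernel[u][v] * x[i + u * dh][j + v * dw]
            row.append(acc)
        out.append(row)
    return out


# Toy 5x5 grid standing in for a spectrogram patch (entries 0..24),
# and a 3x3 kernel of ones.
spec = [[float(5 * i + j) for j in range(5)] for i in range(5)]
ones = [[1.0] * 3 for _ in range(3)]
print(dilated_conv2d(spec, ones))          # 3x3 output, ordinary convolution
print(dilated_conv2d(spec, ones, (2, 2)))  # [[108.0]]: the same 9 weights span 5x5
```

With a dilation of 2 in both directions, the nine weights sample every second row and column, so a 3x3 kernel sees a 5x5 region. This is why dilated layers are often used to capture longer temporal and spectral context at the same parameter count.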

Therefore, the second part of this thesis investigates the impact of stuttering event representation on detection performance. This part starts by rigorously investigating the effective use of eight common ML classifiers on two publicly available large-scale datasets to automatically detect stuttering events, using multiple objective metrics (prediction accuracy, recall, precision, and F1 score). In addition, this part evaluates the performance of SED and observes the impact of applying features from pre-trained automatic speech recognition (ASR) models on each stuttering event. Moreover, different experiments evaluate the impact of three time-domain features on SED performance: zero crossing rate, the auto-correlation of spectral flux onset, and fundamental frequency. These experiments show that using contextual information, pre-trained models, and time-domain features helps improve SED performance.
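Two of the quantities named above have short standard definitions, sketched here in plain Python: the zero crossing rate of a frame, and precision, recall, and F1 for one event class. The function names, the frame length and hop, and the labels `"block"` and `"fluent"` are illustrative assumptions; the thesis's own feature extraction settings and label set are not given on this page.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def framewise_zcr(signal, frame_len, hop):
    """ZCR of every full frame of `frame_len` samples, advancing by `hop` samples."""
    return [
        zero_crossing_rate(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treating all other labels as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# A signal that alternates sign for four samples, then stays positive.
print(framewise_zcr([1, -1, 1, -1, 1, 1, 1, 1], frame_len=4, hop=4))  # [1.0, 0.0]

# Hypothetical clip-level labels and predictions.
truth = ["block", "fluent", "block", "fluent"]
preds = ["block", "block", "fluent", "fluent"]
print(precision_recall_f1(truth, preds, positive="block"))  # (0.5, 0.5, 0.5)
```

Reporting precision, recall, and F1 per event class alongside accuracy matters when classes are imbalanced, since a classifier can reach high accuracy while missing most instances of a rare event type.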

Finally, the thesis proposes a novel attention-based multi-feature DL model for stuttering event detection using a Convolutional Block Attention Module (CBAM). The model is designed to learn frame-level and temporal representations effectively by considering contextual, pitch, time-domain and auditory-based spectral features. The multi-feature fusion approach, which combines time-domain features (zero crossing rate, spectral flux onset strength envelope, fundamental frequency), automatic speech recognition embeddings, and auditory-based spectral features, significantly improves SED performance and outperforms state-of-the-art methods. In addition, a convolutional block with attention maps along two separate dimensions, based on CBAM, is introduced for SED. This contribution demonstrates the effectiveness of this lightweight module in performing automatic feature selection by assigning shared weights to the intermediate feature map and focusing on the salient features of speech regions, leading to improved performance in SED.
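The "attention maps along two separate dimensions" can be sketched with the generic CBAM formulation: a channel attention step (a shared two-layer perceptron applied to average-pooled and max-pooled channel descriptors) followed by a spatial attention step (a convolution over the channel-wise average and maximum maps). The pure-Python code below is an illustration under stated assumptions, not the thesis's model: all weights are random placeholders, the tensor sizes are arbitrary, and a 3x3 spatial kernel is used to keep the toy example small.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def shared_mlp(vec, w1, w2):
    """Two-layer perceptron with a ReLU hidden layer; the same weights serve both poolings."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap, w1, w2):
    """One weight per channel from average- and max-pooled descriptors."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(map(max, ch)) for ch in fmap]
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def spatial_attention(fmap, kernel):
    """One weight per position; `kernel` is [2][k][k] with odd k, zero padded."""
    n_ch, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    k = len(kernel[0])
    pad = k // 2
    avg = [[sum(fmap[c][i][j] for c in range(n_ch)) / n_ch
            for j in range(width)] for i in range(height)]
    mx = [[max(fmap[c][i][j] for c in range(n_ch))
           for j in range(width)] for i in range(height)]
    pooled = [avg, mx]
    att = []
    for i in range(height):
        row = []
        for j in range(width):
            acc = 0.0
            for p in range(2):
                for u in range(k):
                    for v in range(k):
                        y, x = i + u - pad, j + v - pad
                        if 0 <= y < height and 0 <= x < width:
                            acc += kernel[p][u][v] * pooled[p][y][x]
            row.append(sigmoid(acc))
        att.append(row)
    return att


def cbam(fmap, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a [C][H][W] feature map."""
    ch_w = channel_attention(fmap, w1, w2)
    refined = [[[ch_w[c] * v for v in row] for row in fmap[c]]
               for c in range(len(fmap))]
    sp_w = spatial_attention(refined, kernel)
    return [[[sp_w[i][j] * v for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in refined]


# Placeholder sizes and random weights, for illustration only.
rng = random.Random(0)
C, H, W, HIDDEN = 4, 3, 5, 2
feature_map = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(HIDDEN)]
w2 = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(C)]
kernel = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(2)]
out = cbam(feature_map, w1, w2, kernel)
print(len(out), len(out[0]), len(out[0][0]))  # 4 3 5: shape is preserved
```

Because both attention maps pass through a sigmoid, every output value is the input value scaled by a factor between 0 and 1. The module therefore re-weights the feature map rather than changing its shape, which is what allows it to be dropped into an existing convolutional block.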

History

School

  • Science

Department

  • Computer Science

Publisher

Loughborough University

Rights holder

© Abedal-kareem M Al-banna

Publication date

2023

Notes

A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.

Language

  • en

Supervisor(s)

H Fang ; CW Dawson

Qualification name

  • PhD

Qualification level

  • Doctoral

This submission includes a signed certificate in addition to the thesis file(s)

  • I have submitted a signed certificate
