WADA-W: A modified WADA SNR estimator for audio-visual speech recognition

Seong, Thum Wei; Ibrahim, MZ; Mulvaney, David

journal contribution

posted on 2019-09-18, 10:42 authored by Thum Wei Seong, MZ Ibrahim, David Mulvaney

One of the main challenges in speech recognition is developing systems that are robust to contamination by intrusive background noise. In audio-visual speech recognition (AVSR), audio information is augmented by visual information to help improve recognition performance, particularly when the audio modality is so heavily corrupted by background noise that it becomes hard to differentiate the original speech signal from the noise. The signal-to-noise ratio (SNR) can be used to quantify the level of noise in a speech signal, and one widely used method for SNR estimation is waveform amplitude distribution analysis (WADA), which is based on the assumption that the speech and noise signals have Gamma and Gaussian amplitude distributions, respectively. As in previous approaches, this work uses a precomputed look-up table as a reference for SNR estimation. In this study, WADA-white (WADA-W) has been developed, which rebuilds the precomputed look-up table using a white noise profile in combination with our own AVSR database. This new data corpus, the Loughborough University Audio-Visual (LUNA-V) dataset, contains recordings of 10 speakers with five sets of samples uttered by each speaker and is used for this experimental work. We evaluate the performance of WADA-W on this database when it is corrupted by noise generated from three profiles obtained from the NOISEX-92 database and added at varying SNR values. The evaluation on the LUNA-V database shows that WADA-W performs better than the original WADA in terms of SNR estimation.
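The look-up-table idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic two-sided Gamma samples as a stand-in for speech (the shape value 0.4 is an assumption chosen for illustration), white Gaussian noise as the noise profile, and the statistic ln(E|z|) - E[ln|z|] as the quantity stored in the table. All function names are hypothetical.

```python
import numpy as np

def synthetic_speech(rng, n, shape=0.4):
    """Stand-in for speech: two-sided Gamma amplitudes (shape is an assumption)."""
    return rng.gamma(shape, 1.0, size=n) * rng.choice([-1.0, 1.0], size=n)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech + noise has the requested SNR in dB."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

def amplitude_statistic(z, eps=1e-12):
    """ln(E|z|) - E[ln|z|]; eps guards against log(0)."""
    a = np.abs(z) + eps
    return np.log(np.mean(a)) - np.mean(np.log(a))

def build_table(speech, noise, snr_grid_db):
    """Precompute the statistic for each SNR on the grid."""
    return np.array([amplitude_statistic(mix_at_snr(speech, noise, s))
                     for s in snr_grid_db])

def estimate_snr(z, snr_grid_db, table):
    """Return the grid SNR whose stored statistic is closest to that of z."""
    g = amplitude_statistic(z)
    return float(snr_grid_db[np.argmin(np.abs(table - g))])

# Build the table once from a white-noise profile, then reuse it.
rng = np.random.default_rng(0)
n = 100_000
snr_grid_db = np.arange(-10, 21)  # 1 dB steps from -10 dB to 20 dB
table = build_table(synthetic_speech(rng, n), rng.standard_normal(n), snr_grid_db)

# Estimate the SNR of a fresh mixture created at a known 10 dB.
test_mix = mix_at_snr(synthetic_speech(rng, n), rng.standard_normal(n), 10.0)
print(estimate_snr(test_mix, snr_grid_db, table))
```

Rebuilding the table from a different noise profile or from real recordings only changes the arguments passed to `build_table`; the lookup step stays the same. Nearest-entry lookup is used here for simplicity, so the estimate is quantised to the grid spacing.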

Funding

Ministry of Higher Education Malaysia under FRGS Grant RDU160108

School

Mechanical, Electrical and Manufacturing Engineering

Published in

International Journal of Machine Learning and Computing

Volume

9

Issue

4

Pages

446 - 451

Publisher

International Association of Computer Science and Information Technology (IACSIT)

Version

VoR (Version of Record)

Publication date

2019-08-01

Copyright date

2019

DOI

https://doi.org/10.18178/ijmlc.2019.9.4.824

eISSN

2010-3700

Language

en

Depositor

Dr David Mulvaney

Keywords

Audio visual speech recognition; LUNA-V; SNR estimator; WADA

Licence

CC BY 4.0
