A comparison of model validation techniques for audio-visual speech recognition

Seong, Thum W.; Ibrahim, M.Z.; Arshad, Nurul W.; Mulvaney, David

conference contribution

posted on 2017-10-20, 09:57 authored by Thum W. Seong, M.Z. Ibrahim, Nurul W. Arshad, David Mulvaney

This paper implements and compares the performance of a number of techniques proposed for improving the accuracy of Automatic Speech Recognition (ASR) systems. Because the audio signal used by speech-only ASR can be contaminated by environmental noise, some applications may perform better with Audio-Visual Speech Recognition (AVSR), in which recognition uses both audio information and mouth movements obtained from a video recording of the speaker’s face region. In this paper, three model validation techniques, namely the holdout method, leave-one-out cross validation and bootstrap validation, are implemented to validate the performance of an AVSR system and to compare the validation techniques themselves. A new speech data corpus is used, the Loughborough University Audio-Visual (LUNA-V) dataset, which contains 10 speakers with five sets of samples uttered by each speaker. The database is divided into training and testing sets and processed in a manner suitable for each validation technique under investigation. Performance is evaluated over a range of signal-to-noise ratio values using a variety of noise types obtained from the NOISEX-92 dataset.
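As an illustration only, the three validation schemes named in the abstract can be sketched as different ways of splitting the five sample sets recorded per speaker into training and testing portions. The abstract does not say how the paper actually partitions the data, so splitting by set index, the function names and the number of bootstrap rounds are all assumptions made for this sketch.

```python
import random


def holdout_split(items, n_test):
    """Single split: hold out the last n_test items (n_test >= 1) for testing."""
    return [(items[:-n_test], items[-n_test:])]


def leave_one_out_splits(items):
    """One split per item: that item is the test set, the rest are training."""
    return [(items[:i] + items[i + 1:], [items[i]]) for i in range(len(items))]


def bootstrap_splits(items, n_rounds, seed=0):
    """Train on a sample drawn with replacement; test on the items never drawn."""
    rng = random.Random(seed)
    splits = []
    while len(splits) < n_rounds:
        train = [rng.choice(items) for _ in items]
        test = [x for x in items if x not in train]
        if test:  # skip the rare draw that happens to cover every item
            splits.append((train, test))
    return splits


# Hypothetical labels for the five sample sets recorded per speaker.
sample_sets = [1, 2, 3, 4, 5]

holdout = holdout_split(sample_sets, n_test=1)
loo = leave_one_out_splits(sample_sets)
boot = bootstrap_splits(sample_sets, n_rounds=3)

for name, splits in [("holdout", holdout), ("leave-one-out", loo), ("bootstrap", boot)]:
    for train, test in splits:
        print(f"{name}: train={train} test={test}")
```

In this sketch the holdout method yields a single train/test split, leave-one-out yields one split per sample set, and each bootstrap round trains on a resample with repeats while testing on whatever was left out. A recogniser would be trained and scored once per split and the scores averaged.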

Funding

This work was supported by Universiti Malaysia Pahang and funded by the Ministry of Higher Education Malaysia under FRGS Grant RDU160108.

School

Mechanical, Electrical and Manufacturing Engineering

Published in

Lecture Notes in Electrical Engineering

Volume

449

Pages

112 - 119

Citation

SEONG, T.W. ... et al, 2017. A comparison of model validation techniques for audio-visual speech recognition. IN: Kim K., Kim H. and Baek N. (eds). IT Convergence and Security 2017. ICITS 2017. Lecture Notes in Electrical Engineering, 449, pp. 112-119.

Version

AM (Accepted Manuscript)

Publisher statement

This work is made available according to the conditions of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) licence. Full details of this licence are available at: https://creativecommons.org/licenses/by-nc-nd/4.0/

Publication date

2017

Notes

This is a pre-copyedited version of a contribution published in Kim K., Kim H. and Baek N. (eds), IT Convergence and Security 2017 (ICITS 2017), published by Springer. The definitive authenticated version is available online at https://doi.org/10.1007/978-981-10-6451-7_14

DOI

https://doi.org/10.1007/978-981-10-6451-7_14

ISBN

9789811064500

ISSN

1876-1100

eISSN

1876-1119

Publisher version

https://doi.org/10.1007/978-981-10-6451-7_14

Book series

Lecture Notes in Electrical Engineering; 449

Language

en

Keywords

Audio-visual speech recognition; Hidden Markov models; HTK toolkit; Holdout validation; Leave-one-out cross validation; Bootstrap validation; Mechanical Engineering not elsewhere classified

Licence

CC BY-NC-ND 4.0
