Feature-fusion based audio-visual speech recognition using lip geometry features in noisy enviroment

Ibrahim, M.Z.; Mulvaney, David; Abas, M.F.

File(s) under permanent embargo

Reason: This item is currently closed access.

journal contribution

posted on 2016-12-15, 10:02 authored by M.Z. Ibrahim, David Mulvaney, M.F. Abas

Humans are often able to compensate for noise degradation and uncertainty in speech information by augmenting the received audio with visual information. Such bimodal perception generates a rich combination of information that can be used in the recognition of speech. However, because the lip movements involved in articulation vary widely, audio-visual integration does not substantially improve the recognition of all speech. This paper describes a feature-fusion audio-visual speech recognition (AVSR) system that extracts lip geometry from the mouth region using a combination of a skin color filter, border following and a convex hull, and performs classification using a hidden Markov model. The new approach is compared with a conventional audio-only system under simulated ambient noise conditions that affect the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the audio-visual approach significantly improves speech recognition accuracy compared with the audio-only approach.
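The abstract does not give implementation details, so the following is only a minimal sketch of the general idea rather than the authors' method. It assumes a lip contour is already available as a list of (x, y) points (for example, from a border-following step), computes its convex hull, derives three illustrative geometry features (width, height and area, chosen here as assumptions), and fuses them with audio features by concatenating frame by frame. The function names, the feature choice and the strategy of repeating visual frames to match the audio frame rate are all hypothetical; the hidden Markov model classifier is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Convex hull of 2-D points using Andrew's monotone chain algorithm."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Z-component of the cross product (a - o) x (b - o).
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def lip_geometry(hull: Sequence[Point]) -> List[float]:
    """Illustrative features (an assumption): hull width, height and area."""
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    closed = list(hull[1:]) + list(hull[:1])
    # Shoelace formula for the area of a simple polygon.
    area = 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(hull, closed)))
    return [max(xs) - min(xs), max(ys) - min(ys), area]


def fuse_features(audio_frames: Sequence[Sequence[float]],
                  visual_frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Feature fusion by concatenation, one fused vector per audio frame.

    Video usually has fewer frames than audio, so each visual frame is
    repeated to cover the audio frames it overlaps (an assumed strategy).
    """
    n_a, n_v = len(audio_frames), len(visual_frames)
    if n_v == 0:
        raise ValueError("at least one visual frame is required")
    fused = []
    for i, audio in enumerate(audio_frames):
        j = min(i * n_v // n_a, n_v - 1)  # index of the overlapping visual frame
        fused.append(list(audio) + list(visual_frames[j]))
    return fused
```

In such a scheme the fused vectors would form the observation sequence passed to a single classifier, which is what distinguishes feature fusion from approaches that combine the outputs of separate audio and visual classifiers.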

School

Mechanical, Electrical and Manufacturing Engineering

Published in

ARPN Journal of Engineering and Applied Sciences

Volume

10

Issue

23

Pages

17521 - 17527

Citation

IBRAHIM, M.Z., MULVANEY, D.J. and ABAS, M.F., 2015. Feature-fusion based audio-visual speech recognition using lip geometry features in noisy enviroment. ARPN Journal of Engineering and Applied Sciences, 10(23), pp. 17521-17527.

Version

VoR (Version of Record)

Publisher statement

This work is made available according to the conditions of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) licence. Full details of this licence are available at: https://creativecommons.org/licenses/by-nc-nd/4.0/

Publication date

2015

Notes

This paper is in closed access.

eISSN

1819-6608

Publisher version

http://www.arpnjournals.com/jeas/index.htm

Language

en

Keywords

Lip geometry; Feature fusion; Audio-visual speech recognition; OpenCV; Mechanical Engineering not elsewhere classified

Licence

CC BY-NC-ND 4.0
