Convolutive speech separation by combining probabilistic models employing the interaural spatial cues and properties of the room assisted by vision

Khan, Muhammad Salman; ur-Rehman, Ata; Liang, Yanfeng; Naqvi, Mohsen; Chambers, Jonathon

conference contribution

posted on 2013-05-02, 08:29 authored by Muhammad Salman Khan, Ata ur-Rehman, Yanfeng Liang, Mohsen Naqvi, Jonathon Chambers

This paper proposes a new combination of a model of the interaural spatial cues and a model that exploits the spatial properties of the sources, with the aim of enhancing speech separation in reverberant environments. The algorithm exploits knowledge of the speech source locations, which are estimated through vision. The interaural phase difference, the interaural level difference and the contribution of each source to all mixture channels are each modeled as Gaussian distributions in the time-frequency domain and evaluated at individual time-frequency points. An expectation-maximization (EM) algorithm refines the estimates of the model parameters. The algorithm outputs enhanced time-frequency masks that are used to reconstruct the individual speech sources. Experimental results confirm that the combined video-assisted method is a promising approach for separating sources in real reverberant rooms.
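To make the pipeline in the abstract concrete, the following is a minimal Python sketch of the general idea, not the authors' implementation. It computes the interaural level and phase differences from two-channel STFTs, models each with one Gaussian per source, and alternates E- and M-steps to obtain soft time-frequency masks. Several simplifications are assumptions made here for brevity: the Gaussian parameters are frequency-independent, the model of each source's contribution to the mixture channels is omitted, and the initial means (`ild_means`, `ipd_means`) are hypothetical stand-ins for values that would be derived from the vision-based location estimates. The function names `interaural_cues`, `wrap`, `log_gauss` and `em_masks` are illustrative only.

```python
import numpy as np


def interaural_cues(left, right, eps=1e-12):
    """Return ILD (dB) and IPD (radians) for two complex STFTs of equal shape."""
    ild = 20.0 * np.log10((np.abs(left) + eps) / (np.abs(right) + eps))
    ipd = np.angle(left * np.conj(right))
    return ild, ipd


def wrap(phase):
    """Wrap phase values into the interval (-pi, pi]."""
    return np.angle(np.exp(1j * phase))


def log_gauss(x, mean, var):
    """Element-wise log-density of a univariate Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def em_masks(ild, ipd, ild_means, ipd_means, n_iter=10, var_floor=1e-3):
    """Soft masks from a simplified ILD/IPD Gaussian model (assumption:
    one frequency-independent Gaussian per cue and per source)."""
    n_src = len(ild_means)
    ild_mu = np.array(ild_means, dtype=float)
    ipd_mu = np.array(ipd_means, dtype=float)
    ild_var = np.full(n_src, 4.0)   # assumed initial variances
    ipd_var = np.full(n_src, 0.5)
    prior = np.full(n_src, 1.0 / n_src)

    for _ in range(n_iter):
        # E-step: posterior probability of each source at every T-F point.
        log_post = np.stack([
            np.log(prior[i])
            + log_gauss(ild, ild_mu[i], ild_var[i])
            + log_gauss(wrap(ipd - ipd_mu[i]), 0.0, ipd_var[i])
            for i in range(n_src)
        ])
        log_post -= log_post.max(axis=0, keepdims=True)
        masks = np.exp(log_post)
        masks /= masks.sum(axis=0, keepdims=True)

        # M-step: re-estimate parameters with the posteriors as weights.
        for i in range(n_src):
            w = masks[i]
            total = w.sum() + 1e-12
            ild_mu[i] = (w * ild).sum() / total
            ild_var[i] = max((w * (ild - ild_mu[i]) ** 2).sum() / total, var_floor)
            shift = (w * wrap(ipd - ipd_mu[i])).sum() / total
            ipd_mu[i] = wrap(ipd_mu[i] + shift)
            ipd_var[i] = max((w * wrap(ipd - ipd_mu[i]) ** 2).sum() / total, var_floor)
            prior[i] = max(total / w.size, 1e-12)

    return masks  # shape: (n_src, n_freq, n_frames); sums to 1 over sources
```

Under these assumptions, an estimate of source `i` would be obtained by multiplying `masks[i]` with one of the mixture STFTs and inverting the transform. The variance floor and the log-domain normalization are there only to keep the sketch numerically stable when a cue is nearly constant.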

School

Mechanical, Electrical and Manufacturing Engineering

Citation

KHAN, M.S. ... et al., 2012. Convolutive speech separation by combining probabilistic models employing the interaural spatial cues and properties of the room assisted by vision. IN: Proceedings of the 9th IMA International Conference on Mathematics in Signal Processing, Austin Court, Birmingham, UK, 17 - 20 December 2012, pp. 1 - 4.

Version

AM (Accepted Manuscript)

Publication date

2012

Notes

This is a conference paper. The website is at: http://www.ima.org.uk/activities/publications.cfm

Language

en

Keywords

Speech separation; Reverberation; Spatial cues; Expectation-maximization; Time-frequency masking; Mechanical Engineering not elsewhere classified

Licence

CC BY-NC-ND 4.0
