Convolutive speech separation by combining probabilistic models employing the interaural spatial cues and properties of the room assisted by vision

conference contribution
posted on 02.05.2013 by Muhammad Salman Khan, Ata ur-Rehman, Yanfeng Liang, Mohsen Naqvi, Jonathon Chambers
This paper proposes a new combination of a model of the interaural spatial cues and a model that exploits the spatial properties of the sources, in order to enhance speech separation in reverberant environments. The algorithm exploits knowledge of the speech source locations, which are estimated visually. The interaural phase difference (IPD), the interaural level difference (ILD) and the contribution of each source to all mixture channels are each modeled as Gaussian distributions in the time-frequency domain and evaluated at individual time-frequency points. An expectation-maximization (EM) algorithm is used to refine the model parameters. The algorithm outputs enhanced time-frequency masks, which are used to reconstruct the individual speech sources. Experimental results confirm that the combined video-assisted method is promising for separating sources in real reverberant rooms.
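To make the IPD/ILD part of the abstract concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper. It uses two channels and frequency-independent Gaussian parameters. The mean IPD of each source is fixed by a per-source interaural delay (`delays`, hypothetical), which stands in for the visually estimated locations. The third model, covering each source's contribution to the mixture channels, is omitted. EM fits the IPD and ILD Gaussians, and the per-point posteriors serve as soft time-frequency masks. `synthetic_mixture` is a hypothetical generator of toy, single-source-dominated time-frequency points. It does not produce real STFT data.

```python
import cmath
import math
import random


def interaural_cues(left, right, eps=1e-12):
    """Return (IPD in radians, ILD in dB) for one time-frequency point."""
    ipd = cmath.phase(left * right.conjugate())  # phase(L) - phase(R), wrapped
    ild = 20.0 * math.log10((abs(left) + eps) / (abs(right) + eps))
    return ipd, ild


def wrap(angle):
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def em_masks(points, delays, n_iter=20, floor=1e-6):
    """Soft T-F masks from per-source IPD/ILD Gaussians fitted by EM.

    points: list of (freq_hz, ipd, ild) tuples, one per time-frequency point.
    delays: hypothetical per-source interaural delays in seconds, assumed to
            come from visual localisation and held fixed (not re-estimated).
    floor:  small constant added to variances to keep them positive.
    Returns masks[n][k], the posterior that point n is dominated by source k.
    """
    K = len(delays)
    weights = [1.0 / K] * K   # mixing proportions
    ipd_var = [1.0] * K       # variance of the wrapped IPD residual (rad^2)
    ild_mean = [0.0] * K      # ILD mean (dB)
    ild_var = [1.0] * K       # ILD variance (dB^2)
    masks = []
    for _ in range(n_iter):
        # E-step: posterior of each source at each point, in the log domain.
        masks = []
        for f, ipd, ild in points:
            logs = []
            for k in range(K):
                res = wrap(ipd - 2.0 * math.pi * f * delays[k])
                logs.append(math.log(weights[k])
                            + log_gauss(res, 0.0, ipd_var[k])
                            + log_gauss(ild, ild_mean[k], ild_var[k]))
            top = max(logs)
            ex = [math.exp(lg - top) for lg in logs]
            total = sum(ex)
            masks.append([e / total for e in ex])
        # M-step: re-estimate weights, ILD mean/variance and IPD variance.
        for k in range(K):
            r = [m[k] for m in masks]
            rk = sum(r) + 1e-12
            weights[k] = max(rk / len(points), 1e-12)
            ild_mean[k] = sum(rn * p[2] for rn, p in zip(r, points)) / rk
            ild_var[k] = floor + sum(rn * (p[2] - ild_mean[k]) ** 2
                                     for rn, p in zip(r, points)) / rk
            ipd_var[k] = floor + sum(
                rn * wrap(p[1] - 2.0 * math.pi * p[0] * delays[k]) ** 2
                for rn, p in zip(r, points)) / rk
    return masks


def synthetic_mixture(delays, ilds_db, n_per_source=200, seed=0):
    """Hypothetical toy data: each T-F point is dominated by a single source."""
    rng = random.Random(seed)
    points, labels = [], []
    for k, (tau, ild_db) in enumerate(zip(delays, ilds_db)):
        for _ in range(n_per_source):
            f = rng.uniform(100.0, 800.0)
            mag = rng.uniform(0.5, 2.0)
            phase = rng.uniform(-math.pi, math.pi)
            left = cmath.rect(mag, phase)
            # Right channel: attenuated by the ILD and delayed by tau, plus noise.
            right = cmath.rect(mag * 10.0 ** (-(ild_db + rng.gauss(0.0, 1.0)) / 20.0),
                               phase - 2.0 * math.pi * f * tau + rng.gauss(0.0, 0.2))
            ipd, ild = interaural_cues(left, right)
            points.append((f, ipd, ild))
            labels.append(k)
    return points, labels


if __name__ == "__main__":
    delays = [3e-4, -3e-4]  # hypothetical "visually estimated" delays (s)
    points, labels = synthetic_mixture(delays, ilds_db=[6.0, -6.0])
    masks = em_masks(points, delays)
    agree = sum((m[0] > m[1]) == (lab == 0) for m, lab in zip(masks, labels))
    print(f"hard-mask agreement with toy labels: {agree / len(labels):.2%}")
```

In this sketch the delays are held fixed, so each mixture component stays tied to one located speaker and the permutation ambiguity of blind EM cannot arise. The abstract does not say whether the authors' EM also re-estimates source locations. Masks for real audio would be applied to the STFT of each channel before inverse transformation.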

School

  • Mechanical, Electrical and Manufacturing Engineering

Citation

KHAN, M.S. ... et al., 2012. Convolutive speech separation by combining probabilistic models employing the interaural spatial cues and properties of the room assisted by vision. IN: Proceedings of the 9th IMA International Conference on Mathematics in Signal Processing, Austin Court, Birmingham, UK, 17-20 December 2012, pp. 1-4.

Publisher

© IMA

Version

AM (Accepted Manuscript)

Publication date

2012

Notes

This is a conference paper. The publisher's website is at: http://www.ima.org.uk/activities/publications.cfm

Language

en
