Posted on 2013-05-02, 08:29. Authored by Muhammad Salman Khan, Ata ur-Rehman, Yanfeng Liang, Mohsen Naqvi, Jonathon Chambers
In this paper, a new combination of a model of the interaural spatial cues and a model that utilizes the spatial properties of the sources is proposed to enhance speech separation in reverberant environments. The algorithm exploits knowledge of the locations of the speech sources estimated through vision. The interaural phase difference, the interaural level difference and the contribution of each source to all mixture channels are each modeled as Gaussian distributions in the time-frequency domain and evaluated at individual time-frequency points. An expectation-maximization (EM) algorithm is employed to refine the estimates of the model parameters. The algorithm outputs enhanced time-frequency masks that are used to reconstruct the individual speech sources. Experimental results confirm that the combined video-assisted method shows promise in separating sources in real reverberant rooms.
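To make the described pipeline concrete, below is a minimal Python sketch of EM-refined time-frequency masking from binaural cues. It is not the authors' implementation: the function name em_masks, the initialization of the IPD means from visually estimated source directions, the chosen initial variances, and the wrapped-Gaussian treatment of phase are all assumptions made for illustration, and the paper's third model (the contribution of each source to all mixture channels) is omitted for brevity.

import numpy as np

def em_masks(stft_left, stft_right, init_ipd, n_iter=20, eps=1e-12):
    """Estimate soft time-frequency masks for each source via EM.

    stft_left, stft_right : complex STFTs of the two mixture channels,
                            shape (freqs, frames)
    init_ipd              : (n_sources, freqs) initial IPD means, assumed
                            here to come from video-estimated source locations
    Returns soft masks of shape (n_sources, freqs, frames).
    """
    n_src = init_ipd.shape[0]
    # Observed interaural cues at every time-frequency point.
    ipd = np.angle(stft_left * np.conj(stft_right))      # phase difference (rad)
    ild = 20.0 * np.log10((np.abs(stft_left) + eps) /
                          (np.abs(stft_right) + eps))    # level difference (dB)

    # Initial Gaussian parameters per source (assumed starting values).
    mu_ipd = init_ipd.astype(float).copy()   # frequency-dependent IPD means
    var_ipd = np.full(n_src, 1.0)
    mu_ild = np.zeros(n_src)
    var_ild = np.full(n_src, 10.0)
    prior = np.full(n_src, 1.0 / n_src)

    post = np.empty((n_src,) + ipd.shape)
    for _ in range(n_iter):
        # E-step: log-likelihood of each source at each T-F point.
        for i in range(n_src):
            # Wrap the IPD residual to (-pi, pi] before the Gaussian score.
            d_phi = np.angle(np.exp(1j * (ipd - mu_ipd[i][:, None])))
            d_ild = ild - mu_ild[i]
            post[i] = (-0.5 * d_phi**2 / var_ipd[i]
                       - 0.5 * np.log(2 * np.pi * var_ipd[i])
                       - 0.5 * d_ild**2 / var_ild[i]
                       - 0.5 * np.log(2 * np.pi * var_ild[i])
                       + np.log(prior[i] + eps))
        # Normalize into posteriors, i.e. the soft masks.
        post = np.exp(post - post.max(axis=0, keepdims=True))
        post /= post.sum(axis=0, keepdims=True)

        # M-step: mask-weighted re-estimates of the Gaussian parameters.
        for i in range(n_src):
            w = post[i]
            d_phi = np.angle(np.exp(1j * (ipd - mu_ipd[i][:, None])))
            # Approximate circular-mean update for the wrapped phase cue.
            mu_ipd[i] = mu_ipd[i] + (w * d_phi).sum(axis=1) / (w.sum(axis=1) + eps)
            var_ipd[i] = (w * d_phi**2).sum() / (w.sum() + eps)
            mu_ild[i] = (w * ild).sum() / (w.sum() + eps)
            var_ild[i] = (w * (ild - mu_ild[i])**2).sum() / (w.sum() + eps)
            prior[i] = w.mean()
    return post

In use, each returned mask would be applied pointwise to one channel of the mixture STFT and the result inverted (e.g. by overlap-add inverse STFT) to reconstruct that speech source.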
School
Mechanical, Electrical and Manufacturing Engineering
Citation
KHAN, M.S. ... et al., 2012. Convolutive speech separation by combining probabilistic models employing the interaural spatial cues and properties of the room assisted by vision. IN: Proceedings of the 9th IMA International Conference on Mathematics in Signal Processing, Austin Court, Birmingham, UK, 17 - 20 December 2012, pp. 1 - 4.