Video-aided model-based source separation in real reverberant rooms

Khan, Muhammad Salman; Naqvi, Mohsen; ur-Rehman, Ata; Wang, Wenwu; Chambers, Jonathon

journal contribution

posted on 2013-05-02, 08:31 authored by Muhammad Salman Khan, Mohsen Naqvi, Ata ur-Rehman, Wenwu Wang, Jonathon Chambers

Source separation algorithms that utilize only audio data can perform poorly if multiple sources or reverberation are present. In this paper we therefore propose a video-aided model-based source separation algorithm for a two-channel reverberant recording in which the sources are assumed static. By exploiting cues from video, we first localize individual speech sources in the enclosure and then estimate their directions. The interaural spatial cues, namely the interaural phase difference and the interaural level difference, as well as the mixing vectors, are probabilistically modeled. The models make use of the source direction information and are evaluated at discrete time-frequency points. The model parameters are refined with the well-known expectation-maximization (EM) algorithm. The algorithm outputs time-frequency masks that are used to reconstruct the individual sources. Simulation results show that by utilizing the visual modality the proposed algorithm can produce better time-frequency masks, thereby giving improved source estimates. We provide experimental results that test the proposed algorithm in different scenarios, compare it with other audio-only and audio-visual algorithms, and show improved performance on both synthetic and real data. We also include dereverberation-based pre-processing in our algorithm in order to suppress the late reverberant components from the observed stereo mixture and further enhance the overall output of the algorithm. This advantage makes our algorithm a suitable candidate for use in under-determined, highly reverberant settings where the performance of other audio-only and audio-visual methods is limited.
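To illustrate the kind of computation the abstract describes, here is a minimal Python sketch. It is not the authors' implementation. It computes the interaural level and phase differences from two STFT channels and then forms soft time-frequency masks from fixed per-source delays, which stand in for the directions obtained from video. The function names, the Gaussian phase model with a single `sigma`, the equal source priors, and the sign convention (a positive delay means the right channel lags the left) are all assumptions made for this example. The paper's model also includes the level difference and mixing vectors and refines its parameters with EM, none of which is shown here.

```python
import numpy as np


def interaural_cues(left, right, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for two complex STFTs of shape (F, T)."""
    ratio = (left + eps) / (right + eps)  # eps guards against division by zero
    ild_db = 20.0 * np.log10(np.abs(ratio))
    ipd = np.angle(ratio)  # wrapped to (-pi, pi]
    return ild_db, ipd


def soft_masks(ipd, freqs_hz, delays_s, sigma=0.5):
    """Posterior probability of each source at each time-frequency point.

    Assumes (hypothetically) equal priors and a Gaussian model on the wrapped
    difference between the observed IPD and the IPD predicted by each delay.
    Returns an array of shape (S, F, T) that sums to 1 over the source axis.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    delays_s = np.asarray(delays_s, dtype=float)
    # Predicted IPD per source and frequency, shape (S, F, 1).
    predicted = 2.0 * np.pi * freqs_hz[None, :, None] * delays_s[:, None, None]
    # Wrap the phase residual back into (-pi, pi].
    residual = np.angle(np.exp(1j * (ipd[None, :, :] - predicted)))
    log_lik = -0.5 * (residual / sigma) ** 2
    log_lik -= log_lik.max(axis=0, keepdims=True)  # numerical stability
    post = np.exp(log_lik)
    return post / post.sum(axis=0, keepdims=True)
```

Multiplying a mask of shape (F, T) by one of the mixture STFTs and inverting the transform would give an estimate of the corresponding source. In an EM-style scheme these posteriors would play the role of the E-step, with the model parameters re-estimated from them in the M-step.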

History

School

Mechanical, Electrical and Manufacturing Engineering

Citation

KHAN, M.S. ... et al., 2013. Video-aided model-based source separation in real reverberant rooms. IEEE Transactions on Audio, Speech and Language Processing, 21 (9), pp. 1900 - 1912.

Publisher

Version

AM (Accepted Manuscript)

Publication date

2013

Notes

This article was published in the journal IEEE Transactions on Audio, Speech and Language Processing [© IEEE], and the definitive version is available at: http://dx.doi.org/10.1109/TASL.2013.2261814 [Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.]