Study of video-assisted BSS for convolutive mixtures

This paper presents an overview of recent research on audio-visual blind source separation (BSS), together with new results from our own work that highlight the advantage of incorporating visual information into a BSS algorithm. In our approach, visual and audio information are combined to form joint audio-visual feature vectors, and the audio-visual coherence is then captured by statistical models. The outputs of these models control the step size of a frequency-domain BSS algorithm. Experimental results show that the audio-visual method improves on audio-only BSS. We also discuss visual feature extraction techniques and several recently published methods for audio-visual BSS, and conclude with suggestions for future research.
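To make the idea of a coherence-controlled step size concrete, the following is a minimal sketch for a single frequency bin, not the algorithm used in this work. It assumes that the statistical model yields a coherence score in [0, 1], that a higher score means the current separated output agrees better with the video, and that better agreement should shrink the step. The names `av_step_size`, `natural_gradient_step`, `mu_min` and `mu_max`, the linear mapping, and the choice of a natural-gradient update with a polar nonlinearity are all illustrative assumptions.

```python
import numpy as np


def av_step_size(coherence, mu_min=0.01, mu_max=0.1):
    """Map an audio-visual coherence score to a step size.

    Assumption (not from the paper): the score lies in [0, 1] and the
    step shrinks linearly from mu_max to mu_min as coherence grows.
    """
    c = float(np.clip(coherence, 0.0, 1.0))
    return mu_min + (1.0 - c) * (mu_max - mu_min)


def natural_gradient_step(W, X, mu):
    """One natural-gradient ICA update for a single frequency bin.

    W  : (n, n) complex unmixing matrix for this bin
    X  : (n, T) complex STFT coefficients of the mixtures in this bin
    mu : step size, e.g. the value returned by av_step_size
    """
    Y = W @ X                                 # current source estimates
    phi = Y / (np.abs(Y) + 1e-12)             # polar nonlinearity (assumed choice)
    T = X.shape[1]
    G = np.eye(W.shape[0]) - (phi @ Y.conj().T) / T
    return W + mu * (G @ W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((2, 256)) + 1j * rng.standard_normal((2, 256))
    W = np.eye(2, dtype=complex)
    # Hypothetical coherence scores; in practice they would come from
    # the audio-visual statistical model evaluated on the current output.
    for score in (0.1, 0.5, 0.9):
        W = natural_gradient_step(W, X, av_step_size(score))
    print(W.shape)
```

In a full frequency-domain system an update of this kind would run independently in every bin, and the permutation and scaling ambiguities across bins would still need to be resolved separately.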