Informed algorithms for sound source separation in enclosed reverberant environments

Khan, Muhammad Salman

Thesis-2013-Khan.pdf (1.8 MB)

Informed algorithms for sound source separation in enclosed reverberant environments

thesis

posted on 2013-10-14, 10:57 authored by Muhammad Salman Khan

While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving performance of audio separation algorithms when they are informed i.e. have access to source location information. These locations are assumed to be known a priori in this work, for example by video processing. Initially, a multi-microphone array based method combined with binary time-frequency masking is proposed. A robust least squares frequency invariant data independent beamformer designed with the location information is utilized to estimate the sources. To further enhance the estimated sources, binary time-frequency masking based post-processing is used but cepstral domain smoothing is required to mitigate musical noise. To tackle the under-determined case and further improve separation performance at higher reverberation times, a two-microphone based method which is inspired by human auditory processing and generates soft time-frequency masks is described. In this approach interaural level difference, interaural phase difference and mixing vectors are probabilistically modeled in the time-frequency domain and the model parameters are learned through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source, using the location information, which is used as the mean parameter of the mixing vector model. Soft time-frequency masks are used to reconstruct the sources. A spatial covariance model is then integrated into the probabilistic model framework that encodes the spatial characteristics of the enclosure and further improves the separation performance in challenging scenarios i.e. when sources are in close proximity and when the level of reverberation is high. Finally, new dereverberation based pre-processing is proposed based on the cascade of three dereverberation stages where each enhances the twomicrophone reverberant mixture. The dereverberation stages are based on amplitude spectral subtraction, where the late reverberation is estimated and suppressed. The combination of such dereverberation based pre-processing and use of soft mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures formed for example from speech signals from the TIMIT database and measured room impulse responses.

History

School

Mechanical, Electrical and Manufacturing Engineering

Publisher

Publication date

2013

Notes

A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.

EThOS Persistent ID

uk.bl.ethos.588052

Language

en

Administrator link

https://repository.lboro.ac.uk/account/articles/9536135

Usage metrics

Keywords

Sound source separation Reverberation Time-frequency masking Interaural cues Mechanical Engineering not elsewhere classified

Licence

CC BY-NC-ND 4.0

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

Informed algorithms for sound source separation in enclosed reverberant environments

History

School

Publisher

Publication date

Notes

EThOS Persistent ID

Language

Administrator link

Usage metrics

Categories

Keywords

Licence

Exports