
Real-time speaker identification for video conferencing

conference contribution
posted on 14.07.2010, 15:15 by Sara Saravi, Iffat Zafar, Eran Edirisinghe, Roy Kalawsky
Automatic speaker identification in a videoconferencing environment allows conference attendees to focus their attention on the conference rather than on manually identifying which channel is active and who is speaking within that channel. In this work we present a real-time, audio-coupled, video-based approach to this problem, with the main focus on the video analysis side. The system is driven by the need to detect a talking human using computer vision algorithms. The initial stage is a face detector, followed by a lip-localization algorithm that segments the lip region. We propose a novel approach to lip movement detection based on image registration using the Coherent Point Drift (CPD) algorithm, a technique for rigid and non-rigid registration of point sets. We provide experimental results that analyse the performance of the algorithm when used to monitor real-life videoconferencing data.
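
The registration idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the registration result is turned into a movement decision, so the following is an assumption, not the authors' method: lip contour points from two consecutive frames are rigidly aligned, and whatever shape difference remains after alignment is treated as lip movement. For brevity the sketch swaps in Kabsch rigid alignment with known point correspondences in place of CPD, which estimates correspondences itself. The function names, the synthetic elliptical contour, and the 1-pixel threshold are all hypothetical.

```python
import numpy as np


def rigid_align(source, target):
    """Kabsch alignment: rotate and translate `source` onto `target`.

    Assumes row i of `source` corresponds to row i of `target`
    (CPD would estimate these correspondences instead).
    """
    src_c = source.mean(axis=0)
    tgt_c = target.mean(axis=0)
    # 2x2 cross-covariance between the centred point sets
    H = (source - src_c).T @ (target - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return (source - src_c) @ R.T + tgt_c


def lip_motion_score(prev_pts, curr_pts):
    """RMS residual (in pixels) left after rigid alignment.

    Rigid head motion is absorbed by the alignment, so a large
    residual suggests the lip shape itself has changed.
    """
    aligned = rigid_align(prev_pts, curr_pts)
    return float(np.sqrt(np.mean(np.sum((aligned - curr_pts) ** 2, axis=1))))


def lip_contour(width, height, n=20):
    """Hypothetical lip contour: n points on an ellipse."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack([width / 2 * np.cos(theta),
                            height / 2 * np.sin(theta)])


# Synthetic data: a closed mouth, the same mouth after a small head
# motion (rotation plus shift), and an opened mouth.
closed = lip_contour(40, 10)
angle = 0.1
rot = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle), np.cos(angle)]])
head_moved = closed @ rot.T + np.array([3.0, -2.0])
opened = lip_contour(40, 20)

THRESHOLD = 1.0  # hypothetical decision threshold in pixels
still_score = lip_motion_score(closed, head_moved)
talk_score = lip_motion_score(closed, opened)
is_talking = talk_score > THRESHOLD
```

In this toy setup the head-motion case leaves a residual close to zero while the opened mouth leaves a clearly larger one. Real contours would come from the lip-localization stage, and a non-rigid method such as CPD would be needed when point correspondences are unknown.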



  • Science


  • Computer Science


SARAVI, S. et al., 2010. Real-time speaker identification for video conferencing. IN: Kehtarnavaz, N. (ed.), Real-Time Image and Video Processing 2010, Proceedings of SPIE, 7724, 77240D, 10pp.


© 2010 SPIE


VoR (Version of Record)

Publication date



Copyright 2010 Society of Photo-Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic electronic or print reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited. This paper can also be found at: