Speaker indexing and speech enhancement in real meetings / conversations



This paper presents a speaker indexing method that uses a small number of microphones to estimate who spoke when. The method combines a noise-robust voice activity detector (VAD), a GCC-PHAT-based direction-of-arrival (DOA) estimator, and a DOA classifier. Using the estimated speaker indexing information, we can also enhance the utterances of each speaker with a maximum signal-to-noise-ratio (MaxSNR) beamformer. We apply the system to real meetings / conversations recorded in a room with a reverberation time of 350 ms, and evaluate its performance with a standard measure, the diarization error rate (DER). Even for these real conversations, which contain many speaker turn-takings and overlaps, the speaker error time was very small with our proposed system. We are planning to demonstrate a real-time speaker indexing system at ICASSP 2008.
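The abstract names GCC-PHAT as the basis of the DOA estimator but does not give details. As a rough illustration only, the following Python sketch shows the general GCC-PHAT technique for a single microphone pair: it estimates the time delay between two signals from the phase of their cross-spectrum, then converts that delay to an angle. The function names, the two-microphone far-field geometry, the speed of sound of 343 m/s, and the angle convention (measured from broadside) are assumptions made for this example; they are not taken from the paper, and this is not the authors' implementation.

```python
import numpy as np


def gcc_phat_delay(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1, in seconds, using GCC-PHAT.

    Hypothetical helper for illustration; a positive result means x2 lags x1.
    """
    n = len(x1) + len(x2)  # zero-pad so the correlation is not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    # PHAT weighting: keep only the phase of the cross-spectrum
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically possible delays
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


def doa_from_delay(tau, mic_distance, c=343.0):
    """Convert a delay to an angle in degrees (far-field, two microphones).

    Assumed convention: 0 degrees is broadside to the microphone pair.
    """
    sin_theta = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))


# Synthetic check: white noise delayed by 3 samples at 16 kHz
rng = np.random.default_rng(0)
fs = 16000
x1 = rng.standard_normal(4096)
x2 = np.concatenate((np.zeros(3), x1[:-3]))
tau = gcc_phat_delay(x1, x2, fs, max_tau=0.001)
angle = doa_from_delay(tau, mic_distance=0.1)
```

In a complete indexing system of the kind the abstract describes, such per-frame DOA estimates would be computed only on frames the VAD marks as speech and then grouped by a classifier so that each cluster corresponds to one speaker. That step is not shown here.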

Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino


ICASSP 2008 | Signal Processing | Speaker Indexing | Speaker Indexing Information | Speaker Indexing Method


Added	30 May 2010
Updated	30 May 2010
Type	Conference
Year	2008
Where	ICASSP
Authors	Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino
