BIOID
2008

Multimodal Speaker Identification Based on Text and Speech

Abstract. This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker's utterance is processed by probabilistic latent semantic indexing (PLSI), which offers a powerful means to model each speaker's vocabulary employing a number of hidden topics that are closely related to his/her identity, function, or expertise. Mel-frequency cepstral coefficients (MFCCs) are extracted from each speech frame, and their dynamic range is quantized into a number of predefined bins in order to compute local MFCC histograms for each speech utterance, which is time-aligned with the transcribed text. Two identity scores are computed independently: one by PLSI applied to the text, and one by a nearest neighbor classifier applied to the local MFCC histograms. It is demonstrated that a convex combination of the two scores is more accurate than the individual scores on speaker identification experimen...
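The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the L1 distance, the exponential mapping from distance to score, and the weight `alpha` are all assumptions made for illustration, and the text scores are taken as given rather than computed by PLSI.

```python
import math
from typing import Dict, List, Sequence


def local_histogram(frames: Sequence[Sequence[float]], n_bins: int,
                    lo: float, hi: float) -> List[float]:
    """Quantize each coefficient's range [lo, hi] into n_bins and return the
    per-coefficient histograms, concatenated and normalized by frame count."""
    n_coeffs = len(frames[0])
    width = (hi - lo) / n_bins
    hist = [0.0] * (n_coeffs * n_bins)
    for frame in frames:
        for c, value in enumerate(frame):
            # Clamp so out-of-range values fall into the first or last bin.
            b = min(max(int((value - lo) / width), 0), n_bins - 1)
            hist[c * n_bins + b] += 1.0
    return [h / len(frames) for h in hist]


def nn_speech_scores(query: Sequence[float],
                     references: Dict[str, List[Sequence[float]]]) -> Dict[str, float]:
    """Nearest-neighbor scoring: for each speaker, take the smallest L1
    distance to any reference histogram, then map distances to scores that
    sum to 1 (the exp(-d) mapping is an assumption of this sketch)."""
    raw = {}
    for speaker, hists in references.items():
        d = min(sum(abs(q - r) for q, r in zip(query, h)) for h in hists)
        raw[speaker] = math.exp(-d)
    total = sum(raw.values())
    return {speaker: s / total for speaker, s in raw.items()}


def fuse_scores(text_scores: Dict[str, float], speech_scores: Dict[str, float],
                alpha: float) -> Dict[str, float]:
    """Convex combination: alpha * text + (1 - alpha) * speech, 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {s: alpha * text_scores[s] + (1.0 - alpha) * speech_scores[s]
            for s in text_scores}


def identify(scores: Dict[str, float]) -> str:
    """Return the speaker with the highest fused score."""
    return max(scores, key=scores.get)
```

In this sketch, `alpha = 1` uses the text score alone and `alpha = 0` uses the speech score alone; intermediate values weight the two modalities, and a suitable value would have to be chosen on held-out data.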
Panagiotis Moschonas, Constantine Kotropoulos
Added 12 Oct 2010
Updated 12 Oct 2010
Type Conference
Year 2008
Where BIOID
Authors Panagiotis Moschonas, Constantine Kotropoulos