Modeling instantaneous intonation for speaker identification using the fundamental frequency variation spectrum

In recent years, the field of automatic speaker identification has begun to exploit high-level sources of speaker-discriminative information, in addition to traditional models of spectral shape. These sources include pronunciation models, prosodic dynamics, pitch, pause, and duration features, phone streams, and conversational interaction. As part of this broader thrust, we explore a new frame-level vector representation of the instantaneous change in fundamental frequency, known as fundamental frequency variation (FFV). The FFV spectrum consists of 7 continuous coefficients, and can be directly modeled in a standard Gaussian mixture model (GMM) framework. Our experiments indicate that FFV features contain useful information for discriminating among speakers, and that model-space combination of FFV and cepstral features outperforms cepstral features alone. In particular, our results on 16 kHz Wall Street Journal data show relative reductions in error rate of 54% and 40% for female a...
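
To make the GMM framing concrete, the following is a minimal sketch of how per-speaker GMMs over two feature streams might be scored and combined. It is not the authors' implementation: the weighted sum of average log-likelihoods is an assumed, simple form of combination and may differ from the model-space combination the abstract refers to. The function names, the `alpha` weight, and the `speakers` dictionary layout are all hypothetical, and the GMM parameters are assumed to have been trained already.

```python
import math


def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(ll)
    # Log-sum-exp over components for numerical stability.
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))


def utterance_loglik(frames, gmm):
    """Average per-frame log-likelihood; gmm = (weights, means, variances)."""
    weights, means, variances = gmm
    total = sum(gmm_frame_loglik(x, weights, means, variances) for x in frames)
    return total / len(frames)


def identify(ffv_frames, cep_frames, speakers, alpha=0.5):
    """Pick the speaker with the highest combined score.

    speakers maps a name to {"ffv": gmm, "cep": gmm}. alpha is a
    hypothetical stream weight (assumption, not taken from the paper).
    """
    scores = {}
    for name, models in speakers.items():
        s_ffv = utterance_loglik(ffv_frames, models["ffv"])
        s_cep = utterance_loglik(cep_frames, models["cep"])
        scores[name] = alpha * s_ffv + (1.0 - alpha) * s_cep
    best = max(scores, key=scores.get)
    return best, scores
```

Here each FFV frame would be a 7-element vector, matching the 7 coefficients of the FFV spectrum, while the cepstral frames can have any dimensionality. Averaging over frames keeps the two streams on comparable scales even when they contain different numbers of frames.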

Kornel Laskowski, Qin Jin

Cepstral Features | FFV Features | FFV Spectrum | ICASSP 2009 | Signal Processing

Added	21 May 2010
Updated	21 May 2010
Type	Conference
Year	2009
Where	ICASSP
Authors	Kornel Laskowski, Qin Jin
