Sciweavers

INTERSPEECH
2010
Can conversational word usage be used to predict speaker demographics?
This work surveys the potential for predicting demographic traits of individual speakers (gender, age, education level, ethnicity, and geographic region) using only word usage fea...
Dan Gillick
INTERSPEECH
2010
Learning speaker normalization using semisupervised manifold alignment
As a child acquires language, he or she: perceives acoustic information in his or her surrounding environment; identifies portions of the ambient acoustic information as languager...
Andrew R. Plummer, Mary E. Beckman, Mikhail Belkin...
INTERSPEECH
2010
What else is new than the Hamming window? Robust MFCCs for speaker recognition via multitapering
Usually the mel-frequency cepstral coefficients (MFCCs) are derived from a Hamming-windowed DFT spectrum. In this paper, we advocate using a so-called multitaper method instead. Mul...
Tomi Kinnunen, Rahim Saeidi, Johan Sandberg, Maria...
INTERSPEECH
2010
HMM adaptation using linear spline interpolation with integrated spline parameter training for robust speech recognition
We recently proposed a method for HMM adaptation to noisy environments called Linear Spline Interpolation (LSI). LSI uses linear spline regression to model the relationship betwee...
Michael L. Seltzer, Alex Acero
INTERSPEECH
2010
Fully automatic segmentation for prosodic speech corpora
While automatic methods for phonetic segmentation of speech can help with rapid annotation of corpora, most methods rely either on manually segmented data to initially train the p...
Sarah Hoffmann, Beat Pfister
INTERSPEECH
2010
Acoustic feature analysis in speech emotion primitives estimation
We recently proposed a family of robust linear and nonlinear estimation techniques for recognizing the three emotion primitives
Dongrui Wu, Thomas D. Parsons, Shrikanth S. Naraya...
INTERSPEECH
2010
Setup for acoustic-visual speech synthesis by concatenating bimodal units
This paper presents preliminary work on building a system able to synthesize concurrently the speech signal and a 3D animation of the speaker's face. This is done by concaten...
Asterios Toutios, Utpala Musti, Slim Ouni, Vincent...
INTERSPEECH
2010
Deep-structured hidden conditional random fields for phonetic recognition
We extend our earlier work on deep-structured conditional random field (DCRF) and develop deep-structured hidden conditional random field (DHCRF). We investigate the use of this n...
Dong Yu, Li Deng
INTERSPEECH
2010
Recurrent neural network based language model
A new recurrent neural network based language model (RNN LM) with applications to speech recognition is presented. Results indicate that it is possible to obtain around 50% reduct...
Tomas Mikolov, Martin Karafiát, Lukas Burge...