We present an approach to music identification based on weighted finite-state transducers and Gaussian mixture models, inspired by techniques used in large-vocabulary speech recogn...
Due to upcoming mobile telephony services with higher speech quality, a wideband (50 Hz to 7 kHz) mobile telephony derivative of TIMIT, called WTIMIT, has been recorded. It allows a...
This paper investigates the impact of automatic sentence segmentation on speech summarization using the ICSI meeting corpus. We use a hidden Markov model (HMM) for sentence segmen...
Deep Belief Networks (DBNs) are multi-layer generative models. They can be trained to model windows of coefficients extracted from speech and they discover multiple layers of fea...
Abdel-rahman Mohamed, Tara N. Sainath, George Dahl...
The automatic recognition of a user’s communicative style within a spoken dialog system framework, including the affective aspects, has received increased attention in the past f...
Serdar Yildirim, Shrikanth Narayanan, Alexandros P...