Simply choosing one model out of a large set of possibilities for a given vision task is a surprisingly difficult problem, especially if there is limited evaluation data with whi...
Prosodic information has been successfully used for speaker recognition for more than a decade. The best-performing prosodic system to date has been one based on features extracte...
Luciana Ferrer, Nicolas Scheffer, Elizabeth Shribe...
This paper addresses the problem of using unstructured queries to search a structured database in voice search applications. By incorporating structural information in music metad...
Young-In Song, Ye-Yi Wang, Yun-Cheng Ju, Mike Selt...
Update of acoustic and language models is vital to maintain performance of automatic speech recognition (ASR) systems. To alleviate efforts for updating models, we propose a "...
Yuya Akita, Masato Mimura, Graham Neubig, Tatsuya ...
This paper describes work in progress on automatic generation of "impact sounds" based on physical modelling. These sounds can be used as non-speech audio presentation of...
Alireza Darvishi, Valentin Guggiana, Eugen Muntean...