We analyze subword-based language models (LMs) in large-vocabulary continuous speech recognition across four “morphologically rich” languages: Finnish, Estonian, Turkish, and ...
We explore morphology-based and sub-word language modeling approaches proposed for morphologically rich languages, and evaluate and contrast them for Turkish broadcast news transc...
CAREGIVER, a multi-lingual speech corpus for modeling language acquisition, has been designed and recorded within the framework of the EU-funded Acquisition of Communica...
Toomas Altosaar, Louis ten Bosch, Guillaume Aimett...
The vocabulary used in speech usually consists of two types of words: a limited set of common words, shared across multiple documents, and a virtually unlimited set of rare words, ...
Stefan Kombrink, Mirko Hannemann, Lukas Burget, Hy...