Although the availability of large video corpora is on the rise, the value of these datasets remains largely untapped due to the difficulty of analyzing their contents. Automatic ...
We propose new methods to exploit contemporaneous text, such as on-line news articles, to improve language models for automatic speech recognition and other natural language proce...
This paper investigates using prosodic information in the form of ToBI break indexes for parsing spontaneous speech. We revisit two previously studied approaches, one that hurt pa...
Statistical learning methods are commonly applied in content-based video and image retrieval. Such methods require a large number of examples, which are usually obtained through a ...
Timo Volkmer, James A. Thom, Seyed M. M. Tahaghogh...
The paper presents the Bulgarian National Corpus project (BulNC), a large-scale, representative corpus of Bulgarian that is available online. The BulNC is also a monolingual general corpus,...