The production of rich multilingual speech corpus resources on a large scale is a requirement for many linguistic, phonetic and technological tasks, in both research and applicati...
We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. Th...
Zach Solan, David Horn, Eytan Ruppin, Shimon Edelm...
Current statistical machine translation (SMT) systems are trained on sentence-aligned and word-aligned parallel text collected from various sources. Translation model parameters ar...
Spyros Matsoukas, Antti-Veikko I. Rosti, Bing Zhan...
Ontologies in current computer science parlance are computer-based resources that represent agreed domain semantics. This paper first introduces ontologies in general and subseque...
Marie-Laure Reinberger, Peter Spyns, Walter Daelem...
Current approaches to script identification rely on hand-selected features and often require processing a significant part of the document to achieve reliable identification. We p...