Multilingual corpora are valuable resources for cross-language information retrieval and are available in many language pairs. However the Persian language does not have rich multi...
Although publicly-available, ground-truthed corpora have proven useful for training, evaluating, and comparing recognition systems in many domains, the availability of such corpor...
Scott MacLean, George Labahn, Edward Lank, Mirette...
In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude fea...
As metadata is often not sufficiently provided by authors of Learning Resources, automatic metadata generation methods are used to create metadata afterwards. One kind of metadata ...
Statistical machine translation (SMT) requires a large parallel corpus, which is available only for restricted language pairs and domains. To expand the language pairs and domains...