Sciweavers

14 search results - page 2 / 3
» Can Chinese web pages be classified with English data source
LREC
2010
13 years 6 months ago
Evaluating Utility of Data Sources in a Large Parallel Czech-English Corpus CzEng 0.9
CzEng 0.9 is the third release of a large parallel corpus of Czech and English. For the current release, CzEng was extended by a significant amount of text from various types of so...
Ondrej Bojar, Adam Liska, Zdenek Zabokrtský
LREC
2008
13 years 6 months ago
A Lightweight and Efficient Tool for Cleaning Web Pages
Originally conceived as a "naive" baseline experiment using traditional n-gram language models as classifiers, the NCLEANER system has turned out to be a fast and lightw...
Stefan Evert
ACL
2008
13 years 6 months ago
Mining Parenthetical Translations from the Web by Word Alignment
Documents in languages such as Chinese, Japanese and Korean sometimes annotate terms with their translations in English inside a pair of parentheses. We present a method to extrac...
Dekang Lin, Shaojun Zhao, Benjamin Van Durme, Mari...
IRI
2008
IEEE
13 years 11 months ago
Curate a transliteration corpus from transliteration/translation pairs
Transliteration of new named entities is important for information retrieval that crosses two or more languages. Rule-based machine transliteration is not satisfactory, since dif...
Shih-Hung Wu, Yu-Te Li
CIKM
2008
Springer
13 years 7 months ago
Cross-lingual query classification: a preliminary study
The non-English Web is growing at breakneck speed, but available language processing tools are mostly English based. Taxonomies are a case in point: while there are plenty of comm...
Xuerui Wang, Andrei Z. Broder, Evgeniy Gabrilovich...