Although Wikipedia has emerged as a powerful collaborative Encyclopedia on the Web, it is only partially multilingual as most of the content is in English and a small number of ot...
This paper describes DUTIR at TREC 2007 Blog Track. In data preprocessing, a non English language list created from the corpus was used to remove the non English blogs, blog templ...
Rui Song, Qin Tang, Daming Shi 0002, Hongfei Lin, ...
The non-English Web is growing at breakneck speed, but available language processing tools are mostly English based. Taxonomies are a case in point: while there are plenty of comm...
Xuerui Wang, Andrei Z. Broder, Evgeniy Gabrilovich...
As the number of non-English documents is increasing dramatically on the web nowadays, the study and design of information retrieval systems for these languages is very important....
Abolfazl AleAhmad, Hadi Amiri, Masoud Rahgozar, Fa...
Abstract. This paper presents a language-independent Multilingual Document Clustering (MDC) approach on comparable corpora. Named entites (NEs) such as persons, locations, organiza...