Duplicate URLs cause serious problems throughout the search engine pipeline, from crawling and indexing to result serving. URL normalization aims to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
It has long been recognized that capturing term relationships is an important aspect of information retrieval. Even with large amounts of data, we usually only have significant ev...
Web systems suffer from an inability to satisfy heterogeneous needs of many users. A remedy for the negative effects of the traditional "one-size-fits-all" approac...
The Grid is a promising e-Science infrastructure that promotes and facilitates sharing and collaboration in the use of distributed heterogeneous resources through Virtual Organiza...
Peisheng Zhao, Aijun Chen, Yang Liu, Liping Di, We...
The non-English Web is growing at breakneck speed, but available language processing tools are mostly English based. Taxonomies are a case in point: while there are plenty of comm...
Xuerui Wang, Andrei Z. Broder, Evgeniy Gabrilovich...