This paper considers the problem of identifying compound documents (cDocs) on the Web: groups of web pages that in aggregate constitute semantically coherent information entities...
ization method is an abstract function that transforms a scientific dataset into a visual representation to facilitate data exploration. In turn, a visualization display is the vis...
Duplicate URLs cause serious trouble throughout the search engine pipeline, from crawling and indexing to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
Understanding the intent behind a user's query can help a search engine automatically route the query to corresponding vertical search engines to obtain particularly re...
Jian Hu, Gang Wang, Frederick H. Lochovsky, Jian-T...
Abstract. Tracing traffic using commodity hardware in contemporary high-speed access or aggregation networks such as 10-Gigabit Ethernet is an increasingly common yet challenging t...