This paper introduces a new approach to adding fault tolerance to a full-text retrieval system. The weighted pattern morphing technique circumvents some of the disadvantages of the wid...
The dramatic growth in the number and size of on-line information sources has fueled increasing research interest in the incremental subspace learning problem. In this paper, we pr...
Relevance feedback, which traditionally uses the terms in the relevant documents to enrich the user's initial query, is an effective method for improving retrieval performanc...
Duplicate URLs cause serious trouble throughout the search engine pipeline, from crawling and indexing to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
This paper studies structured data extraction from Web pages, e.g., online product description pages. Existing approaches to data extraction include wrapper induction and automatic...