Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
Data structures define how values being computed are stored and accessed within programs. By recognizing what data structures are being used in an application, tools can make app...
Relevance-based language models operate by estimating the probabilities of observing words in documents relevant (or pseudo relevant) to a topic. However, these models assume that ...
The Web allows users to share their work very effectively leading to the rapid re-use and remixing of content on the Web including text, images, and videos. Scientific research d...
In this paper, we exploit the problem of inferring images’ semantic concepts from community-contributed images and their associated noisy tags. To infer the concepts more accura...