Current web search engines focus on searching only the most recent snapshot of the web. In some cases, however, it would be desirable to search over collections that include many ...
Given a terabyte click log, can we build an efficient and effective click model? It is commonly believed that web search click logs are a gold mine for the search business, because th...
Anitha Kannan, Chao Liu 0001, Christos Faloutsos, ...
Duplication of Web pages greatly hurts the perceived relevance of a search engine. Existing methods for detecting duplicate Web pages fall into two categories, i.e., o...
Search engines not only assist normal users, but also provide information that hackers and other malicious entities can exploit in their nefarious activities. With carefully craft...
Cloaking is a search engine spamming technique used by some Web sites to deliver one page to a search engine for indexing while serving an entirely different page to users browsin...