The presence of duplicate documents on the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
We describe an approach for constructing search spaces that consist of highly relevant web pages using similarities between the contents of linked web pages to represent their lin...
Aki Kobayashi, Kuangmin Tan, Katsunori Yamaoka, Yo...
The competition for clients' attention requires sites to update their content frequently. As a result, a large percentage of web pages are semi-dynamic, i.e., change quite often and...
Danny Dolev, Osnat Mokryn, Yuval Shavitt, Innocent...
Realtime web search refers to the retrieval of very fresh content which is in high demand. An effective portal web search engine must support a variety of search needs, including ...
Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic ass...