Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manual...
Most web pages are linked to others with related content. This idea, combined with another that says that text in, and possibly around, HTML anchors describes the pages to which th...
This paper presents Carnegie Mellon University’s experiments on the mixed named-page and homepage finding task of the TREC 12 Web Track. Our results were strong; we achieved the...
Current practice of Web site development does not explicitly address the problems related to multilingual sites. The same information, as well as the same navigation paths, page f...
Paolo Tonella, Filippo Ricca, Emanuele Pianta, Chr...
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource d...
Soumen Chakrabarti, Martin van den Berg, Byron Dom