The problem of measuring similarity between web pages arises in many important Web applications, such as search engines and Web directories. In this paper, we propose a novel neig...
The world-wide web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the web. Moreover, d...
In prior work we have demonstrated that search engine caches and archiving projects like the Internet Archive’s Wayback Machine can be used to “lazily preserve” websites and...
The purpose of this paper is threefold. First, we study the evolution of the web based on data available from an earlier snapshot of the web and compare the results with those pre...
Wei-Tsen Milly Chiang, Markus Hagenbuchner, Ah Chu...
The presence of replicas or near-replicas of documents is very common on the Web. Documents may be replicated completely or partially for different reasons (versions, mirrors, etc...
Ernesto Di Iorio, Michelangelo Diligenti, Marco Go...