Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
We propose a novel conception language for exploring the results retrieved by several internet search services (like search engines) that cluster retrieved documents. The goal is ...
Gloria Bordogna, Alessandro Campi, Giuseppe Psaila...
Given the increasing traffic on the World Wide Web (Web), it is difficult for a single popular Web server to handle the demand from its many clients. By clustering a group of Web ...
The link structure of the Web graph is used in algorithms such as Kleinberg’s HITS and Google’s PageRank to assign authoritative weights to Web pages and thus rank them. Both ...
Despite the extensive use of caching techniques, the Web is overloaded. While the caching techniques currently used help some, it would be better to use different caching and repli...
Anne-Marie Kermarrec, Ihor Kuz, Maarten van Steen,...