Background: Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships fr...
In this paper we propose to define a measure of visual similarity to compare different pages in a corpus. This measure is based on the analysis of the visual layout saliency of th...
Wrappers play an important role in extracting specified information from various sources. Wrapper rules by which information is extracted are often created from the domain-specifi...
The amount of information available online has grown enormously over the past decade. Fortunately, computing power, disk capacity, and network bandwidth have also increased dramat...
Sergey Brin, Rajeev Motwani, Lawrence Page, Terry ...
Most of the current WWW is made up of dynamic pages. The development of dynamic pages is a difficult and costly endeavour, out-of-reach for most users, experts, and content produce...