A variety of lossless compression schemes have been proposed to reduce the storage requirements of web graphs. One successful approach is virtual node compression [7], in which of...
We present a new algorithm for finding large, dense subgraphs in massive graphs. Our algorithm is based on a recursive application of fingerprinting via shingles, and is extreme...
The growth of the web has directly influenced the increase in the availability of relational data. One of the key problems in mining such data is computing the similarity between o...
Pradeep Muthukrishnan, Dragomir R. Radev, Qiaozhu ...
Several algorithms based on link analysis have been developed to measure the importance of nodes in a graph, such as pages on the World Wide Web. PageRank and HITS are the most pop...
A focused crawler must use information gleaned from previously crawled page sequences to estimate the relevance of a newly seen URL. Therefore, good performance depends on powerfu...
Hongyu Liu, Evangelos E. Milios, Jeannette Janssen