Sketching Landscapes of Page Farms

The Web is a very large social network. It is important and interesting to understand the “ecology” of the Web: the general relations of Web pages to their environment. Understanding such relations has several important applications, including Web community identification and analysis, and Web spam detection. In this paper, we propose the notion of a page farm, which is the set of pages contributing to (a major portion of) the PageRank score of a target page. We try to understand the “landscapes” of page farms in general: how are the farms of Web pages similar to or different from each other? To sketch the landscapes of page farms, we need to extract page farms extensively. We show that computing page farms is NP-hard, and develop a simple greedy algorithm. We then analyze the farms of a large number of pages (over 3 million) randomly sampled from the Web, and report some interesting findings. Most importantly, the landscapes of page farms also tend to follow the p...
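The abstract does not spell out the greedy algorithm, so the following is only a minimal sketch of what greedy page farm extraction might look like, not the authors' method. It rests on several assumptions that are not stated in the listing: a page's contribution is measured by how much the target's PageRank drops when that page's out-links are removed, the farm is grown until it accounts for a fraction `theta` of the largest possible drop, and PageRank is computed by a simplified power iteration in which dangling pages simply lose their rank mass. The names `pagerank`, `greedy_page_farm`, `theta`, and `damping` are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration (assumption: dangling pages leak rank)."""
    n = len(links)
    rank = {v: 1.0 / n for v in links}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in links}
        for u, outs in links.items():
            for v in outs:
                nxt[v] += damping * rank[u] / len(outs)
        rank = nxt
    return rank


def greedy_page_farm(links, target, theta=0.8, damping=0.85):
    """Greedily pick pages whose removal most reduces the target's PageRank.

    Hypothetical stopping rule: stop once the chosen pages account for at
    least `theta` of the largest possible reduction.
    """
    # Make every page a key so the page count stays fixed when links are removed.
    nodes = set(links) | {v for outs in links.values() for v in outs}
    links = {v: list(links.get(v, [])) for v in nodes}

    def score_without(removed):
        # Target's PageRank after deleting the out-links of the removed pages.
        pruned = {u: ([] if u in removed else outs) for u, outs in links.items()}
        return pagerank(pruned, damping)[target]

    full = pagerank(links, damping)[target]
    floor = (1.0 - damping) / len(nodes)  # score with no contributing pages
    goal = theta * (full - floor)

    farm, remaining = set(), full
    candidates = nodes - {target}
    while candidates and full - remaining < goal - 1e-12:
        best, best_score = None, remaining
        for u in sorted(candidates):  # sorted for deterministic tie-breaking
            score = score_without(farm | {u})
            if score < best_score - 1e-12:
                best, best_score = u, score
        if best is None:  # no remaining page contributes anything
            break
        farm.add(best)
        candidates.discard(best)
        remaining = best_score
    return farm
```

For example, `greedy_page_farm({"a": ["t"], "b": ["t"], "c": ["a"], "t": [], "x": ["y"]}, "t", theta=0.5)` selects only page `a`, the single largest contributor to `t` in that toy graph. Each greedy step recomputes PageRank once per candidate, so this sketch is only practical on small graphs; extracting farms for millions of pages, as the abstract describes, would need a far more efficient approach.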
Bin Zhou 0002, Jian Pei
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2007
Where SDM