Parallel crawling for online social networks

Given a huge online social network, how do we retrieve information from it through crawling? Even better, how do we improve crawling performance by using parallel crawlers that work independently? In this paper, we present a framework of parallel crawlers for online social networks that utilizes a centralized queue. To show how this works in practice, we describe our implementation of the crawlers for an online auction website. The crawlers work independently, so the failure of one crawler does not affect the others. The framework also ensures that no redundant crawling occurs. Using the crawlers that we built, we visited a total of approximately 11 million auction users, about 66,000 of whom were completely crawled.

Categories and Subject Descriptors: D.2.11 [Software]: Software Architecture; H.2 [Information Systems]: Information Storage and Retrieval
General Terms: Performance
Keywords: Web Crawler, Web Spider, Parallelization, Online Social Networks
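The sketch below illustrates the centralized-queue idea the abstract describes: a shared frontier hands out users to crawl, a shared "seen" set prevents redundant crawling, and each crawler works independently so one failing does not affect the others. This is a minimal illustration with hypothetical names (frontier, fetch_neighbors, etc.), not the authors' actual implementation.

```python
import threading
import queue

frontier = queue.Queue()          # centralized queue of user IDs to crawl
seen = set()                      # user IDs already enqueued (prevents re-crawling)
seen_lock = threading.Lock()

def enqueue(user_id):
    """Add a user to the frontier only if it has never been seen."""
    with seen_lock:
        if user_id not in seen:
            seen.add(user_id)
            frontier.put(user_id)

def fetch_neighbors(user_id):
    """Placeholder: a real crawler would fetch the user's page and
    extract the users linked to it (e.g. buyers/sellers on an auction site)."""
    return []

def crawler(worker_id):
    """One independent crawler; a failure here does not affect the others."""
    while True:
        try:
            user_id = frontier.get(timeout=5)
        except queue.Empty:
            return                # frontier drained; this crawler stops
        try:
            for neighbor in fetch_neighbors(user_id):
                enqueue(neighbor)
        except Exception:
            pass                  # this crawl failed; other crawlers keep going
        finally:
            frontier.task_done()

if __name__ == "__main__":
    enqueue("seed_user")
    workers = [threading.Thread(target=crawler, args=(i,)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Because all deduplication happens at the central queue, adding more crawler threads (or processes, in a distributed setting) increases throughput without risking that two crawlers visit the same user.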
Type Conference
Year 2007
Where WWW
Authors Duen Horng Chau, Shashank Pandit, Samuel Wang, Christos Faloutsos