This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with ...
Because of the high volume and unpredictable arrival rate, stream processing systems may not always be able to keep up with the input data streams-- resulting in buffer overflow a...
The primary function of current Web search engines is essentially relevance ranking at the document level. However, myriad structured information about real-world objects is embed...
Zaiqing Nie, Yunxiao Ma, Shuming Shi, Ji-Rong Wen,...
Web-based communities have become important places for people to seek and share expertise. We find that networks in these communities typically differ in their topology from other...
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hier...