Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our st...
Anirban Dasgupta, Arpita Ghosh, Ravi Kumar, Christ...
In this paper we describe our mining system, which automatically mines tags from feedback text in an eCommerce scenario. It renders these tags in a visually appealing manner. Furth...
Kavita A. Ganesan, Neelakantan Sundaresan, Harshal...
Determining candidates' views on important issues is critical in deciding whom to support and vote for; but finding their statements and votes on an issue can be laborious. I...
This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with ...
Motivated by the emergence of auction-based marketplaces for display ads such as the Right Media Exchange, we study the design of a bidding agent that implements a display adverti...
Arpita Ghosh, Benjamin I. P. Rubinstein, Sergei Va...
This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper...
In the origin detection problem, an algorithm is given a set S of documents, ordered by creation time, and a query document D. It needs to output for every consecutive sequence of ...
Ossama Abdel Hamid, Behshad Behzadi, Stefan Christ...
The R*-tree is a state-of-the-art spatial index structure. It has already found its way into commercial systems. The most important improvement of the R*-tree over the original R-...
This paper discusses a variety of ways to place diagrams like pie charts on maps, in particular, administrative subdivisions. The different ways come from different models of the ...