Sciweavers

WWW 2007, ACM
The discoverability of the web
Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our st...
Anirban Dasgupta, Arpita Ghosh, Ravi Kumar, Christ...
WWW 2008, ACM
Mining tag clouds and emoticons behind community feedback
In this paper, we describe our mining system, which automatically mines tags from feedback text in an eCommerce scenario. It renders these tags in a visually appealing manner. Furth...
Kavita A. Ganesan, Neelakantan Sundaresan, Harshal...
WWW 2008, ACM
Psst: a web-based system for tracking political statements
Determining candidates' views on important issues is critical in deciding whom to support and vote for, but finding their statements and votes on an issue can be laborious. I...
Samantha Kleinberg, Bud Mishra
WWW 2008, ACM
IRLbot: scaling to 6 billion pages and beyond
This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation, and it models the crawler's performance. We show that with ...
Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, Dmit...
WWW 2009, ACM
Adaptive bidding for display advertising
Motivated by the emergence of auction-based marketplaces for display ads such as the Right Media Exchange, we study the design of a bidding agent that implements a display adverti...
Arpita Ghosh, Benjamin I. P. Rubinstein, Sergei Va...
WWW 2009, ACM
Graph based crawler seed selection
This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is not a trivial problem but a very important one. Selecting proper...
Shuyi Zheng, Pavel Dmitriev, C. Lee Giles
WWW 2009, ACM
Detecting the origin of text segments efficiently
In the origin detection problem, an algorithm is given a set S of documents, ordered by creation time, and a query document D. It needs to output for every consecutive sequence of ...
Ossama Abdel Hamid, Behshad Behzadi, Stefan Christ...
GIS 2004, ACM
A novel improvement to the R*-tree spatial index using gain/loss metrics
The R*-tree is a state-of-the-art spatial index structure. It has already found its way into commercial systems. The most important improvement of the R*-tree over the original R-...
Donghui Zhang, Tian Xia
GIS 2004, ACM
Algorithms for the placement of diagrams on maps
This paper discusses a variety of ways to place diagrams such as pie charts on maps, in particular, administrative subdivisions. The different ways come from different models of the ...
Étienne Schramm, Alexander Wolff, Marc J. v...