Sciweavers

472 search results - page 21 / 95
» Crawling the Hidden Web
ADCS
2004
14 years 11 months ago
Focused Crawling in Depression Portal Search: A Feasibility Study
Previous work on domain-specific search services in the area of depressive illness has documented the significant human cost required to set up and maintain closed-crawl parameters....
Thanh Tin Tang, David Hawking, Nick Craswell, Rame...
SIGMOD
2006
ACM
15 years 9 months ago
To search or to crawl?: towards a query optimizer for text-centric tasks
Text is ubiquitous and, not surprisingly, many important applications rely on textual data for a variety of tasks. As a notable example, information extraction applications derive...
Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay ...
WWW
2002
ACM
15 years 10 months ago
Parallel crawlers
In this paper we study how we can design an effective parallel crawler. As the size of the Web grows, it becomes imperative to parallelize a crawling process, in order to finish d...
Junghoo Cho, Hector Garcia-Molina
WWW
2008
ACM
15 years 10 months ago
IRLbot: scaling to 6 billion pages and beyond
This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with ...
Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, Dmit...
WWW
2008
ACM
15 years 10 months ago
iRobot: an intelligent crawler for web forums
In this paper we study the Web forum crawling problem, which is a fundamental step in many Web applications, such as search engines and Web data mining. As a typical user-crea...
Rui Cai, Jiang-Ming Yang, Wei Lai, Yida Wang, Lei ...