Sciweavers

22 search results - page 2 / 5
» Efficient URL caching for world wide web crawling
WSDM
2010
ACM
13 years 11 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
WSC
1997
13 years 6 months ago
Model-Driven Simulation of World-Wide-Web Cache Policies
The World Wide Web (WWW) has experienced a dramatic increase in popularity since 1993. Many reports indicate that its growth will continue at an exponential rate. This growth has ...
Ying Shi, Edward Watson, Ye-Sho Chen
WIDM
2004
ACM
13 years 10 months ago
Probabilistic models for focused web crawling
A focused crawler must use information gleaned from previously crawled page sequences to estimate the relevance of a newly seen URL. Therefore, good performance depends on powerfu...
Hongyu Liu, Evangelos E. Milios, Jeannette Janssen
CN
1998
13 years 4 months ago
Adaptive web caching: towards a new global caching architecture
An adaptive, highly scalable, and robust web caching system is needed to effectively handle the exponential growth and extreme dynamic environment of the World Wide Web. Our work ...
B. Scott Michel, Khoi Nguyen, Adam Rosenstein, Lix...
CN
1999
13 years 4 months ago
Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource d...
Soumen Chakrabarti, Martin van den Berg, Byron Dom