Sciweavers

1098 search results - page 130 / 220
» Compressed web indexes
CN 2002
ProWGen: a synthetic workload generation tool for simulation evaluation of web proxy caches
This paper describes the design and use of a synthetic Web proxy workload generator called ProWGen to investigate the sensitivity of Web proxy cache replacement policies to five se...
Mudashiru Busari, Carey L. Williamson
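One characteristic such synthetic proxy workloads typically model is skewed object popularity. The sketch below is not ProWGen itself, just a minimal illustration, under the assumption of a Zipf-like popularity distribution, of generating a request stream suitable for driving a cache-replacement simulation; all names and parameter values are invented for the example.

```python
import random
from collections import Counter

def zipf_weights(num_objects, alpha=0.75):
    # Rank r gets weight 1 / r**alpha: a Zipf-like popularity skew,
    # so low-ranked (hot) objects dominate the request stream.
    return [1.0 / (r ** alpha) for r in range(1, num_objects + 1)]

def generate_workload(num_objects=1000, num_requests=10000, alpha=0.75, seed=42):
    # Draw a synthetic request stream over object ids 0..num_objects-1.
    rng = random.Random(seed)
    weights = zipf_weights(num_objects, alpha)
    return rng.choices(range(num_objects), weights=weights, k=num_requests)

workload = generate_workload()
counts = Counter(workload)
```

Feeding such a stream through competing replacement policies (LRU, LFU, size-based) and comparing hit ratios is the kind of sensitivity experiment the paper describes; varying `alpha` changes how concentrated the popularity is.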
EDBTW 2010, Springer
Using visual pages analysis for optimizing web archiving
Due to the growing importance of the World Wide Web, archiving it has become crucial for preserving a useful source of information. To keep a web archive up-to-date, crawlers ha...
Myriam Ben Saad, Stéphane Gançarski
AIRWEB 2007, Springer
A Taxonomy of JavaScript Redirection Spam
Redirection spam presents a web page with false content to a crawler for indexing, but automatically redirects the browser to a different web page. Redirection is usually immediat...
Kumar Chellapilla, Alexey Maykov
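A crude way to surface candidate redirection spam is to scan fetched page source for immediate-redirect constructs. The heuristic below is illustrative only, not the taxonomy from the paper: the patterns (location assignment, `location.replace`, meta refresh) are assumed examples, and real spam often obfuscates the redirect precisely to defeat such checks.

```python
import re

# Assumed example patterns for obvious client-side redirects.
REDIRECT_PATTERNS = [
    re.compile(r"\b(?:window\.|document\.|top\.)?location(?:\.href)?\s*=", re.I),
    re.compile(r"location\.replace\s*\(", re.I),
    re.compile(r"<meta[^>]+http-equiv=[\"']?refresh", re.I),
]

def looks_like_redirect(html: str) -> bool:
    # Flag a page whose source contains an obvious redirect construct.
    return any(p.search(html) for p in REDIRECT_PATTERNS)

spam = '<html><script>window.location = "http://example.org/";</script></html>'
clean = "<html><body>Plain content, no scripts.</body></html>"
```

Because obfuscated JavaScript evades static patterns like these, detecting the harder cases generally requires executing the script, which is the direction such work takes.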
WWW 2006, ACM
Do not crawl in the DUST: different URLs with similar text
We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Uri Schonfeld, Ziv Bar-Yossef, Idit Keidar
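The core validation step for dust, checking whether two URLs really serve similar text, can be approximated with shingle overlap. This is a minimal sketch of that general similarity check, not the paper's rule-detection algorithm (which learns URL rewrite rules from URL lists); the shingle size and threshold are assumed values.

```python
def shingles(text: str, k: int = 3) -> set:
    # k-word shingles of the page text.
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    # Set overlap in [0, 1]; two empty pages count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def likely_dust(text_a: str, text_b: str, threshold: float = 0.9) -> bool:
    # Flag two URLs as probable dust if their content overlaps heavily.
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold
```

A crawler that can detect dust rules without fetching both pages saves bandwidth; a content check like this would only be needed to confirm candidate rules on a sample.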
APWEB 2004, Springer
A Query-Dependent Duplicate Detection Approach for Large Scale Search Engines
Duplication of Web pages greatly hurts the perceived relevance of a search engine. Existing methods for detecting duplicated Web pages can be classified into two categories, i.e. o...
Shaozhi Ye, Ruihua Song, Ji-Rong Wen, Wei-Ying Ma
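A query-dependent approach need only compare the parts of pages a searcher actually sees for that query. The sketch below illustrates this idea under assumptions of my own (compare a fixed-width word window around the first query-term hit, drop results whose windows collide); it is not the paper's method, and every function name and parameter here is hypothetical.

```python
def query_window(doc: str, query: str, width: int = 3) -> str:
    # Words around the first query-term occurrence: a stand-in for the
    # snippet the search engine would show for this query.
    words = doc.lower().split()
    terms = set(query.lower().split())
    for i, w in enumerate(words):
        if w in terms:
            lo = max(0, i - width)
            return " ".join(words[lo:i + width + 1])
    return ""

def dedupe_results(query: str, docs: list) -> list:
    # Keep one result per distinct query window; an offline (query-
    # independent) method would instead compare whole pages.
    seen, kept = set(), []
    for doc in docs:
        win = query_window(doc, query)
        if win not in seen:
            seen.add(win)
            kept.append(doc)
    return kept
```

The appeal of doing this at query time is that two pages can differ globally (headers, ads, mirrors) yet be duplicates from the searcher's point of view for this particular query.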