Sciweavers

CN
1998
Efficient Crawling Through URL Ordering
In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can ...
Junghoo Cho, Hector Garcia-Molina, Lawrence Page
KDD
2008
ACM
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
IC
2009
Language Based Crawling: Crawling the Arabic Content of the Web
Crawling web pages written in Arabic or any other language with limited content in the web may, at first, seem to parallel the process of crawling the English content. However, t...
Saad H. Alabbad, Sultan Alanazi
WWW
2005
ACM
The Infocious web search engine: improving web searching through linguistic analysis
In this paper we present the Infocious Web search engine [23]. Our goal in creating Infocious is to improve the way people find information on the Web by resolving ambiguities pre...
Alexandros Ntoulas, Gerald Chao, Junghoo Cho