Sciweavers

1109 search results - page 16 / 222
» Crawling on web graphs
WWW
2003
ACM
16 years 2 months ago
Efficient URL caching for world wide web crawling
Crawling the web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)–(c). Howev...
Andrei Z. Broder, Marc Najork, Janet L. Wiener
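
The three steps quoted in the abstract above can be sketched as a short breadth-first loop. This is only an illustrative sketch, not the authors' implementation: the names `crawl`, `LinkExtractor`, and `PAGES`, the `example.test` URLs, and the use of an in-memory dictionary in place of real HTTP fetches are all assumptions made so the example runs on its own. The plain Python `set` used for the "not seen before" test is likewise an assumption that suits only small crawls.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch):
    """Breadth-first crawl from `seed`.

    `fetch(url)` is a caller-supplied function (hypothetical here) that
    returns the page's HTML as a string, or None if the page is unavailable.
    Returns the URLs that were successfully fetched, in visit order.
    """
    seen = {seed}              # URLs already discovered
    frontier = deque([seed])   # URLs waiting to be fetched
    order = []
    while frontier:
        url = frontier.popleft()
        html = fetch(url)                  # (a) fetch a page
        if html is None:
            continue
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)                  # (b) parse it to extract linked URLs
        for href in parser.links:
            link = urljoin(url, href)      # resolve relative links
            if link not in seen:           # (c) only URLs not seen before
                seen.add(link)
                frontier.append(link)
    return order


# A hypothetical three-page site held in memory instead of on the network.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a">A</a> <a href="/missing">x</a>',
}

visited = crawl("http://example.test/", PAGES.get)
print(visited)
```

Passing `PAGES.get` as `fetch` keeps the sketch free of network access; a real crawler would substitute an HTTP client and would also need politeness delays and robots.txt handling, which are omitted here.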
IR
2008
15 years 1 month ago
Focused web crawling in the acquisition of comparable corpora
CLIR resources, such as dictionaries and parallel corpora, are scarce for special domains. Obtaining comparable corpora automatically for such domains could be an answer to this p...
Tuomas Talvensaari, Ari Pirkola, Kalervo Järv...
CIKM
2010
Springer
14 years 10 months ago
Crawling the web for structured documents
Structured Information Retrieval is gaining a lot of interest in recent years, as this kind of information is becoming an invaluable asset for professional communities such as Sof...
Julián Urbano, Juan Loréns, Yorgos A...
HICSS
1999
IEEE
15 years 6 months ago
Collaborative Web Crawling: Information Gathering/Processing over Internet
The main objective of the IBM Grand Central Station (GCS) is to gather information of virtually any type of formats (text, data, image, graphics, audio, video) from the cyberspace...
Shang-Hua Teng, Qi Lu, Matthias Eichstaedt, Daniel...
WWW
2009
ACM
16 years 2 months ago
Crawling English-Japanese person-name transliterations from the web
Automatic compilation of lexicon is a dream of lexicon compilers as well as lexicon users. This paper proposes a system that crawls English-Japanese person-name transliterations f...
Satoshi Sato