Sciweavers

563 search results - page 1 / 113
» Crawling the web for structured documents
Sort
View
CIKM
2010
Springer
13 years 1 months ago
Crawling the web for structured documents
Structured Information Retrieval is gaining a lot of interest in recent years, as this kind of information is becoming an invaluable asset for professional communities such as Sof...
Julián Urbano, Juan Loréns, Yorgos A...
ICDM
2008
IEEE
186views Data Mining» more  ICDM 2008»
13 years 11 months ago
xCrawl: A High-Recall Crawling Method for Web Mining
Web Mining Systems exploit the redundancy of data published on the Web to automatically extract information from existing web documents. The first step in the Information Extract...
Kostyantyn M. Shchekotykhin, Dietmar Jannach, Gerh...
WWW
2007
ACM
14 years 5 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
WWW
2007
ACM
14 years 5 months ago
Random web crawls
This paper proposes a random Web crawl model. A Web crawl is a (biased and partial) image of the Web. This paper deals with the hyperlink structure, i.e. a Web crawl is a graph, w...
Toufik Bennouas, Fabien de Montgolfier
ADBIS
2003
Springer
173views Database» more  ADBIS 2003»
13 years 9 months ago
UCYMICRA: Distributed Indexing of the Web Using Migrating Crawlers
Due to the tremendous increase rate and the high change frequency of Web documents, maintaining an up-to-date index for searching purposes (search engines) is becoming a challenge....
Odysseas Papapetrou, Stavros Papastavrou, George S...