Sciweavers

694 search results - page 81 / 139
» Web page ranking using link attributes
WWW
2006
ACM
15 years 6 months ago
Do not crawl in the DUST: different URLs with similar text
We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Uri Schonfeld, Ziv Bar-Yossef, Idit Keidar
SIGMOD
2012
ACM
13 years 3 months ago
Pay-as-you-go data integration for linked data: opportunities, challenges and architectures
Linked Data (LD) provides principles for publishing data that underpin the development of an emerging web of data. LD follows the web in providing low barriers to entry: publisher...
Norman W. Paton, Klitos Christodoulou, Alvaro A. A...
LREC
2008
15 years 2 months ago
Corpus Exploitation from Wikipedia for Ontology Construction
Ontology construction usually requires a domain-specific corpus for building corresponding concept hierarchy. The domain corpus must have a good coverage of domain knowledge. Wiki...
Gaoying Cui, Qin Lu, Wenjie Li, Yi-Rong Chen
WWW
2008
ACM
16 years 1 month ago
IRLbot: scaling to 6 billion pages and beyond
This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with ...
Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, Dmit...
SIGIR
2008
ACM
15 years 7 days ago
Pagerank based clustering of hypertext document collections
Clustering hypertext document collections is an important task in Information Retrieval. Most clustering methods are based on document content and do not take into account the hype...
Konstantin Avrachenkov, Vladimir Dobrynin, Danil N...