Sciweavers

» Crawling on web graphs
SIGMOD
2000
ACM
15 years 6 months ago
Finding Replicated Web Collections
Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times....
Junghoo Cho, Narayanan Shivakumar, Hector Garcia-M...
ML
2010
ACM
15 years 10 days ago
Graph regularization methods for Web spam detection
We present an algorithm, WITCH, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as wel...
Jacob Abernethy, Olivier Chapelle, Carlos Castillo
WISE
2005
Springer
15 years 7 months ago
Temporal Ranking of Search Engine Results
Existing search engines contain the picture of the Web from the past and their ranking algorithms are based on data crawled some time ago. However, a user requires not only relevan...
Adam Jatowt, Yukiko Kawai, Katsumi Tanaka
WEBDB
2005
Springer
15 years 7 months ago
JXP: Global Authority Scores in a P2P Network
This document presents the JXP algorithm for dynamically and collaboratively computing PageRank-style authority scores of Web pages distributed in a P2P network. In the architectu...
Josiane Xavier Parreira, Gerhard Weikum
DM
2008
15 years 2 months ago
The diameter of protean graphs
The web graph is a real-world self-organizing network whose vertices correspond to web pages, and whose edges correspond to links between pages. Many stochastic models fo...
Pawel Pralat