Sciweavers

96 search results - page 1 / 20
WWW
2003
ACM
14 years 5 months ago
Detecting Near-replicas on the Web by Content and Hyperlink Analysis
The presence of replicas or near-replicas of documents is very common on the Web. Documents may be replicated completely or partially for different reasons (versions, mirrors, etc...
Ernesto Di Iorio, Michelangelo Diligenti, Marco Go...
AIRWEB
2008
Springer
13 years 6 months ago
Web spam identification through content and hyperlinks
We present an algorithm, WITCH, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as we...
Jacob Abernethy, Olivier Chapelle, Carlos Castillo
SIGKDD
2008
13 years 4 months ago
Web data mining: exploring hyperlinks, contents, and usage data
This paper presents a review of the book "Web Data Mining - Exploring Hyperlinks, Contents, and Usage Data" by Bing Liu. The review concludes that the breadth and depth ...
Olfa Nasraoui
HT
2003
ACM
13 years 10 months ago
Enhanced web document summarization using hyperlinks
This paper addresses the issue of Web document summarization. As textual content of Web documents is often scarce or irrelevant and existing summarization techniques are based on ...
Jean-Yves Delort, Bernadette Bouchon-Meunier, Mari...
SIGIR
1998
ACM
13 years 9 months ago
Improved Algorithms for Topic Distillation in a Hyperlinked Environment
This paper addresses the problem of topic distillation on the World Wide Web, namely, given a typical user query, to find quality documents related to the query topic. Connectivity...
Krishna Bharat, Monika Rauch Henzinger