Sciweavers

4 search results for "On the Evolution of Clusters of Near-Duplicate Web Pages"
LAWEB 2003 (IEEE)
On the Evolution of Clusters of Near-Duplicate Web Pages
This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
Dennis Fetterly, Mark Manasse, Marc Najork
WWW 2008 (ACM)
Efficient similarity joins for near duplicate detection
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
IWPSE 2005 (IEEE)
Supporting Web Application Evolution by Dynamic Analysis
The evolution of Web Applications needs to be supported by the availability of proper analysis and design documents. UML use case diagrams are certainly useful to identify feature...
Giuseppe A. Di Lucca, Massimiliano Di Penta, Anna ...
KDD 2006 (ACM)
Event detection from evolution of click-through data
Previous efforts on event detection from the web have focused primarily on web content and structure data ignoring the rich collection of web log data. In this paper, we propose t...
Qiankun Zhao, Tie-Yan Liu, Sourav S. Bhowmick, Wei...