Sciweavers

31 search results - page 4 / 7
» Detecting near-duplicates for web crawling
Sort
View
73
Voted
WWW
2008
ACM
16 years 1 months ago
Incremental web page template detection
Most template detection methods process web pages in batches that a newly crawled page can not be processed until enough pages have been collected. This results in large storage c...
Yu Wang, Binxing Fang, Xueqi Cheng, Li Guo, Hongbo...
102
Voted
DIMVA
2010
15 years 1 months ago
Why Johnny Can't Pentest: An Analysis of Black-Box Web Vulnerability Scanners
Black-box web vulnerability scanners are a class of tools that can be used to identify security issues in web applications. These tools are often marketed as "point-and-click ...
Adam Doupé, Marco Cova, Giovanni Vigna
94
Voted
IADIS
2003
15 years 1 months ago
SPLAT: A System for Self-Plagiarism Detection
This paper presents a system for self-plagiarism detection, SPLAT. The system uses a WebL web spider that crawls through the web sites of the top fifty Computer Science department...
Christian S. Collberg, Stephen G. Kobourov, Joshua...
115
Voted
ICAIL
2007
ACM
15 years 4 months ago
Essential deduplication functions for transactional databases in law firms
As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
Jack G. Conrad, Edward L. Raymond
DEXA
2010
Springer
226views Database» more  DEXA 2010»
14 years 11 months ago
Vi-DIFF: Understanding Web Pages Changes
Nowadays, many applications are interested in detecting and discovering changes on the web to help users to understand page updates and more generally, the web dynamics. Web archiv...
Zeynep Pehlivan, Myriam Ben Saad, Stéphane ...