Sciweavers

31 search results - page 4 / 7
» Detecting near-duplicates for web crawling
Sort
View
WWW
2008
ACM
14 years 7 months ago
Incremental web page template detection
Most template detection methods process web pages in batches that a newly crawled page can not be processed until enough pages have been collected. This results in large storage c...
Yu Wang, Binxing Fang, Xueqi Cheng, Li Guo, Hongbo...
DIMVA
2010
13 years 7 months ago
Why Johnny Can't Pentest: An Analysis of Black-Box Web Vulnerability Scanners
Black-box web vulnerability scanners are a class of tools that can be used to identify security issues in web applications. These tools are often marketed as "point-and-click ...
Adam Doupé, Marco Cova, Giovanni Vigna
IADIS
2003
13 years 7 months ago
SPLAT: A System for Self-Plagiarism Detection
This paper presents a system for self-plagiarism detection, SPLAT. The system uses a WebL web spider that crawls through the web sites of the top fifty Computer Science department...
Christian S. Collberg, Stephen G. Kobourov, Joshua...
ICAIL
2007
ACM
13 years 10 months ago
Essential deduplication functions for transactional databases in law firms
As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
Jack G. Conrad, Edward L. Raymond
DEXA
2010
Springer
226views Database» more  DEXA 2010»
13 years 4 months ago
Vi-DIFF: Understanding Web Pages Changes
Nowadays, many applications are interested in detecting and discovering changes on the web to help users to understand page updates and more generally, the web dynamics. Web archiv...
Zeynep Pehlivan, Myriam Ben Saad, Stéphane ...