Sciweavers

31 search results - page 3 / 7
» Detecting near-duplicates for web crawling
DEXA
2006
Springer
13 years 7 months ago
Cleaning Web Pages for Effective Web Content Mining
Classifying and mining noise-free web pages will improve on accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-bas...
Jing Li, Christie I. Ezeife
ICDE
2009
IEEE
14 years 7 months ago
Top-k Set Similarity Joins
Similarity join is a useful primitive operation underlying many applications, such as near duplicate Web page detection, data integration, and pattern recognition. Tradi...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Haichuan Sh...
JCB
2007
13 years 5 months ago
Clustered Sequence Representation for Fast Homology Search
We present a novel approach to managing redundancy in sequence databanks such as GenBank. We store clusters of near-identical sequences as a representative union-sequence and a se...
Michael Cameron, Yaniv Bernstein, Hugh E. Williams
SIGIR
2008
ACM
13 years 5 months ago
Exploring traversal strategy for web forum crawling
In this paper, we study the problem of Web forum crawling. Web forum has now become an important data source of many Web applications; while forum crawling is still a challenging ...
Yida Wang, Jiang-Ming Yang, Wei Lai, Rui Cai, Lei ...
CCS
2011
ACM
12 years 5 months ago
Automated black-box detection of side-channel vulnerabilities in web applications
Web applications divide their state between the client and the server. The frequent and highly dynamic client-server communication that is characteristic of modern web application...
Peter Chapman, David Evans