Sciweavers

260 search results for: Industry-scale duplicate detection
ICIP
2006
IEEE
Topic Tracking Across Broadcast News Videos with Visual Duplicates and Semantic Concepts
Videos from distributed sources (e.g., broadcasts, podcasts, blogs, etc.) have grown exponentially. Topic threading is very useful for organizing such large-volume information sou...
Winston H. Hsu, Shih-Fu Chang
BIOINFORMATICS
2002
A duplication growth model of gene expression networks
Motivation: There has been considerable interest in developing computational techniques for inferring genetic regulatory networks from whole-genome expression profiles. When expre...
Ashish Bhan, David J. Galas, T. Gregory Dewey
BMCBI
2007
Discarding duplicate ditags in LongSAGE analysis may introduce significant error
Background: During gene expression analysis by Serial Analysis of Gene Expression (SAGE), duplicate ditags are routinely removed from the data analysis, because they are suspected...
Jeppe Emmersen, Anna M. Heidenblut, Annabeth Laurs...
FIMI
2004
DCI Closed: A Fast and Memory Efficient Algorithm to Mine Frequent Closed Itemsets
One of the main problems arising in frequent closed itemset mining is duplicate detection. In this paper we propose a general technique for promptly detecting ...
Claudio Lucchese, Salvatore Orlando, Raffaele Pere...
KDD
2004
ACM
Improved robustness of signature-based near-replica detection via lexicon randomization
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...