Sciweavers

» Constraint-Based Pattern Set Mining
KDD
2008
ACM
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
KDD
2008
ACM
Entity categorization over large document collections
Extracting entities (such as people, movies) from documents and identifying the categories (such as painter, writer) they belong to enable structured querying and data analysis ov...
Arnd Christian König, Rares Vernica, Venkates...
KDD
2004
ACM
Redundancy based feature selection for microarray data
In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseas...
Lei Yu, Huan Liu
KDD
2002
ACM
Topics in 0-1 data
Large 0-1 datasets arise in various applications, such as market basket analysis and information retrieval. We concentrate on the study of topic models, aiming at results which in...
Ella Bingham, Heikki Mannila, Jouni K. Seppän...
KDD
2002
ACM
On effective classification of strings with wavelets
In recent years, the technological advances in mapping genes have made it increasingly easy to store and use a wide variety of biological data. Such data are usually in the form o...
Charu C. Aggarwal