Sciweavers

KDD 2008 (ACM), Data Mining
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar

KDD 2008 (ACM), Data Mining
SAIL: summation-based incremental learning for information-theoretic clustering
Information-theoretic clustering aims to exploit information-theoretic measures as the clustering criteria. A common practice on this topic is so-called INFO-K-means, which perfor...
Junjie Wu, Hui Xiong, Jian Chen

KDD 2006 (ACM), Data Mining
Extracting redundancy-aware top-k patterns
In many applications, there is a potential need to extract a small set of frequent patterns that have not only high significance but also low redundancy. The significance...
Dong Xin, Hong Cheng, Xifeng Yan, Jiawei Han

KDD 2004 (ACM), Data Mining
Probabilistic author-topic models for information discovery
We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic pro...
Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, T...

KDD 2002 (ACM), Data Mining
From run-time behavior to usage scenarios: an interaction-pattern mining approach
A key challenge facing IT organizations today is their evolution towards adopting e-business practices that gives rise to the need for reengineering their underlying software syst...
Mohammad El-Ramly, Eleni Stroulia, Paul G. Sorenso...