KDD
2008
ACM
Multi-class cost-sensitive boosting with p-norm loss functions
We propose a family of novel cost-sensitive boosting methods for multi-class classification by applying the theory of gradient boosting to p-norm based cost functionals. We establ...
Aurelie C. Lozano, Naoki Abe
KDD
2008
ACM
Partitioned logistic regression for spam filtering
Naive Bayes and logistic regression perform well in different regimes. While the former is a very simple generative model which is efficient to train and performs well empirically...
Ming-wei Chang, Wen-tau Yih, Christopher Meek
KDD
2008
ACM
Semi-supervised learning with data calibration for long-term time series forecasting
Many time series prediction methods have focused on single step or short term prediction problems due to the inherent difficulty in controlling the propagation of errors from one ...
Haibin Cheng, Pang-Ning Tan
KDD
2008
ACM
Quantitative evaluation of approximate frequent pattern mining algorithms
Traditional association mining algorithms use a strict definition of support that requires every item in a frequent itemset to occur in each supporting transaction. In real-life d...
Rohit Gupta, Gang Fang, Blayne Field, Michael Stei...
KDD
2008
ACM
Stream prediction using a generative model based on frequent episodes in event sequences
This paper presents a new algorithm for sequence prediction over long categorical event streams. The input to the algorithm is a set of target event types whose occurrences we wis...
Srivatsan Laxman, Vikram Tankasali, Ryen W. White
KDD
2008
ACM
SAIL: summation-based incremental learning for information-theoretic clustering
Information-theoretic clustering aims to exploit information-theoretic measures as the clustering criteria. A common practice on this topic is so-called INFO-K-means, which perfor...
Junjie Wu, Hui Xiong, Jian Chen
KDD
2008
ACM
Microscopic evolution of social networks
We present a detailed study of network evolution by analyzing four large online social networks with full temporal information about node and edge arrivals. For the first time at ...
Jure Leskovec, Lars Backstrom, Ravi Kumar, Andrew ...
KDD
2008
ACM
Succinct summarization of transactional databases: an overlapped hyperrectangle scheme
Transactional data are ubiquitous. Several methods, including frequent itemsets mining and co-clustering, have been proposed to analyze transactional databases. In this work, we p...
Yang Xiang, Ruoming Jin, David Fuhry, Feodor F. Dr...
KDD
2008
ACM
Automatic identification of quasi-experimental designs for discovering causal knowledge
Researchers in the social and behavioral sciences routinely rely on quasi-experimental designs to discover knowledge from large databases. Quasi-experimental designs (QEDs) exploi...
David D. Jensen, Andrew S. Fast, Brian J. Taylor, ...