Sciweavers

KDD
2006
ACM

Suppressing model overfitting in mining concept-drifting data streams

14 years 4 months ago
Suppressing model overfitting in mining concept-drifting data streams
Mining data streams of changing class distributions is important for real-time business decision support. The stream classifier must evolve to reflect the current class distribution. This poses a serious challenge. On the one hand, relying on historical data may increase the chances of learning obsolete models. On the other hand, learning only from the latest data may lead to biased classifiers, as the latest data is often an unrepresentative sample of the current class distribution. The problem is particularly acute in classifying rare events, when, for example, instances of the rare class do not even show up in the most recent training data. In this paper, we use a stochastic model to describe the concept shifting patterns and formulate this problem as an optimization one: from the historical and the current training data that we have observed, find the most-likely current distribution, and learn a classifier based on the most-likely distribution. We derive an analytic solution and ...
Haixun Wang, Jian Yin, Jian Pei, Philip S. Yu, Jef
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2006
Where KDD
Authors Haixun Wang, Jian Yin, Jian Pei, Philip S. Yu, Jeffrey Xu Yu
Comments (0)