
ICDM 2007, IEEE

On Appropriate Assumptions to Mine Data Streams: Analysis and Practice

Recent years have witnessed an increasing number of studies in stream mining, which aim to build an accurate model for continuously arriving data. Most existing work implicitly assumes that the training data and the yet-to-come testing data are always sampled from the “same distribution”, even though this “same distribution” evolves over time. We demonstrate that this assumption may not hold, and that one may never actually know either “how” or “when” the distribution changes. Thus, a model that fits the observed distribution well can have unsatisfactory accuracy on the incoming data. In practice, one can assume only the bare minimum: that learning from the observed data is better than both random guessing and always predicting the same class label. Importantly, we formally and experimentally demonstrate the robustness of a framework based on model averaging and simple voting for data streams, particularly when incoming data “continuously follows significan...
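The abstract does not spell out the base learner or how the combination is done. The Python sketch below only illustrates the general contrast between model averaging and simple voting over classifiers trained on successive stream chunks. The base learner (`train_chunk_model`, a Laplace-smoothed estimate of P(y | x) for one discrete feature) and all function names are hypothetical assumptions for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict


def train_chunk_model(chunk, labels):
    """Hypothetical base learner: estimates P(y | x) for a single discrete
    feature x from one data chunk, using Laplace (add-one) smoothing."""
    counts = defaultdict(Counter)
    for x, y in chunk:
        counts[x][y] += 1

    def predict_proba(x):
        c = counts.get(x, Counter())  # unseen x falls back to a uniform estimate
        total = sum(c.values()) + len(labels)
        return {y: (c[y] + 1) / total for y in labels}

    return predict_proba


def average_proba(models, x, labels):
    """Model averaging: mean of the per-model class probability estimates."""
    avg = {y: 0.0 for y in labels}
    for m in models:
        p = m(x)
        for y in labels:
            avg[y] += p[y] / len(models)
    return avg


def predict_averaging(models, x, labels):
    """Predict the class with the highest averaged probability."""
    avg = average_proba(models, x, labels)
    return max(labels, key=lambda y: avg[y])


def predict_voting(models, x, labels):
    """Simple voting: each model casts one vote for its most probable class."""
    votes = Counter(max(labels, key=lambda y: m(x)[y]) for m in models)
    return max(labels, key=lambda y: votes[y])


# Usage: three hypothetical chunks of (feature, label) pairs from a drifting stream.
labels = ["a", "b"]
chunks = [
    [(0, "a"), (0, "a"), (1, "b")],
    [(0, "a"), (1, "b"), (1, "b")],
    [(0, "b"), (1, "b"), (1, "a")],
]
models = [train_chunk_model(c, labels) for c in chunks]
print(predict_averaging(models, 0, labels), predict_voting(models, 0, labels))
```

Neither combiner commits to the most recent chunk alone; each prediction draws on every chunk's model, which is the intuition behind preferring such ensembles when one cannot know how or when the distribution changes.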
Jing Gao, Wei Fan, Jiawei Han
Added 03 Jun 2010
Updated 03 Jun 2010
Type Conference
Year 2007
Where ICDM