Online classification of nonstationary data streams

Most classification methods assume that the data conforms to a stationary distribution. However, real-world data is usually collected over periods of time ranging from seconds to years, and ignoring possible changes in the underlying concept, known as concept drift, may degrade the predictive performance of a classification model. Moreover, the computation time, the amount of required memory, and the model complexity may grow indefinitely as new training instances keep arriving. This paper describes and evaluates OLIN, an online classification system that dynamically adjusts the size of the training window and the number of new examples between model re-constructions to the current rate of concept drift. Using a fixed amount of computer resources, OLIN produces models that have nearly the same accuracy as those that would be produced by periodically re-constructing the model from all accumulated instances. We evaluate the ...
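The abstract does not spell out OLIN's internal procedure, so the following is only a minimal sketch of the general idea it describes: a bounded training window, a variable number of new examples between rebuilds, and both adjusted according to detected drift. Everything here is an assumption for illustration: `AdaptiveWindowLearner`, `NearestMeanModel`, `error_rate`, and all parameter values are hypothetical, the base learner is a toy one-feature nearest-mean classifier, and the drift test (validation error significantly above training error) is a simple swapped-in heuristic, not OLIN's own criterion.

```python
import math
import random
from collections import deque


class NearestMeanModel:
    """Hypothetical toy base learner: predicts the class whose mean feature value is closest."""

    def __init__(self, examples):
        sums, counts = {}, {}
        for x, y in examples:
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.means = {y: sums[y] / counts[y] for y in sums}

    def predict(self, x):
        return min(self.means, key=lambda y: abs(x - self.means[y]))


def error_rate(model, examples):
    """Fraction of (x, y) pairs the model misclassifies."""
    return sum(model.predict(x) != y for x, y in examples) / len(examples)


class AdaptiveWindowLearner:
    """Sketch of drift-adaptive windowing (an assumption, not OLIN's actual algorithm)."""

    def __init__(self, max_window=200, min_step=5, max_step=40, z=2.0):
        self.window = deque(maxlen=max_window)  # bounded memory: oldest examples fall off
        self.min_step, self.max_step = min_step, max_step
        self.step = min_step      # new examples to collect before the next rebuild
        self.z = z                # significance multiplier for the hypothetical drift test
        self.pending = []         # examples received since the last rebuild
        self.model = None
        self.train_error = 0.0
        self.drifts = 0

    def predict(self, x):
        return self.model.predict(x) if self.model is not None else None

    def update(self, x, y):
        self.pending.append((x, y))
        if len(self.pending) < self.step:
            return
        if self.model is not None and self._drift_detected():
            # Drift: discard the outdated window and rebuild more often.
            self.drifts += 1
            self.window.clear()
            self.step = self.min_step
        else:
            # Stable concept: rebuild less often to save computation.
            self.step = min(self.step * 2, self.max_step)
        self.window.extend(self.pending)
        self.pending = []
        self.model = NearestMeanModel(self.window)
        self.train_error = error_rate(self.model, self.window)

    def _drift_detected(self):
        # Flag drift when the error on the new batch exceeds the training error
        # by more than z standard errors of the difference of two proportions.
        n_t, n_v = len(self.window), len(self.pending)
        e_t = self.train_error
        e_v = error_rate(self.model, self.pending)
        spread = math.sqrt(e_t * (1 - e_t) / n_t + e_v * (1 - e_v) / n_v)
        return e_v - e_t > self.z * spread


# Hypothetical stream: the labeling rule flips at t = 300 (abrupt concept drift).
rng = random.Random(0)
learner = AdaptiveWindowLearner()
for t in range(1000):
    x = rng.random()
    y = int(x > 0.5) if t < 300 else int(x <= 0.5)
    learner.update(x, y)

print(learner.drifts, learner.predict(0.9), learner.predict(0.1))
```

Because the window is a `deque` with a fixed `maxlen`, memory and rebuild cost stay bounded no matter how long the stream runs; after the flip, the learner should end up predicting the new rule (0 for large `x`, 1 for small `x`).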
Mark Last
Added: 19 Dec 2010
Updated: 19 Dec 2010
Type: Journal
Year: 2002
Where: IDA
Authors: Mark Last