Sciweavers

DKE
2006

Online clustering of parallel data streams

In recent years, the management and processing of so-called data streams has become a topic of active research in several fields of computer science, such as distributed systems, database systems, and data mining. A data stream can roughly be thought of as a transient, continuously growing sequence of time-stamped data. In this paper, we consider the problem of clustering parallel streams of real-valued data, that is to say, continuously evolving time series. In other words, we are interested in grouping data streams whose evolution over time is similar in a specific sense. To maintain an up-to-date clustering structure, the incoming data must be analyzed online, tolerating no more than a constant time delay. For this purpose, we develop an efficient online version of the classical K-means clustering algorithm. Our method's efficiency is mainly due to a scalable online transformation of the original data which allows for a fast com...
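To make the setting in the abstract concrete, the following is a minimal sketch of clustering parallel streams over a sliding window. It is not the authors' method: the abstract is truncated before it describes the online transformation, so none is reproduced here. The class name `StreamClusterer`, the window length, the z-normalisation step, the Euclidean distance, and the warm start from the previous centres are all assumptions made for illustration.

```python
from collections import deque
import math


def normalise(window):
    """Scale a window to zero mean and unit variance (assumed preprocessing)."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centres, max_iter=20):
    """Plain Lloyd iterations, started from the given centres."""
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


class StreamClusterer:
    """Hypothetical helper: one sliding window per stream, re-clustered per step."""

    def __init__(self, n_streams, k, window):
        self.k = k
        self.windows = [deque(maxlen=window) for _ in range(n_streams)]
        self.centres = None

    def update(self, values):
        """Consume one new value per stream; return labels once windows are full."""
        for w, v in zip(self.windows, values):
            w.append(v)
        if len(self.windows[0]) < self.windows[0].maxlen:
            return None
        points = [normalise(list(w)) for w in self.windows]
        if self.centres is None:
            # Assumed initialisation: the first k streams seed the centres.
            self.centres = [list(p) for p in points[: self.k]]
        # Warm start from the previous centres, so few iterations are needed
        # when the streams change little between time steps.
        labels, self.centres = kmeans(points, self.centres)
        return labels
```

Calling `update` once per time step with one value per stream returns `None` until every window is full, and a list of cluster labels afterwards. Reusing the previous centres as the starting point is one simple way to keep the per-step cost low; the sketch still recomputes every distance from the raw window, which is the cost the paper's transformation is said to reduce.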
Jürgen Beringer, Eyke Hüllermeier
Added 11 Dec 2010
Updated 11 Dec 2010
Type Journal
Year 2006
Where DKE
Authors Jürgen Beringer, Eyke Hüllermeier