Internal Clustering Evaluation of Data Streams

Abstract. Clustering validation is a crucial part of choosing the clustering algorithm that performs best on given input data. Internal clustering validation is efficient and realistic, whereas external validation requires a ground truth, which is not available in most applications. In this paper, we analyze the properties and performance of eleven internal clustering measures. In particular, as the importance of streaming data grows, we apply these measures to carefully synthesized stream scenarios to reveal how they react to clusterings on evolving data streams. A series of experimental results shows that, unlike in the case of static data, the Calinski-Harabasz index performs best in coping with common aspects and errors of stream clustering.

1 Motivation
Clustering validation is necessary for most applications and is regarded as being as important as the clustering itself [20]. There are two types of clustering validation [19]. The external validation, which compares the clusteri...
Marwan Hassani, Thomas Seidl 0001
Added 16 Apr 2016
Updated 16 Apr 2016
Type Journal
Year 2015