Efficiently handling feature redundancy in high-dimensional data

High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used technique for preprocessing high-dimensional data before mining. Traditionally, feature selection has focused on removing irrelevant features. For high-dimensional data, however, removing redundant features is equally critical. In this paper, we study feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. An extensive empirical study on real-world data shows that the proposed approach is efficient and effective at removing both redundant and irrelevant features.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - data mining; I.2.6 [Artificial Intelligence]: Learning; I.5.2 [Pattern Recognition]: Design Methodology - feature evaluation and selection

Keywords: Feature selection, redundancy, high-dimensional data
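The abstract does not specify the correlation measure or the exact redundancy criterion, so the following is only a rough sketch of a generic correlation-based filter, not the paper's algorithm. It assumes discrete features and uses symmetrical uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), as the correlation measure. A feature counts as relevant when its SU with the class exceeds a threshold. It counts as redundant when its SU with an already selected, more relevant feature is at least its SU with the class. The function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def joint_entropy(x, y):
    """Entropy of the paired variable (X, Y)."""
    return entropy(list(zip(x, y)))


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); lies in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: treated as uncorrelated (our convention)
    mutual_info = hx + hy - joint_entropy(x, y)
    return 2.0 * mutual_info / (hx + hy)


def correlation_filter(features, target, threshold=0.0):
    """Hypothetical correlation-based filter removing irrelevant and redundant features.

    features: dict mapping feature name -> list of discrete values
    target:   list of class labels, same length as every feature column
    """
    # Relevance: correlation of each feature with the class.
    relevance = {name: symmetrical_uncertainty(col, target)
                 for name, col in features.items()}
    # Drop irrelevant features, then rank the rest by decreasing relevance.
    candidates = sorted((n for n, su in relevance.items() if su > threshold),
                        key=lambda n: relevance[n], reverse=True)
    selected = []
    while candidates:
        best = candidates.pop(0)
        selected.append(best)
        # Redundancy: discard any remaining feature that correlates with `best`
        # at least as strongly as it correlates with the class.
        candidates = [f for f in candidates
                      if symmetrical_uncertainty(features[best], features[f]) < relevance[f]]
    return selected


target = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "a": [0, 0, 1, 1, 0, 1, 0, 1],       # perfectly predicts the class
    "a_copy": [0, 0, 1, 1, 0, 1, 0, 1],  # redundant duplicate of "a"
    "noise": [0, 1, 0, 1, 0, 0, 1, 1],   # statistically independent of the class
}
print(correlation_filter(features, target))  # prints ['a']
```

Here `noise` is dropped as irrelevant (SU = 0 with the class) and `a_copy` is dropped as redundant with `a`. Because each redundancy check compares against features already chosen, the number of pairwise correlations computed can stay well below all pairs when many features are eliminated early.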
Lei Yu, Huan Liu
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2003
Where KDD