Supervised Evaluation of Dataset Partitions: Advantages and Practice

In the context of large databases, data preparation takes on greater importance: instances and explanatory attributes have to be carefully selected. In supervised learning, instance partitioning techniques have been developed for univariate representations, leading to precise and comprehensible evaluations of the amount of information an attribute contains with respect to the target attribute. Still, the multivariate case has not been formally stated. In this paper, we describe the intrinsic convenience of partitioning for data preparation and we set out a framework for supervised partitioning. A new evaluation criterion for partitions of labelled objects, based on the Minimum Description Length principle, is then defined and tested on real and synthetic data sets.

1 Supervised partitioning problems in data preparation

In a data mining project, the data preparation phase is a key one. Its main goal is to provide a clean and representative database for the subsequent modelling phase [3]. Typically...
Sylvain Ferrandiz, Marc Boullé
Added 28 Jun 2010
Updated 28 Jun 2010
Type Conference
Year 2005
Where MLDM
Authors Sylvain Ferrandiz, Marc Boullé