On the Dimensions of Data Complexity through Synthetic Data Sets



Abstract. This paper deals with the characterization of data complexity and its relationship with classification accuracy. We study three dimensions of data complexity: the length of the class boundary, the number of features, and the number of instances of the data set. We find that the length of the class boundary is the most relevant dimension of complexity, since it can be used as an estimate of the maximum achievable accuracy rate of a classifier. The number of attributes and the number of instances do not affect classifier accuracy by themselves if the boundary length is kept constant. The study emphasizes the use of measures that reveal the intrinsic structure of data and recommends using them to draw conclusions about classifier behavior and relative performance in multiple comparison experiments.

Keywords. Data complexity, Classification, Dimensionality, Synthetic data sets
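The abstract does not say how the length of the class boundary is measured. As a rough illustration only, the sketch below assumes a minimum-spanning-tree estimate that is common in the data complexity literature: build a Euclidean minimum spanning tree over all instances and report the fraction of instances joined by a tree edge to an instance of another class. The function names `boundary_fraction` and `make_checkerboard`, and the checkerboard generator itself, are hypothetical and are not taken from the paper.

```python
import math
import random


def boundary_fraction(points, labels):
    """Fraction of points joined to an opposite-class point by an MST edge.

    Assumption: this MST-based estimate stands in for the "length of the
    class boundary"; the paper's exact definition may differ.
    """
    n = len(points)
    if n < 2:
        return 0.0
    # Prim's algorithm on the complete Euclidean graph, O(n^2) time.
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_parent = [-1] * n
    best_dist[0] = 0.0
    on_boundary = set()
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        p = best_parent[u]
        # The edge (p, u) joins the tree; record it if it crosses classes.
        if p >= 0 and labels[u] != labels[p]:
            on_boundary.update((u, p))
        # Relax distances from the newly added point.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return len(on_boundary) / n


def make_checkerboard(n, cells, seed=0):
    """Hypothetical synthetic set: n uniform 2-D points, checkerboard labels.

    More cells per side means a longer class boundary for the same n.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    labels = [(int(x * cells) + int(y * cells)) % 2 for x, y in points]
    return points, labels


if __name__ == "__main__":
    for cells in (1, 2, 4, 8):
        pts, lbs = make_checkerboard(300, cells)
        print(cells, round(boundary_fraction(pts, lbs), 3))
```

Varying `cells` while holding `n` fixed, or the reverse, gives a simple way to probe one dimension of complexity at a time, in the spirit of the study described in the abstract.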

Núria Macià, Ester Bernadó-Mansilla, Albert Orriols-Puig


Artificial Intelligence | CCIA 2008 | Class Boundary | Data Complexity | Data Sets


» Preliminary approach on synthetic data sets generation based on class separability measure

» Approximate Inverse Frequent Itemset Mining: Privacy, Complexity, and Approximation

» Effective Level Set Image Segmentation With a Kernel Induced Data Term

» Distance Approximating Dimension Reduction of Riemannian Manifolds

» Finding low-entropy sets and trees from binary data

» A Scalable Parallel Subspace Clustering Algorithm for Massive Data Sets

» Disk-Based Sampling for Outlier Detection in High Dimensional Data

» SynTReN: a generator of synthetic gene expression data for design and analysis of structure...

Post Info

Added	12 Oct 2010
Updated	12 Oct 2010
Type	Conference
Year	2008
Where	CCIA
Authors	Núria Macià, Ester Bernadó-Mansilla, Albert Orriols-Puig
