Accuracy Estimation With Clustered Dataset


If the dataset available for machine learning results from cluster sampling (e.g. patients from a sample of hospital wards), the usual cross-validation error rate estimate can be biased and misleading. An adapted cross-validation is described for this case. Using a simulation, the sampling distribution of the generalization error rate estimate, under either the cluster sampling or the simple random sampling hypothesis, is compared to the true value. The results highlight the impact of the sampling design on inference: clustering has a significant impact, and the split between learning set and test set should result from a random partition of the clusters, not from a random partition of the examples. With cluster sampling, standard cross-validation underestimates the generalization error rate and is deficient for model selection. These results are illustrated with a real application: automatic identification of spoken language.
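The central recommendation of the abstract, partitioning clusters rather than individual examples, can be illustrated with a minimal sketch. This is not the authors' code: the function name `cluster_kfold`, its parameters, and the toy "wards" data are assumptions made for illustration only, and the abstract does not specify how many folds the adapted procedure uses.

```python
import random
from collections import defaultdict

def cluster_kfold(cluster_ids, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each cluster lands wholly in one fold.

    Hypothetical helper: cluster_ids[i] is the cluster of example i.
    """
    # Group example indices by the cluster they were sampled from.
    members = defaultdict(list)
    for i, c in enumerate(cluster_ids):
        members[c].append(i)
    clusters = sorted(members)
    if k > len(clusters):
        raise ValueError("k cannot exceed the number of clusters")
    # Randomly partition the clusters (not the examples) into k folds.
    rng = random.Random(seed)
    rng.shuffle(clusters)
    folds = [clusters[f::k] for f in range(k)]
    for held_out in folds:
        held = set(held_out)
        test_idx = [i for c in held_out for i in members[c]]
        train_idx = [i for c in clusters if c not in held for i in members[c]]
        yield train_idx, test_idx

# Made-up data: 6 wards (clusters) with 4 patients each.
wards = [w for w in range(6) for _ in range(4)]
splits = list(cluster_kfold(wards, k=3, seed=42))
```

Because a ward never appears on both sides of a split, a classifier cannot score well on the test fold merely by recognizing ward-specific patterns it saw during training, which is the mechanism by which example-level splitting would understate the error rate.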

Ricco Rakotomalala, Jean-Hugues Chauchat, François Pellegrino


AUSDM 2006 | Data Mining | Error Rate | Error Rate Estimate | Generalization Error Rate |


» A highly efficient multicore algorithm for clustering extremely large datasets

» Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation

» Hierarchical model-based clustering of large datasets through fractionation and refractiona...

» Dynamic Clustering-Based Estimation of Missing Values in Mixed Type Data

» A comparative evaluation on the accuracies of software effort estimates from clustered dat...

» Estimating Generalization Error on Two-Class Datasets Using Out-of-Bag Estimates

» Finding Clusters in subspaces of very large multidimensional datasets

» HICCUP Hierarchical Clustering Based Value Imputation using Heterogeneous Gene Expression ...

Post Info

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2006
Where	AUSDM
Authors	Ricco Rakotomalala, Jean-Hugues Chauchat, François Pellegrino


Sciweavers
