Sciweavers

JMLR
2002

The Learning-Curve Sampling Method Applied to Model-Based Clustering

We examine the learning-curve sampling method, an approach for applying machine-learning algorithms to large data sets. The approach is based on the observation that the computational cost of learning a model increases as a function of the sample size of the training data, whereas the accuracy of a model has diminishing improvements as a function of sample size. Thus, the learning-curve sampling method monitors the increasing costs and performance as larger and larger amounts of data are used for training, and terminates learning when future costs outweigh future benefits. In this paper, we formalize the learning-curve sampling method and its associated cost-benefit tradeoff in terms of decision theory. In addition, we describe the application of the learning-curve sampling method to the task of model-based clustering via the expectation-maximization (EM) algorithm. In experiments on three real data sets, we show that the learning-curve sampling method produces models that are nearly a...
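The stopping idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' formulation: the geometric growth schedule, the linear per-example cost, the use of the last observed gain as a stand-in for the expected future benefit, and the names `learning_curve_sampling`, `fit_and_score`, `cost_per_example`, and `toy_score` are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence, Tuple


def learning_curve_sampling(
    data: Sequence,
    fit_and_score: Callable[[Sequence], float],
    initial_size: int = 100,
    growth: float = 2.0,
    cost_per_example: float = 1e-4,
) -> Tuple[int, float]:
    """Grow the training sample until the observed gain stops paying for itself.

    Assumptions (not from the paper): the sample grows geometrically, cost is
    linear in the number of added examples, and the most recent gain is used
    as a proxy for the benefit of continuing.
    """
    n = min(initial_size, len(data))
    prev_score = fit_and_score(data[:n])
    while n < len(data):
        next_n = min(math.ceil(n * growth), len(data))
        score = fit_and_score(data[:next_n])
        benefit = score - prev_score               # improvement from the extra data
        cost = cost_per_example * (next_n - n)     # price of the extra data
        n, prev_score = next_n, score
        if benefit <= cost:
            # Diminishing returns: the last step cost at least as much as it gained.
            break
    return n, prev_score


def toy_score(sample: Sequence) -> float:
    """Hypothetical stand-in for 'train a model and score it on held-out data'.

    It mimics a learning curve with diminishing improvements in sample size.
    """
    return 1.0 - 1.0 / math.sqrt(len(sample))
```

In practice `fit_and_score` would train a model (for example, a mixture model fitted with EM) on the sample and return a held-out quality measure expressed in the same units as the cost; `toy_score` only imitates the shape of such a curve.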
Christopher Meek, Bo Thiesson, David Heckerman
Added 22 Dec 2010
Updated 22 Dec 2010
Type Journal