MLDM
2009
Springer

PMCRI: A Parallel Modular Classification Rule Induction Framework

In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology for predicting the classification of newly recorded data. However, alternative technologies have been developed that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm, which performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.
Frederic T. Stahl, Max A. Bramer, Mo Adda
Added 27 May 2010
Updated 27 May 2010
Type Conference
Year 2009
Where MLDM
Authors Frederic T. Stahl, Max A. Bramer, Mo Adda