Sciweavers

SGAI
2009
Springer

Parallel Rule Induction with Information Theoretic Pre-Pruning

In a world where data is captured on a large scale, the major challenge for data mining algorithms is scaling up to large datasets. There are two main approaches to inducing classification rules: the divide-and-conquer approach, also known as top-down induction of decision trees, and the separate-and-conquer approach. A considerable amount of work has been done on scaling up the divide-and-conquer approach, but very little on scaling up the separate-and-conquer approach. In this work we describe a parallel framework that allows the parallelisation of a particular family of separate-and-conquer algorithms, the Prism family. Parallelisation lets the Prism family of algorithms harness additional computing resources across a network of computers, so that the induction of classification rules scales better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
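For readers unfamiliar with the separate-and-conquer approach mentioned in the abstract, the following is a minimal sequential sketch in the spirit of basic Prism, for categorical data only. It is not the parallel framework or the pre-pruning facility described in the paper. The function names, the `label` key, the toy dataset, and the tie-breaking rule (prefer the attribute-value pair that covers more target rows) are all illustrative assumptions.

```python
def matches(row, conditions):
    """Return True if the row satisfies every attribute=value test of a rule."""
    return all(row[attr] == value for attr, value in conditions.items())


def learn_rules_for_class(rows, target, label_key="label"):
    """Induce rules for one class by separate and conquer (Prism-style sketch).

    Assumes every row is a dict with the same keys and categorical string values.
    Returns a list of (conditions, target) pairs.
    """
    rules = []
    remaining = list(rows)
    # Keep inducing rules while uncovered rows of the target class remain.
    while any(r[label_key] == target for r in remaining):
        covered = remaining
        conditions = {}
        # Conquer: specialise the rule until it covers only the target class.
        while any(r[label_key] != target for r in covered):
            best, best_score = None, (-1.0, -1)
            for attr in covered[0]:
                if attr == label_key or attr in conditions:
                    continue
                for value in sorted({r[attr] for r in covered}):
                    subset = [r for r in covered if r[attr] == value]
                    hits = sum(r[label_key] == target for r in subset)
                    # Score: P(target | attr=value), ties broken by coverage
                    # (the tie-break is an assumption of this sketch).
                    score = (hits / len(subset), hits)
                    if score > best_score:
                        best, best_score = (attr, value), score
            if best is None:
                # No attributes left: identical rows with different labels.
                break
            attr, value = best
            conditions[attr] = value
            covered = [r for r in covered if r[attr] == value]
        rules.append((dict(conditions), target))
        # Separate: remove every row the new rule covers.
        remaining = [r for r in remaining if not matches(r, conditions)]
    return rules


# Hypothetical toy dataset, for illustration only.
ROWS = [
    {"outlook": "sunny", "windy": "no", "label": "play"},
    {"outlook": "sunny", "windy": "yes", "label": "play"},
    {"outlook": "rainy", "windy": "no", "label": "play"},
    {"outlook": "rainy", "windy": "yes", "label": "stay"},
]

if __name__ == "__main__":
    for conditions, label in learn_rules_for_class(ROWS, "play"):
        print(conditions, "->", label)
```

Each rule is grown one attribute-value test at a time and the rows it covers are then set aside, in contrast to divide and conquer, which recursively splits the whole dataset into a tree.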
Frederic T. Stahl, Max Bramer, Mo Adda
Added 27 May 2010
Updated 27 May 2010
Type Conference
Year 2009
Where SGAI
Authors Frederic T. Stahl, Max Bramer, Mo Adda