
IDA
2011
Springer

A parallel, distributed algorithm for relational frequent pattern discovery from very large data sets

The amount of data produced by ubiquitous computing applications is growing quickly, owing to the pervasive presence of small devices endowed with sensing, computing and communication capabilities. Heterogeneity and strong interdependence, which characterize ‘ubiquitous data’, require a (multi-)relational approach to their analysis. However, relational data mining algorithms do not scale well, and very large data sets can hardly be processed. In this paper we propose an extension of a relational algorithm for multi-level frequent pattern discovery that resorts to data sampling and distributed computation in Grid environments in order to overcome the computational limits of the original serial algorithm. The set of patterns discovered by the new algorithm approximates the set of exact solutions found by the serial algorithm. The quality of the approximation depends on three parameters: the proportion of data in each sample, the minimum support thresholds and the number of samples in whic...
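The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It swaps relational, multi-level patterns for plain itemsets over flat transactions, runs the samples serially instead of on Grid nodes, and all function and parameter names are hypothetical. In particular, `min_votes` (how many samples must report a pattern before it is kept) is an assumption made for illustration, because the abstract is truncated before it defines the third parameter.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (frozensets) whose relative support is >= min_support.

    Brute-force counting of all itemsets up to max_size; fine for a toy
    example, not meant to scale.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    total = len(transactions)
    return {s for s, c in counts.items() if c / total >= min_support}


def approximate_frequent_itemsets(transactions, sample_fraction, min_support,
                                  num_samples, min_votes, seed=0):
    """Approximate the frequent itemsets by mining several random samples.

    sample_fraction: proportion of the data drawn (without replacement)
                     for each sample.
    min_support:     support threshold applied inside each sample.
    num_samples:     number of samples drawn.
    min_votes:       ASSUMPTION, not taken from the source: a pattern is
                     kept if it is frequent in at least this many samples.
    """
    rng = random.Random(seed)
    sample_size = max(1, int(len(transactions) * sample_fraction))
    votes = Counter()
    for _ in range(num_samples):
        sample = rng.sample(transactions, sample_size)
        # Each sample is mined independently, so in a distributed setting
        # this call could be dispatched to a separate worker.
        for itemset in frequent_itemsets(sample, min_support):
            votes[itemset] += 1
    return {s for s, v in votes.items() if v >= min_votes}


if __name__ == "__main__":
    data = [["a", "b"]] * 50 + [["a", "c"]] * 50
    approx = approximate_frequent_itemsets(
        data, sample_fraction=0.5, min_support=0.4,
        num_samples=5, min_votes=3)
    print(sorted(sorted(s) for s in approx))
```

Under these assumptions, a smaller `sample_fraction` makes each sample cheaper to mine but lets the per-sample support estimates drift further from the true values, which is the trade-off between cost and approximation quality that the abstract refers to.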
Annalisa Appice, Michelangelo Ceci, Antonio Turi, Donato Malerba
Added 14 May 2011
Updated 14 May 2011
Type Journal
Year 2011
Where IDA
Authors Annalisa Appice, Michelangelo Ceci, Antonio Turi, Donato Malerba