Sciweavers

AI
2004
Springer

Distributed Data Mining vs. Sampling Techniques: A Comparison

To address the problem of mining a huge volume of geographically distributed databases, we propose two approaches. The first is to download only a sample of each database. The second is to mine each distributed database remotely, download the resulting models to a central site, and aggregate them there. In this paper, we present an overview of the most common sampling techniques. We then present a new distributed data mining technique based on rule set models, in which aggregation relies on a confidence coefficient associated with each rule and on very small samples from each database. Finally, we compare the best sampling techniques that we found in the literature with our model aggregation approach.
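The second approach can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define the confidence coefficient or the aggregation rule, so this example assumes that confidence is the fraction of sample records matched by a rule that carry the rule's predicted class, and that the central site keeps every rule whose confidence on the pooled samples reaches a threshold. The names `Rule`, `confidence`, `aggregate`, and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    """A classification rule mined at one remote site (hypothetical structure)."""
    name: str
    condition: Callable[[Record], bool]  # antecedent: does the rule cover a record?
    label: str                           # consequent: predicted class


def confidence(rule: Rule, sample: Sequence[Record], target: str = "label") -> float:
    """Assumed definition: share of covered sample records with the predicted class."""
    covered = [r for r in sample if rule.condition(r)]
    if not covered:
        return 0.0
    correct = sum(1 for r in covered if r[target] == rule.label)
    return correct / len(covered)


def aggregate(
    rule_sets: Sequence[Sequence[Rule]],
    samples: Sequence[Sequence[Record]],
    threshold: float = 0.5,
) -> List[Tuple[Rule, float]]:
    """Pool the small per-site samples, score every site's rules on the pool,
    and keep the rules whose confidence reaches the threshold (an assumption)."""
    pooled = [record for sample in samples for record in sample]
    kept: List[Tuple[Rule, float]] = []
    seen = set()
    for rules in rule_sets:
        for rule in rules:
            if rule.name in seen:  # skip rules already contributed by another site
                continue
            seen.add(rule.name)
            score = confidence(rule, pooled)
            if score >= threshold:
                kept.append((rule, score))
    # Highest-confidence rules first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


# Toy data: two sites, each sending one rule and a very small sample.
site_a_rules = [Rule("high_income", lambda r: r["income"] > 50, "yes")]
site_b_rules = [Rule("young", lambda r: r["age"] < 30, "yes")]
sample_a = [
    {"income": 60, "age": 25, "label": "yes"},
    {"income": 40, "age": 45, "label": "no"},
]
sample_b = [
    {"income": 70, "age": 28, "label": "no"},
    {"income": 80, "age": 50, "label": "yes"},
]

model = aggregate([site_a_rules, site_b_rules], [sample_a, sample_b], threshold=0.6)
for rule, score in model:
    print(f"{rule.name}: {score:.2f}")
```

Only the rules and a handful of records travel to the central site, which is the bandwidth saving the abstract contrasts with downloading larger samples of each database.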
Added 30 Jun 2010
Updated 30 Jun 2010
Type Conference
Year 2004
Where AI
Authors Mohamed Aounallah, Sébastien Quirion, Guy W. Mineau