Parallel Incremental 2D-Discretization on Dynamic Datasets

Most current work in data mining assumes that the database is static, so a database update requires rediscovering all the patterns by scanning the entire old and new database. Such approaches can waste considerable computational and I/O resources and result in relatively slow response times for what is essentially an interactive process. In this paper we address this issue in the context of 2-dimensional discretization within a multi-attribute database. Discretization, an important problem in data mining, is typically used to partition the range of continuous attribute(s) into intervals that highlight the behavior of a related discrete attribute. It can be used to build decision trees and to determine appropriate aggregations for On-Line Analytical Processing. We first propose a time-optimal solution to the problem. We then parallelize and incrementalize the algorithm so that it can dynamically maintain the required information even in the presence of data updates without re-executing the alg...
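To make the idea of incremental maintenance concrete, the following is a minimal Python sketch and not the authors' algorithm. It assumes a fixed 2D grid over two continuous attributes, with hypothetical cut points `x_edges` and `y_edges`, and keeps per-cell counts of the discrete attribute so that an insertion or deletion touches one cell instead of triggering a rescan of the database. The class name `GridCounts` and its methods are illustrative only.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from math import log2


class GridCounts:
    """Per-cell class counts over a fixed 2D grid (illustrative sketch)."""

    def __init__(self, x_edges, y_edges):
        # Sorted interior cut points for each continuous attribute
        # (assumed to be given; choosing them is the discretization problem).
        self.x_edges = list(x_edges)
        self.y_edges = list(y_edges)
        self.cells = defaultdict(Counter)

    def _cell(self, x, y):
        # Map a point to the (column, row) index of the cell containing it.
        return (bisect_right(self.x_edges, x), bisect_right(self.y_edges, y))

    def insert(self, x, y, label):
        # An update changes exactly one counter; no rescan is needed.
        self.cells[self._cell(x, y)][label] += 1

    def delete(self, x, y, label):
        cell = self._cell(x, y)
        if self.cells[cell][label] <= 0:
            raise ValueError("no such record in this cell")
        self.cells[cell][label] -= 1

    def entropy(self, cell):
        # Class entropy of one cell, computed from the maintained counts.
        counts = [c for c in self.cells[cell].values() if c > 0]
        total = sum(counts)
        if total == 0:
            return 0.0
        return -sum((c / total) * log2(c / total) for c in counts)


grid = GridCounts(x_edges=[0.5], y_edges=[0.5])
grid.insert(0.2, 0.2, "a")
grid.insert(0.3, 0.1, "b")
grid.delete(0.3, 0.1, "b")
```

Because every statistic is derived from the stored counts, a quality measure such as per-cell entropy can be re-evaluated after an update without revisiting the original records. How cut points are chosen and how the work is split across processors are left out of this sketch.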
Srinivasan Parthasarathy, Arun Ramakrishnan
Added 15 Jul 2010
Updated 15 Jul 2010
Type Conference
Year 2002
Where IPPS