Parallel Mining of Outliers in Large Database

Data mining is a new, important and fast-growing database application. Outlier (exception) detection is one kind of data mining and can be applied in a variety of areas, such as monitoring credit card fraud and criminal activities in electronic commerce. With the ever-increasing size and number of attributes (dimensions) of databases, previously proposed detection methods designed for two dimensions are no longer applicable. The time complexity of the Nested-Loop (NL) algorithm (Knorr and Ng, in Proc. 24th VLDB, 1998) is linear in the dimensionality but quadratic in the dataset size, which makes its cost unacceptable for large datasets. A more efficient version (ENL) and its parallel version (PENL) are introduced. In theory, the performance improvement of PENL is linear in the number of processors, as shown by a performance comparison between ENL and PENL using the Bulk Synchronous Parallel (BSP) model. This large improvement is further verified by experiments on the IBM 9076 SP2 parallel computer system. Th...
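The cost profile described above can be illustrated with the distance-based outlier notion used by Knorr and Ng. An object is a DB(p, D)-outlier if at least a fraction p of the dataset lies farther than distance D from it. The sketch below is a simplified, single-machine nested-loop scan. It is not the block-based NL algorithm of Knorr and Ng, and it is not the authors' ENL or PENL. All function names are hypothetical. The `partitioned_outliers` helper runs serially and only shows how splitting the outer loop into blocks divides the work. It assumes that the object itself counts toward its own D-neighbourhood and that distances are Euclidean.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    # Euclidean distance: cost grows linearly with the number of dimensions
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_db_outlier(i: int, data: List[Point], p: float, d: float) -> bool:
    """True if at least a fraction p of `data` lies farther than d from data[i]."""
    n = len(data)
    m = n * (1 - p)  # max objects (self included) allowed within distance d
    near = 0
    for b in data:  # inner loop over all n objects, including data[i] itself
        if dist(data[i], b) <= d:
            near += 1
            if near > m:  # early exit: too many close neighbours
                return False
    return True


def nested_loop_outliers(data: List[Point], p: float, d: float) -> List[int]:
    # Outer loop over n objects times inner loop over n objects: O(k * n^2)
    return [i for i in range(len(data)) if is_db_outlier(i, data, p, d)]


def partitioned_outliers(data: List[Point], p: float, d: float, procs: int) -> List[int]:
    # Hypothetical illustration: split the outer loop into `procs` blocks of
    # roughly n / procs objects. Each block still scans the whole dataset,
    # so per-block work is about (n / procs) * n. Blocks run serially here.
    n = len(data)
    size = max(1, math.ceil(n / procs))
    result: List[int] = []
    for start in range(0, n, size):
        block = range(start, min(start + size, n))
        result.extend(i for i in block if is_db_outlier(i, data, p, d))
    return result


if __name__ == "__main__":
    points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
    print(nested_loop_outliers(points, p=0.5, d=2.0))  # [4]
    print(partitioned_outliers(points, p=0.5, d=2.0, procs=2))  # [4]
```

In this sketch, each block needs only read access to the full dataset and produces an independent list of outliers. That independence is why dividing the outer loop among processors can, in principle, reduce the per-processor work roughly in proportion to the number of processors.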
Edward Hung, David Wai-Lok Cheung
Added 18 Dec 2010
Updated 18 Dec 2010
Type Journal
Year 2002
Where DPD