Parallel Algorithms for Distance-Based and Density-Based Outliers

An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. Outlier detection has many applications, such as data cleaning, fraud detection, and network intrusion detection. Outliers can indicate individuals or groups whose behavior differs markedly from that of most individuals in the dataset. In this paper we design two parallel algorithms. The first finds distance-based outliers using nested loops combined with randomization and a pruning rule. The second detects density-based local outliers. Both algorithms use data parallelism, and we show that both reach near-linear speedup. The algorithms are tested on four real-world datasets from the Machine Learning Database Repository at UCI.
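The nested-loop idea with randomization and a pruning rule, as named in the abstract, can be illustrated with a minimal sequential sketch. This is not the authors' implementation. It assumes that a point's outlier score is the average distance to its k nearest neighbours and that the top n scores are wanted. The function name `top_n_outliers` and the parameters `k`, `n`, and `seed` are hypothetical choices made for illustration only.

```python
import heapq
import math
import random


def top_n_outliers(points, k=2, n=1, seed=0):
    """Return (score, index) pairs for the n points with the largest
    average distance to their k nearest neighbours (assumed score)."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # randomization: scan candidates in random order

    cutoff = 0.0  # score of the weakest outlier currently kept
    top = []      # min-heap of (score, index), at most n entries

    for i in range(len(points)):
        nearest = []  # negated distances: max-heap of the k smallest so far
        pruned = False
        for j in order:
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # Pruning rule: the running average only shrinks as closer
            # neighbours appear, so it is an upper bound on the final score.
            if len(nearest) == k and -sum(nearest) / k < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        score = -sum(nearest) / k
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(data, k=2, n=1))
```

Under these assumptions, a data-parallel variant might split the outer loop across workers, each holding a block of the data, and exchange the current cutoff between them. How the paper actually partitions the work is not stated in the abstract.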
Elio Lozano, Edgar Acuña
Added 24 Jun 2010
Updated 24 Jun 2010
Type Conference
Year 2005
Where ICDM