Free Online Productivity Tools
i2Speak
i2Symbol
i2OCR
iTex2Img
iWeb2Print
iWeb2Shot
i2Type
iPdf2Split
iPdf2Merge
i2Bopomofo
i2Arabic
i2Style
i2Image
i2PDF
iLatex2Rtf
Sci2ools

KDD

2006

ACM

2006

ACM

Let R be a set of objects. An object o R is an outlier, if there exist less than k objects in R whose distances to o are at most r. The values of k, r, and the distance metric are provided by a user at the run time. The objective is to return all outliers with the smallest I/O cost. This paper considers a generic version of the problem, where no information is available for outlier computation, except for objects' mutual distances. We prove an upper bound for the memory consumption which permits the discovery of all outliers by scanning the dataset 3 times. The upper bound turns out to be extremely low in practice, e.g., less than 1% of R. Since the actual memory capacity of a realistic DBMS is typically larger, we develop a novel algorithm, which integrates our theoretical findings with carefullydesigned heuristics that leverage the additional memory to improve I/O efficiency. Our technique reports all outliers by scanning the dataset at most twice (in some cases, even once), a...

Related Content

Added |
30 Nov 2009 |

Updated |
30 Nov 2009 |

Type |
Conference |

Year |
2006 |

Where |
KDD |

Authors |
Yufei Tao, Xiaokui Xiao, Shuigeng Zhou |

Comments (0)