Compact Similarity Joins

15 years 10 months ago

Download www.autonlab.org

— Similarity joins have attracted signiﬁcant interest, with applications in Geographical Information Systems, astronomy, marketing analyzes, and anomaly detection. However, all the past algorithms, although highly ﬁne-tuned, suffer an output explosion if the query range is even moderately large relative to the local data density. Under such circumstances, the response time and the search effort are both almost quadratic in the database size, which is often prohibitive. We solve this problem by providing two algorithms that ﬁnd a compact representation of the similarity join result, while retaining all the information in the standard join. Our algorithms have the following characteristics: (a) they are at least as fast as the standard similarity join algorithm, and typically much faster, (b) they generate signiﬁcantly smaller output, (c) they provably lose no information, (d) they scale well to large data sets, and (e) they can be applied to any of the standard tree data struc...

Brent Bryan, Frederick Eberhardt, Christos Falouts

Real-time Traffic