Affordable, fast computers with large memories have lessened the demand for program efficiency, but applications such as browsing and searching very large databases often have rat...
We present a new multi-dimensional data structure, which we call the skip quadtree (for point data in R2) or the skip octree (for point data in Rd, with constant d > 2). Our da...
David Eppstein, Michael T. Goodrich, Jonathan Z. S...
The success of popular algorithms such as k-means clustering or nearest neighbor searches depend on the assumption that the underlying distance functions reflect domain-specific n...
In this paper, we argue that the agglomerative clustering with vector cosine similarity measure performs poorly due to two reasons. First, the nearest neighbors of a document belo...
Similarity search methods are widely used as kernels in various data mining and machine learning applications including those in computational biology, web search/clustering. Near...