Many content-oriented applications require a scalable text index. Building such an index is challenging. In addition to the logic of inserting and searching documents, developers ...
We propose a scalable technique called Seeded Clustering that allows us to maintain R-tree indices by bulk insertion while keeping pace with high data arrival rates. Our approach ...
We present a simple and scalable algorithm for clustering tens of millions of phrases and use the resulting clusters as features in discriminative classifiers. To demonstrate the ...
Distributed-system observation tools require an efficient data structure to store and query the partial-order of execution. Such data structures typically use vector timestamps to...
Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms such as k-means. However, spectral clustering suffers fro...