Abstract: Answering approximate queries on string collections is important in applications such as data cleaning, query relaxation, and spell checking, where inconsistencies and e...
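The abstract above is truncated before it defines "approximate query". One common formalization, assumed here and not necessarily the one the paper uses, is: return every string in the collection whose Levenshtein edit distance to the query is at most a threshold k. The sketch below is a brute-force baseline. `edit_distance` and `approximate_query` are hypothetical names chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character insertions,
    deletions, and substitutions that turn a into b (two-row dynamic program)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def approximate_query(collection, query, k):
    """Hypothetical threshold query: every string within edit distance k of query.
    This is a linear scan; practical systems add an index or filter in front of it."""
    return [s for s in collection if edit_distance(s, query) <= k]


print(approximate_query(["Smith", "Smyth", "Schmidt", "Jones"], "Smith", 1))
# prints ['Smith', 'Smyth']
```

A data-cleaning pass could use such a query to flag "Smyth" as a likely variant of "Smith". The quadratic cost per comparison is why dedicated index structures matter for large collections.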
In many application domains (e.g., WWW mining, molecular biology), large string datasets are available and yet under-exploited. The inductive database framework assumes that both s...
Inverted indexes are the most fundamental and widely used data structures in information retrieval. For each unique word occurring in a document collection, the inverted index sto...
Manish Patil, Sharma V. Thankachan, Rahul Shah, Wi...
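The inverted index described above maps each unique word to the list of documents in which it occurs. The following minimal sketch makes this concrete and shows how a multi-word query intersects posting lists. It assumes lowercase whitespace tokenization. The function names are illustrative and do not come from the paper.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique word to the sorted list of IDs of the documents containing it.
    Tokenization is naive (lowercase, split on whitespace): an assumption for illustration."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def conjunctive_query(index, words):
    """Documents containing every query word, found by intersecting posting lists."""
    lists = [set(index.get(w.lower(), [])) for w in words]
    if not lists:
        return []
    return sorted(set.intersection(*lists))


docs = ["the quick brown fox", "the lazy dog", "quick brown dogs"]
index = build_inverted_index(docs)
print(index["brown"])                                # prints [0, 2]
print(conjunctive_query(index, ["quick", "brown"]))  # prints [0, 2]
```

Because postings are keyed by whole words, "dog" and "dogs" are separate entries here. A word-level index only answers queries over the tokens it was built from.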
Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a cluster...
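The streaming model above requires one pass over the data with small working memory. As an illustration of that model only, the sketch below uses sequential (online) k-means. This is a swapped-in textbook technique, not the algorithm of the truncated abstract. It stores just k centers and k counts. It carries no approximation guarantee, and its result depends on the arrival order of the points.

```python
import math


def streaming_kmeans(stream, k):
    """Sequential k-means in a single pass over `stream` (any iterable of points).
    Memory holds only k centers and k counts, never the stream itself.
    The first k points seed the centers; each later point moves its nearest
    center toward it by 1/count, so every center is the running mean of its points."""
    if k < 1:
        raise ValueError("k must be at least 1")
    centers, counts = [], []
    for point in stream:
        p = [float(x) for x in point]
        if len(centers) < k:
            centers.append(p)
            counts.append(1)
            continue
        # Assign the point to the nearest current center (Euclidean distance).
        j = min(range(k), key=lambda i: math.dist(centers[i], p))
        counts[j] += 1
        eta = 1.0 / counts[j]
        centers[j] = [c + eta * (x - c) for c, x in zip(centers[j], p)]
    return centers, counts


centers, counts = streaming_kmeans(iter([(0, 0), (10, 10), (0, 2), (10, 12)]), k=2)
print(centers, counts)  # prints [[0.0, 1.0], [10.0, 11.0]] [2, 2]
```

Passing an iterator rather than a list emphasizes that each point is seen once and then discarded.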
Record label companies would like to identify potential artists as early as possible in their careers, before other companies approach the artists with competing contracts. The va...