Sciweavers

VLDB
2007
ACM

Fast nGram-Based String Search Over Data Encoded Using Algebraic Signatures

We propose a novel string search algorithm for data stored once and read many times. Our search method combines the sublinear traversal of the record (as in Boyer-Moore or Knuth-Morris-Pratt) with the agglomeration of parts of the record and search pattern into a single character (the algebraic signature) in the manner of Karp-Rabin. Our experiments show that our algorithm is up to seventy times faster for DNA data, up to eleven times faster for ASCII, and up to six times faster for XML documents compared with an implementation of Boyer-Moore. To obtain this speed-up, we store records in encoded form, where each original character is replaced with an algebraic signature. Our method applies to records stored in databases in general and to distributed implementations of a Database As Service (DAS) in particular. Clients send records for insertion and search patterns already in encoded form and servers never operate on records in clear text. No one at a node can involuntarily discove...
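The idea of matching on encoded symbols can be illustrated with a minimal sketch. It is not the paper's implementation: the field GF(2^8), the polynomial 0x11D, the primitive element 2, the n-gram layout, and all function names are assumptions made for illustration, and the scan is a plain linear one rather than the sublinear traversal the abstract describes.

```python
# Illustrative sketch only. The field, primitive element, n-gram layout and
# function names are assumptions, not details taken from the paper.

GF_POLY = 0x11D  # assumed primitive polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8): carry-less product reduced by GF_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
        b >>= 1
    return result


def signature(ngram: bytes, alpha: int = 2) -> int:
    """One-symbol signature of an n-gram: XOR-sum of c_i * alpha^i."""
    sig, power = 0, 1
    for c in ngram:
        sig ^= gf_mul(c, power)
        power = gf_mul(power, alpha)
    return sig


def encode(data: bytes, n: int) -> bytes:
    """Emit one signature per n-gram; symbol j covers data[j : j + n]."""
    return bytes(signature(data[j : j + n]) for j in range(len(data) - n + 1))


def search_encoded(enc_record: bytes, enc_pattern: bytes) -> list[int]:
    """Return candidate start offsets, comparing encoded symbols only."""
    m = len(enc_pattern)
    return [
        s
        for s in range(len(enc_record) - m + 1)
        if enc_record[s : s + m] == enc_pattern
    ]


if __name__ == "__main__":
    n = 2  # assumed n-gram size
    record = encode(b"GATTACAGATTACA", n)   # done by the client before insertion
    pattern = encode(b"TTAC", n)            # done by the client before the query
    print(search_encoded(record, pattern))  # server side sees only signatures
```

In this sketch the pattern must be at least n characters long, and every true occurrence is reported. Because several n-grams can share one signature, the result may in general also contain false positives that the client would have to filter out.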
Added 05 Dec 2009
Updated 05 Dec 2009
Type Conference
Year 2007
Where VLDB
Authors Witold Litwin, Riad Mokadem, Philippe Rigaux, Thomas J. E. Schwarz