Sciweavers

CPM
2000
Springer

Indexing Text with Approximate q-Grams

13 years 9 months ago
Indexing Text with Approximate q-Grams
We present a new index for approximate string matching. The index collects text q-samples, that is, disjoint text substrings of length q, at fixed intervals and stores their positions. At search time, part of the text is filtered out by noticing that any occurrence of the pattern must be reflected in the presence of some text q-samples that match approximately inside the pattern. Hence the index points out the text areas that could contain occurrences and must be verified. The index parameters permit load balancing between filtering and verification work, and provide a compromise between the space requirement of the index and the error level for which the filtration is still efficient. We show experimentally that the index is competitive against others that take more space, being in fact the fastest choice for intermediate error levels, an area where no current index is useful. Key words: Approximate string matching, text databases, q-gram indices.
Gonzalo Navarro, Erkki Sutinen, Jani Tanninen, Jor
Added 02 Aug 2010
Updated 02 Aug 2010
Type Conference
Year 2000
Where CPM
Authors Gonzalo Navarro, Erkki Sutinen, Jani Tanninen, Jorma Tarhio
Comments (0)