Sciweavers

CIKM
2004
Springer

Indexing text data under space constraints

An important class of queries is the LIKE predicate in SQL. In the absence of an index, LIKE queries are subject to performance degradation. The notion of indexing on substrings (or q-grams) has been explored earlier without sufficient consideration of efficiency. q-grams are used to prune away rows that do not qualify for the query. The problem is to identify a finite number of grams, subject to a storage constraint, that gives maximal pruning for a given query workload. Our contributions include: i) a formal problem definition, a proof that the problem is NP-hard, and an adaptation of a previously studied approximate algorithm that produces results within a provable error bound; ii) a performance evaluation of the novel method applied to real data; and iii) parallelization of the algorithm, scaling considerations, and a proposal to handle scaling issues.
Categories and Subject Descriptors: H.2.4 [Systems]: Relational databases, Textual databases; H.3.1 [Content Analysis and Indexing]:...
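The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sample rows, and the hand-picked gram set `GRAMS` are all hypothetical, and the hard part of the paper (choosing which grams to index under a storage budget) is not attempted here. The sketch only shows how a small set of indexed q-grams can shrink the candidate rows for a `LIKE '%pattern%'` query before the exact substring check runs.

```python
from collections import defaultdict


def qgrams(text, q):
    """Return the set of all length-q substrings of text."""
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def build_index(rows, indexed_grams, q):
    """Map each indexed gram to the ids of the rows that contain it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for gram in qgrams(text, q) & indexed_grams:
            index[gram].add(row_id)
    return index


def candidates(pattern, rows, index, indexed_grams, q):
    """Row ids that may match LIKE '%pattern%'.

    Every gram of the pattern must occur in a matching row, so the
    posting sets of the pattern's indexed grams can be intersected.
    If no gram of the pattern is indexed, nothing can be pruned.
    """
    result = set(range(len(rows)))
    for gram in qgrams(pattern, q) & indexed_grams:
        result &= index.get(gram, set())
    return result


def like_contains(pattern, rows, index, indexed_grams, q):
    """Verify the surviving candidates with an exact substring test."""
    survivors = candidates(pattern, rows, index, indexed_grams, q)
    return sorted(r for r in survivors if pattern in rows[r])


# Hypothetical data: four rows and two hand-picked 3-grams.
ROWS = ["database systems", "text indexing", "index structures", "query workload"]
GRAMS = {"ind", "dat"}
INDEX = build_index(ROWS, GRAMS, 3)

if __name__ == "__main__":
    print(candidates("index", ROWS, INDEX, GRAMS, 3))
    print(like_contains("load", ROWS, INDEX, GRAMS, 3))
```

With this gram set, a query for `index` only has to verify the two rows that contain `ind`, while a query for `load` shares no indexed gram and falls back to scanning every row. That difference is why the choice of grams for a given workload matters when storage is limited.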
Added 01 Jul 2010
Updated 01 Jul 2010
Type Conference
Year 2004
Where CIKM
Authors Bijit Hore, Hakan Hacigümüs, Balakrishna R. Iyer, Sharad Mehrotra