Sciweavers

IR
2008

A compressed self-index using a Ziv-Lempel dictionary

13 years 4 months ago
A compressed self-index using a Ziv-Lempel dictionary
A compressed full-text self-index for a text T , of size u, is a data structure used to search for patterns P, of size m, in T , that requires reduced space, i.e. space that depends on the empirical entropy (Hk or H0) of T , and is, furthermore, able to reproduce any substring of T . In this paper we present a new compressed self-index able to locate the occurrences of P in O((m + occ) log u) time, where occ is the number of occurrences. The fundamental improvement over previous LZ78 based indexes is the reduction of the search time dependency on m from O(m2 ) to O(m). To achieve this result we point out the main obstacle to linear time algorithms based on LZ78 data compression and expose and explore the nature of a recurrent structure in LZ-indexes, the T78 suffix tree. We show that our method is very competitive in practice by comparing it against other state of the art compressed indexes.
Luís M. S. Russo, Arlindo L. Oliveira
Added 12 Dec 2010
Updated 12 Dec 2010
Type Journal
Year 2008
Where IR
Authors Luís M. S. Russo, Arlindo L. Oliveira
Comments (0)