
CPM
2007
Springer

Efficient Computation of Substring Equivalence Classes with Suffix Arrays

This paper considers the enumeration of the substring equivalence classes introduced by Blumer et al. [1], who used them to define an index structure called the compact directed acyclic word graph (CDAWG). In text analysis, these equivalence classes are useful because they group together redundant substrings with essentially identical occurrences. We show how to enumerate the equivalence classes using suffix arrays. Our algorithm uses the rank and lcp arrays to traverse the corresponding suffix tree and needs no other additional data structure. It runs in time linear in the length of the input string. We also report experimental results comparing the running time and space consumption of our algorithm with those of suffix tree and CDAWG based approaches.
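To make the ingredients named in the abstract concrete, the following Python sketch builds a suffix array, its rank (inverse) array, and the lcp array, then lists the lcp intervals that correspond to the internal nodes of the suffix tree. It is only an illustration under stated assumptions, not the paper's enumeration algorithm: the function names are hypothetical, the suffix array is built by naive sorting (so this part is not linear time), the lcp array uses the well-known method of Kasai et al., and no end marker is appended to the text.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; simple but not linear time.
    return sorted(range(len(text)), key=lambda i: text[i:])


def rank_array(sa):
    # rank[i] is the position of the suffix starting at i within sa.
    rank = [0] * len(sa)
    for r, start in enumerate(sa):
        rank[start] = r
    return rank


def lcp_array(text, sa, rank):
    # Kasai-style computation: lcp[r] is the length of the longest common
    # prefix of the suffixes at sa[r - 1] and sa[r]; lcp[0] is 0.
    n = len(text)
    lcp = [0] * n
    h = 0
    for i in range(n):
        r = rank[i]
        if r == 0:
            h = 0
            continue
        j = sa[r - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1
    return lcp


def lcp_intervals(lcp):
    # Stack-based scan of the lcp array. Each reported triple
    # (depth, left, right) is a range sa[left..right] of suffixes sharing
    # a common prefix of length depth, i.e. an internal suffix tree node.
    n = len(lcp)
    intervals = []
    stack = [(0, 0)]  # pairs of (depth, left boundary)
    for i in range(1, n + 1):
        cur = lcp[i] if i < n else -1  # -1 flushes the stack at the end
        left = i - 1
        while stack and cur < stack[-1][0]:
            depth, left = stack.pop()
            intervals.append((depth, left, i - 1))
        if not stack or cur > stack[-1][0]:
            stack.append((cur, left))
    return intervals
```

For the hypothetical input `"banana"`, the reported intervals describe the root and the repeated substrings `a`, `ana`, and `na`. Grouping substrings into the equivalence classes of Blumer et al. requires further steps that the abstract does not detail, so they are not attempted here.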
Added 14 Aug 2010
Updated 14 Aug 2010
Type Conference
Year 2007
Where CPM
Authors Kazuyuki Narisawa, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda