On Bit-Parallel Processing of Multi-byte Text


Practical bit-parallel algorithms exist for several types of pair-wise string processing, such as longest common subsequence computation and approximate string matching. These algorithms typically use a size-σ table of match bit-vectors, where σ is the alphabet size and the bits in the vector for a character λ identify the positions where λ occurs in one of the processed strings. The time and space cost of computing the match table is not prohibitive for reasonably small alphabets such as ASCII text. For general Unicode text, however, the range of possible character codes is roughly one million, which makes a simple table impractical. In this paper we evaluate three different schemes for overcoming this problem. First we propose to replace the character code table by a character code automaton. Then we compare this method with two other schemes: using a hash table, and the binary-search based solution proposed...
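To make the match table concrete, here is a minimal sketch in Python of one of the three schemes named in the abstract, the hash table. It is not the authors' implementation: the function names `build_match_table` and `lcs_length` are hypothetical, a Python `dict` stands in for the hash table, and the longest common subsequence recurrence used is the standard bit-parallel formulation, chosen here only as an example consumer of the table. Python integers are arbitrary-precision, so this illustrates the logic rather than the machine-word performance the paper is concerned with.

```python
def build_match_table(pattern):
    """Map each character that occurs in `pattern` to its match bit-vector.

    Bit i of table[c] is set when pattern[i] == c. A dict (hash table)
    stores only the characters actually present, so the table size depends
    on the pattern and not on the roughly one million possible Unicode codes.
    """
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = table.get(ch, 0) | (1 << i)
    return table


def lcs_length(a, b):
    """Length of the longest common subsequence of `a` and `b`.

    Uses the bit-parallel recurrence V = (V + U) | (V - U) with
    U = V & match[c]; the answer is the number of zero bits in V.
    """
    m = len(a)
    mask = (1 << m) - 1          # keep only the low m bits
    match = build_match_table(a)
    v = mask                     # all ones: nothing matched yet
    for ch in b:
        # Characters absent from `a` have an all-zero match vector.
        u = v & match.get(ch, 0)
        v = ((v + u) | (v - u)) & mask
    return m - bin(v).count("1")


if __name__ == "__main__":
    print(lcs_length("abcde", "ace"))
    print(lcs_length("日本語テキスト", "日本のテキスト"))
```

The lookup `match.get(ch, 0)` is the only place where the alphabet size matters; replacing the dict with a code automaton or a sorted array searched by binary search would change that one line and the table construction, leaving the recurrence untouched.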



AIRS 2004 | Bit-parallel Algorithms | Character Code Table | Hash Table | Information Retrieval |


Added	30 Jun 2010
Updated	30 Jun 2010
Type	Conference
Year	2004
Where	AIRS
Authors	Heikki Hyyrö, Jun Takaba, Ayumi Shinohara, Masayuki Takeda
