LexEQUAL: Supporting Multiscript Matching in Database Systems

To effectively support today's global economy, database systems need to store and manipulate text data in multiple languages simultaneously. Current database systems do support the storage and management of multilingual data, but are not capable of querying or matching text data across different scripts. As a first step towards addressing this lacuna, we propose here a new query operator called LexEQUAL, which supports multiscript matching of proper names. The operator is implemented by first transforming matches in multiscript text space into matches in the equivalent phoneme space, and then using standard approximate matching techniques to compare these phoneme strings. The algorithm incorporates tunable parameters that impact the phonetic match quality and thereby determine the match performance in the multiscript space. We evaluate the performance of the LexEQUAL operator on a real multiscript names dataset and demonstrate that it is possible to simultaneously achieve good rec...
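The two-stage idea in the abstract (map each name to a phoneme string, then compare the phoneme strings approximately under a tunable threshold) can be illustrated with a minimal sketch. This is not the authors' implementation. The phoneme table `TOY_PHONEMES`, the helper names `to_phonemes`, `edit_distance`, and `lex_equal_sketch`, the use of plain Levenshtein distance, and the way the threshold is scaled by the longer string are all assumptions made for illustration. A real system would need a proper text-to-phoneme converter for each language.

```python
from typing import Dict, List, Sequence

# Hypothetical, hand-written phoneme strings standing in for a real
# per-language text-to-phoneme converter (an assumption of this sketch).
TOY_PHONEMES: Dict[str, List[str]] = {
    "Nehru": ["n", "e", "h", "r", "u"],
    "नेहरू": ["n", "e", "h", "r", "u:"],
    "Nero": ["n", "i", "r", "o"],
}


def to_phonemes(name: str) -> List[str]:
    """Look up the toy phoneme sequence for a name (raises KeyError if unknown)."""
    return TOY_PHONEMES[name]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Standard Levenshtein distance over phoneme sequences, unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lex_equal_sketch(left: str, right: str, threshold: float = 0.25) -> bool:
    """Return True when the phoneme strings are within the allowed distance.

    The threshold is a fraction of the longer phoneme string; this scaling
    is an assumption of the sketch, not something stated in the abstract.
    """
    p, q = to_phonemes(left), to_phonemes(right)
    return edit_distance(p, q) <= threshold * max(len(p), len(q))


if __name__ == "__main__":
    print(lex_equal_sketch("Nehru", "नेहरू"))
    print(lex_equal_sketch("Nehru", "Nero"))
```

Raising `threshold` in this sketch admits more phonetic variation and so accepts more candidate pairs, while lowering it makes the comparison stricter. That is the kind of trade-off the abstract's tunable parameters are said to control.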
A. Kumaran, Jayant R. Haritsa
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2004
Where EDBT