
ICGI
2010
Springer

Enhanced Suffix Arrays as Language Models: Virtual k-Testable Languages

Abstract. In this article, we propose the use of suffix arrays to efficiently implement n-gram language models with practically unlimited n. Combined with synchronous back-off, this approach allows us to distinguish between alternative sequences using large contexts. We also show that these models can be built with additional information for each symbol, such as part-of-speech tags and dependency information. The approach can also be viewed as a collection of virtual k-testable automata: once the suffix array is built, the results of any k-testable automaton generated from the training data can be accessed directly, and synchronous back-off automatically identifies the k-testable automaton with the largest feasible k. We have used this approach in several classification tasks.
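To make the counting idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a naive suffix array built by sorting suffix start positions, which takes O(n² log n) time and is only suitable for toy data. It also omits the LCP and other auxiliary tables that enhanced suffix arrays add. The count of an n-gram is the size of the contiguous block of sorted suffixes that begin with it, and two binary searches find that block, so n is bounded only by the corpus length. The `synchronous_backoff` function is a hypothetical reading of synchronous back-off that uses only left context. It scores all alternatives with the same order, starting from the longest context, and lowers the order for all of them together until at least one alternative has a non-zero count. The order it stops at plays the role of the largest feasible k. The names `SuffixArrayLM`, `count` and `synchronous_backoff` are illustrative.

```python
from typing import Dict, Optional, Sequence, Tuple


class SuffixArrayLM:
    """Hypothetical sketch: n-gram counts of any length from a suffix array."""

    def __init__(self, tokens: Sequence) -> None:
        self.tokens = list(tokens)
        # Naive construction (assumption, toy-sized data only): sort suffix
        # start positions by the suffix contents. No LCP table is built.
        self.sa = sorted(range(len(self.tokens)), key=lambda i: self.tokens[i:])

    def _prefix(self, rank: int, m: int) -> list:
        # First m symbols of the suffix stored at position `rank` of the array.
        start = self.sa[rank]
        return self.tokens[start:start + m]

    def count(self, ngram: Sequence) -> int:
        """Number of occurrences of `ngram`; its length is unrestricted."""
        pattern = list(ngram)
        m = len(pattern)
        # Lower bound: first suffix whose m-prefix is >= pattern.
        lo, hi = 0, len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(mid, m) < pattern:
                lo = mid + 1
            else:
                hi = mid
        first = lo
        # Upper bound: first suffix whose m-prefix is > pattern.
        hi = len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(mid, m) <= pattern:
                lo = mid + 1
            else:
                hi = mid
        # All matching suffixes form one contiguous block in the sorted array.
        return lo - first


def synchronous_backoff(
    lm: SuffixArrayLM, context: Sequence, alternatives: Sequence
) -> Tuple[Optional[object], int, Dict]:
    """Hypothetical synchronous back-off over a shared left context.

    Every alternative is scored with the same order n. Starting from the
    longest available context, the order is lowered for all alternatives at
    once until at least one of them has a non-zero count (an assumed
    stopping rule). Returns (best alternative, order used, counts).
    """
    context = list(context)
    for n in range(len(context) + 1, 0, -1):
        # Keep the last n - 1 context symbols (empty when n == 1).
        history = context[len(context) - (n - 1):]
        counts = {alt: lm.count(history + [alt]) for alt in alternatives}
        if any(counts.values()):
            # Ties go to the alternative listed first.
            best = max(alternatives, key=lambda a: counts[a])
            return best, n, counts
    return None, 0, {}


if __name__ == "__main__":
    lm = SuffixArrayLM("the cat sat on the mat the cat ate the rat".split())
    print(lm.count(["the", "cat"]))  # 2
    print(synchronous_backoff(lm, ["on", "the"], ["mat", "cat", "rat"]))
    # ('mat', 3, {'mat': 1, 'cat': 0, 'rat': 0})
```

Symbols only need to be orderable, so tokens could be tuples such as `("cat", "NN")`. This is one possible way to attach a part-of-speech tag to each symbol, as the abstract describes, but it is an assumption whether the paper encodes the extra information like this.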
Herman Stehouwer, Menno van Zaanen
Added 09 Nov 2010
Updated 09 Nov 2010
Type Conference
Year 2010
Where ICGI
Authors Herman Stehouwer, Menno van Zaanen