ACL
2009

A Succinct N-gram Language Model

Efficient processing of tera-scale text data is an important research topic. This paper proposes lossless compression of N-gram language models based on LOUDS, a succinct data structure. LOUDS succinctly represents a trie with M nodes as a bit string of 2M + 1 bits. We compress it further by exploiting the structure of N-gram language models. We also use "variable length coding" and "block-wise compression" to compress the values associated with nodes. Experiments on three large-scale N-gram compression tasks show a significant compression rate without any loss.
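To make the 2M + 1 bit count concrete, here is a minimal sketch of the standard LOUDS encoding (level-order unary degree sequence) of a small tree. It is not the paper's implementation: the function name `louds_bits`, the dictionary-based tree, and the toy trie are assumptions chosen for illustration, and the further N-gram-specific compression mentioned in the abstract is not shown.

```python
from collections import deque


def louds_bits(children, root=0):
    """Return the LOUDS bit string of a tree given as {node: [child, ...]}.

    Hypothetical helper for illustration only.
    """
    bits = ["10"]  # super-root with exactly one child: the real root
    queue = deque([root])
    while queue:
        node = queue.popleft()  # visit nodes in level (breadth-first) order
        kids = children.get(node, [])
        bits.append("1" * len(kids) + "0")  # degree in unary, terminated by 0
        queue.extend(kids)
    return "".join(bits)


# Toy trie (an assumption): node 0 is the root, M = 6 nodes in total.
toy = {0: [1, 2], 1: [3, 4], 2: [5]}
bits = louds_bits(toy)
print(bits, len(bits))
```

Each of the M nodes contributes one terminating 0, each non-root node contributes one 1 in its parent's degree, and the super-root adds "10", giving M + (M - 1) + 2 = 2M + 1 bits. Navigating the trie in practice additionally needs rank and select operations over this bit string, which the sketch omits.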
Taro Watanabe, Hajime Tsukada, Hideki Isozaki
Added 16 Feb 2011
Updated 16 Feb 2011
Type Journal
Year 2009
Where ACL