Improving PPM Algorithm Using Dictionaries

We propose a method to improve traditional character-based PPM text compression algorithms. Treating a text file as a sequence of alternating words and non-words, our algorithm encodes non-words and prefixes of words using character-based context models, and encodes suffixes of words using dictionary models. By using dictionary models, the algorithm can encode multiple characters as a single unit, which improves compression efficiency. The advantages of the proposed algorithm are: 1) it does not require any text preprocessing; 2) it does not need any explicit codeword to signal a switch between context and dictionary models; 3) it can be applied to any character-based PPM algorithm without incurring much additional computational cost. Test results show that significant improvements can be obtained over character-based PPM, especially in low-order cases.
Keywords: Text compression; Markov model; PPM; Dictionary model.
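The split described in the abstract (non-words and word prefixes go to the context model, word suffixes go to the dictionary model) can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed prefix length `PREFIX_LEN`, the toy `DICTIONARY`, the ASCII-only definition of a word, and the function names `tokenize` and `plan_encoding` are all assumptions made for illustration. The sketch only labels which model would handle each piece of text; it performs no actual PPM or arithmetic coding.

```python
import re

# Hypothetical toy dictionary; the abstract does not say how the real one is built.
DICTIONARY = {"compression", "context", "dictionary", "model"}
# Assumed fixed prefix length, chosen only for illustration.
PREFIX_LEN = 3

WORD_RE = re.compile(r"[A-Za-z]+")


def tokenize(text):
    """Split text into alternating word and non-word tokens (ASCII letters only)."""
    return re.findall(r"[A-Za-z]+|[^A-Za-z]+", text)


def plan_encoding(text):
    """Label each piece of text with the model that would encode it."""
    plan = []
    for token in tokenize(text):
        is_word = WORD_RE.fullmatch(token) is not None
        if is_word and len(token) > PREFIX_LEN and token.lower() in DICTIONARY:
            # Prefix is coded character by character; suffix is coded as one unit.
            plan.append(("context", token[:PREFIX_LEN]))
            plan.append(("dictionary", token[PREFIX_LEN:]))
        else:
            # Non-words and words missing from the dictionary stay character-based.
            plan.append(("context", token))
    return plan


if __name__ == "__main__":
    for model, piece in plan_encoding("text compression, ok"):
        print(f"{model:10s} {piece!r}")
```

In this sketch, concatenating the pieces in order reproduces the input exactly, which mirrors the abstract's point that no separate text preprocessing pass is required.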
Yichuan Hu, Jianzhong (Charlie) Zhang, Farooq Khan
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2011
Where DCC
Authors Yichuan Hu, Jianzhong (Charlie) Zhang, Farooq Khan, Ying Li
