DCC
1998
IEEE

Compression of Unicode Files

The increasing importance of Unicode for text files, for example with Java and in some modern operating systems, implies a possible doubling of data storage space and data transmission time, with a corresponding need for data compression. However, it is not clear that data compressors designed for 8-bit byte data are well matched to 16-bit Unicode data. This paper investigates the compression of Unicode files, using a variety of established data compressors on a mix of genuine and artificial Unicode files. It is found that while Ziv-Lempel and unbounded-context compressors work well, finite-context compressors are less satisfactory on Unicode. Tests with a simple special compressor intended for 16-bit data show that it may be useful to design compressors specifically for Unicode files.
Peter M. Fenwick, Simon Brierley
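
The storage doubling described in the abstract can be illustrated with a small sketch. It is not the authors' test setup: it assumes Python's standard `zlib` module (DEFLATE, a Ziv-Lempel-family method) as a stand-in for a byte-oriented compressor, and the function name `compare_sizes` and the sample text are hypothetical, chosen only for illustration.

```python
import zlib


def compare_sizes(text: str) -> dict:
    """Return raw and DEFLATE-compressed sizes for 8-bit and 16-bit encodings.

    Assumption: the text contains only characters representable in Latin-1,
    so the 16-bit form is exactly twice the size of the 8-bit form.
    """
    narrow = text.encode("latin-1")    # one byte per character
    wide = text.encode("utf-16-le")    # two bytes per character, no BOM
    return {
        "raw_8bit": len(narrow),
        "raw_16bit": len(wide),
        "deflate_8bit": len(zlib.compress(narrow, 9)),
        "deflate_16bit": len(zlib.compress(wide, 9)),
    }


# Hypothetical sample input; any Latin-1 text would do.
sample = "The quick brown fox jumps over the lazy dog. " * 200
sizes = compare_sizes(sample)

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Comparing `deflate_8bit` with `deflate_16bit` on real files gives a rough sense of how much of the doubling a byte-oriented compressor recovers; the result depends heavily on the input, so no particular ratio should be expected from this sketch.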
Type Conference
Year 1998
Where DCC