LZW (Lempel-Ziv-Welch)

This is another dictionary-based compression. It works by entering symbol into the dictionary together with the index then when the symbol occurs again just output the dictionary index instead if the symbol. The original LZW has only 12 bits (4096) dictionary index, the first 256 (0-255) indexs are unuse, they are refer to individual bytes, and the rest indexs 256-4095 refer to inputted symbol.
Building dictionary rules:
1. The dictionary start with a sort value length then growing.
2. The dictionary can not has two same values.

Encoding:
Eg: source: ABCABCBACAACBAA (15 bytes).
The dictionary index value is start with 256, the index value of 0 - 255 is reserved for initial value.

Input symbol Output symbol / index Dictionary index Dictionary value
AB A 256 AB
C B 257 BC
A C 258 CA
BC 256 259 ABC
B C 260 CB
A B 261 BA
C A 262 AC
AA 258 263 CAA
CB 262 264 ACB
AA 261 265 BAA
(End Of File) A    

The result (target) file only have the output symbol / index without dictionary

Decoding:

While decoding the program will build dictionary as well.

Input symbol / index Output symbol Dictionary index Dictionary value
AB A 256 AB
C B 257 BC
256 C 258 CA
C AB 259 ABC
B C 260 CB
A B 261 BA
258 A 262 AC
262 CA 263 CAA
261 AC 264 ACB
A BA 265 BAA
(End Of File) A    

Pros: do not need to store the dictionary in the output file.
Cons: dictionary have to be built in encoding and decoding (double work, time comsuming), also difficult to get the best (the most efficient) dictionary size on different file type and size.
Conclusion: this compression has so many variants and has so many names on the web because people (programmer) like to improving this compression by using a more efficient dictionary size and using different way of indexing the dictionary.

Sliding Window

 


Author Site Map Disclaimer
HMaxF Ultimate Recursive Lossless Compression Research
2001 - 2002 (c) All Rights Reserved.
Hosted by www.Geocities.ws

1