|
LZW (Lempel-Ziv-Welch)
This is another dictionary-based compression. It works by entering symbol
into the dictionary together with the index then when the symbol occurs
again just output the dictionary index instead if the symbol. The original
LZW has only 12 bits (4096) dictionary index, the first 256 (0-255) indexs
are unuse, they are refer to individual bytes, and the rest indexs 256-4095
refer to inputted symbol.
Building dictionary rules:
1. The dictionary start with a sort value length then growing.
2. The dictionary can not has two same values.
Encoding:
Eg: source: ABCABCBACAACBAA (15 bytes).
The dictionary index value is start with 256, the index value of 0 - 255
is reserved for initial value.
| Input symbol |
Output symbol / index |
Dictionary index |
Dictionary value |
| AB |
A |
256 |
AB |
| C |
B |
257 |
BC |
| A |
C |
258 |
CA |
| BC |
256 |
259 |
ABC |
| B |
C |
260 |
CB |
| A |
B |
261 |
BA |
| C |
A |
262 |
AC |
| AA |
258 |
263 |
CAA |
| CB |
262 |
264 |
ACB |
| AA |
261 |
265 |
BAA |
| (End Of File) |
A |
|
|
The result (target) file only have the output symbol / index without
dictionary
Decoding:
While decoding the program will build dictionary as well.
| Input symbol / index |
Output symbol |
Dictionary index |
Dictionary value |
| AB |
A |
256 |
AB |
| C |
B |
257 |
BC |
| 256 |
C |
258 |
CA |
| C |
AB |
259 |
ABC |
| B |
C |
260 |
CB |
| A |
B |
261 |
BA |
| 258 |
A |
262 |
AC |
| 262 |
CA |
263 |
CAA |
| 261 |
AC |
264 |
ACB |
| A |
BA |
265 |
BAA |
| (End Of File) |
A |
|
|
Pros: do not need to store the dictionary in the output file.
Cons: dictionary have to be built in encoding and decoding (double work,
time comsuming), also difficult to get the best (the most efficient) dictionary
size on different file type and size.
Conclusion: this compression has so many variants and has so many names
on the web because people (programmer) like to improving this compression
by using a more efficient dictionary size and using different way of indexing
the dictionary.
|