                                The Art
                              of Lossless
                           Data Compression
                                vol. 17

Here are the results of tests performed in July 2000 to compare
lossless compression of English texts by all known programs that are
good enough at this task, including RK, DC, PPMDF, Bzip2, IMP, RAR and 7-zip.

See the Archive Comparison Test by J. Gilchrist for more details:  http://ACT.BY.net

If anybody wants to start or continue such tests,
or can suggest other sets of texts or other compression programs
 (programs for DOS or Windows only, not sources or algorithm descriptions),
or knows that we have missed something important
 (some fantastic new technology, an algorithm, or even a program capable
 of lossless compression of up to 1000:1, etc.),
please let us know immediately: ratush@srsc-gw.sscc.ru   Thank you!


[[1]] COMPRESSION QUALITY
=========================
             (see also
             [[2]] Speed
             [[3]] Details
             [[4]] Comments)

The fifth line of each table below shows results for the sum of the four
Canterbury Corpus Large Set files; the tenth line shows results for the sum
of all 556 files in five sets.
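
The figures in these tables can be turned into more familiar units. The sketch
below is only an illustration and rests on an assumption: that every figure in
a line is a size expressed as a percentage of the smallest result in that line
(the one marked 100%), and that the "original" column is the uncompressed size
on the same scale. The function names are made up for this example.

```python
def compression_ratio(original_pct, program_pct):
    """Original size divided by compressed size, both on the table's scale."""
    return original_pct / program_pct


def bits_per_byte(original_pct, program_pct):
    """Compressed bits spent per 8-bit byte of the original text."""
    return 8.0 * program_pct / original_pct


# Tenth line of the tables: original = 454.91, PkZip -exx = 156.13.
# These two numbers are copied from the tables; the interpretation is assumed.
ratio = compression_ratio(454.91, 156.13)
bpb = bits_per_byte(454.91, 156.13)
print(f"PkZip -exx: {ratio:.2f}:1, {bpb:.2f} bits per byte")
```

Under the same assumption, the program marked 100% in that line would have a
ratio of about 4.55:1.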


(modeling and ppm-based, slow-extracting programs)

original  RK   ppmonstr  PPMDF    BOA     ACB     777    UFA   Arhangel  UHARC
      -mx3-ft+ -o7-m56 -o7-m56   -m15      u  -m5-mu32-m5-mu32 -2-mm-mt -m3-mm

569.47%  100%   103.03  103.94  104.20  105.75  112.36  112.36  113.46  136.80
411.40% 100.03  101.95  101.98  100.56  102.85  100.50  100.50   100%   100.84
572.82% *100%   103.43  104.45  104.59  104.73  110.27  110.27  113.35  138.89
644.43% ^100%   106.03  107.28  110.29  109.01  124.88  124.88  136.57  134.41
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
521.41% *100%   103.06  103.73  103.68  104.76  108.93  108.93  111.32  123.80

486.73%  100%   102.67  104.25  105.02  107.57  112.71  112.71  114.30  133.35
398.62% ^100%   101.75  103.39  103.55  107.73  108.80  108.80  108.41  128.69
438.62%  100%   102.23  103.88  104.81  108.93  110.61  110.61  111.99  133.70
704.14%  100%   103.10  104.02  107.75  112.77  112.93  112.93  134.06  148.99
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
454.91%  100%   102.18  103.74  104.68  108.70  110.41  110.41  112.91  133.42


(dictionary-based and block-sorting, fast-extracting programs)

  DC      BA     ZZip    SZip     ERI    BZip2    IMP     RAR    7-zip   PkZip
-b16300 -k50-m -a4-b12 -o10b41    -m5    -k -9  -2 -s4  -m5-mm    -mx     -exx

102.63  107.29  110.25  108.91  109.94  118.98  117.30  135.80  156.39  165.00
101.46  103.86  102.41  103.83  106.17  110.95  109.09  112.46  111.08  115.52
100.82  105.20  108.7   109.36  107.74  118.50  116.25  138.68  158.53  166.77
108.77  108.37  109.43  113.00  110.32  127.55  125.74  138.57  181.35  187.54
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
102.42  105.53  106.78  107.62  107.94  116.91  114.99  128.30  143.37  150.05

102.17  106.92  108.93  111.23  110.76  117.07  115.80  135.44  152.85  159.24
101.01  106.37  107.97  110.12  110.03  113.89  113.57  135.65  143.11  149.32
103.00  107.94  110.32  111.16  112.13  117.50  117.17  137.22  149.56  155.62
107.00  115.13  119.33  114.02  115.00  131.86  139.43  149.70  173.61  180.91
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
102.79  107.96  110.27  110.97  111.66  117.74  117.87  137.34  149.92  156.13

* RK -mx2  (not  -mx3 -ft+ )
^ RK -mx3


[[2]] Speed
===========
The Canterbury Corpus Large Set (http://corpus.canterbury.ac.nz/ftp/large.zip)
was used for this test, on an AMD K6-400 machine with 64 MB of RAM running
Windows 98.

 Programs,options        Overall      Average      Compress Extract  Compressed
                          score,       Users'        time,   time,     size,
                                       score,       seconds seconds    bytes
                      seconds  %    seconds  %
777       a -m5 -mu32  1354   147%   1171   133%      203     222     3343996
777       a -mg -s     1880   205%   1262   144%      688     139     3793939
7zip      a            1307   142%   1232   140%       83       4     4393623
7zip      a -mx        1358   148%   1240   141%      131       4     4401160
acb       B            2540   276%   1818   207%      803     808     3346915
acb       b            2997   326%   2059   235%     1042    1047     3267480
acb       u            3802   414%   2496   285%     1452    1456     3221349
arhangel  a            1205   131%   1117   127%       98      94     3647060
arhangel  a -2  -mm    1203   131%   1117   127%       96      94     3647060
arhangel  a -2  -1     1514   165%   1148   131%      407      94     3647060
arhangel  a -mt        1173   127%   1069   122%      115     109     3417110
arhangel  a -mtf       1177   128%   1071   122%      118     110     3418181
ba       -k            1057   115%    988   112%       78      26     3432541
ba       -k -m         1057   115%    988   112%       78      26     3432541
ba       -k -1         1170   127%   1122   128%       54      26     3927264
ba       -k -10        1056   115%    986   112%       79      26     3424345
ba       -k -50        1046   114%    954   109%      103      17     3337823
boa    -m1             1623   176%   1387   158%      263     281     3886856
boa    -a              1560   170%   1266   144%      327     340     3217347
boa    -m15            1588   173%   1277   145%      346     358     3182732
bzip2    -k            1075   117%   1025   117%       56      16     3611558
bzip2    -k -s         1145   124%   1102   125%       48      14     3902513
bzip2    -k -1         1201   130%   1159   132%       47      13     4109767
bzip2    -k -5         1089   118%   1046   119%       48      14     3697142
bzip2    -k -9         1070   116%   1023   116%       53      15     3611558
dc        e             950   103%    918   104%       36      22     3214240
dc        e -a          950   103%    921   105%       33      23     3223329
dc        e -d         3567   388%   3547   405%       24       2    12751141
dc        e -b16300    1098   119%    875   100%      248      64     2829394
eri       a -m1        1119   122%    983   112%      153      29     3378440
eri       a -m2        1117   121%    975   111%      158      30     3346586
eri       a -m3        1123   122%    971   110%      169      32     3318853
eri       a            1136   123%    972   111%      183      33     3313568
eri       a -m5        1167   127%    975   111%      215      33     3313559
imp98     a -2         1043   113%   1002   114%       46      11     3547964
imp98     a -2  -s4    1040   113%    998   114%       48      11     3535351
imp       a -2  -s4    1041   113%   1001   114%       45      11     3548156
pkzip  -es             1659   180%   1655   189%        5       3     5945608
pkzip  -a              1326   144%   1307   149%       22       2     4691477
pkzip  -exx            1498   163%   1303   148%      217       2     4605928
ppmd      e -o5         958   104%    937   107%       24      23     3279292
ppmd      e -o7         983   107%    953   108%       34      34     3296502
ppmd      e -o9        1057   115%   1015   116%       47      48     3464715
ppmd      e -o5 -m56    950   103%    932   106%       20      23     3268214
ppmd      e -o7 -m56    917   100%    893   102%       28      30     3095512
ppmd      e -o9 -m56    985   107%    944   107%       46      46     3215327
ppmonstr  e -o5         997   108%    958   109%       43      43     3278191
ppmonstr  e -o7        1023   111%    972   111%       57      59     3265897
ppmonstr  e -o9        1097   119%   1031   117%       74      78     3406265
ppmonstr  e -o5 -m56    989   107%    954   109%       40      42     3268306
ppmonstr  e -o7 -m56    965   105%    918   104%       53      56     3083063
ppmonstr  e -o9 -m56   1036   112%    967   110%       77      77     3178172
rar       a            1226   133%   1134   129%      103       4     4029077
rar       a -mm        1227   133%   1134   129%      105       4     4029077
rar       a -m1        1247   135%   1205   137%       48       4     4304853
rar       a -m5        1555   169%   1144   130%      457       4     3938348
rar       a -s         1227   133%   1134   129%      104       4     4028163
rar       a -s  -mda   1307   142%   1236   141%       79       4     4408220
rar       a -s  -mdc   1252   136%   1168   133%       93       4     4157251
rar       a -s  -m5    1560   170%   1144   130%      463       4     3937052
rar32     a -s  -m5    1560   170%   1144   130%      463       4     3937052
rk     -mf1            1194   130%   1166   133%       32      21     4110184
rk     -mf2            1308   142%   1149   131%      177      76     3798456
rk     -mf3            1504   164%   1151   131%      392      72     3742232
rk     -mx1            1736   189%   1350   154%      430     449     3089384
rk     -mx2            1825   199%   1403   160%      470     502     3074900
rk     -mx2 -ft+       1915   208%   1452   165%      514     540     3099400
rk     -mx2 -fe+       1844   201%   1413   161%      480     510     3074904
rk     -mx3            1891   206%   1440   164%      502     535     3076136
szip    -v0            1040   113%   1003   114%       41      34     3473957
szip    -o4            1061   115%   1044   119%       19      29     3646906
szip    -o8            1040   113%    993   113%       53      35     3429112
szip    -o0            1063   115%    979   111%       94      24     3403202
szip    -v0 -b41       1019   111%    984   112%       39      34     3405120
szip    -o4 -b41       1045   113%   1029   117%       17      30     3591824
szip    -o8 -b41       1021   111%    974   111%       53      36     3356744
szip    -o0 -b41       1055   115%    959   109%      107      24     3326271
ufa     a   -m5 -mu32  1378   150%   1185   135%      216     234     3343996
ufa     a   -mg -mu32  1381   150%   1185   135%      219     234     3343996
ufa     a   -m5 -mu16  1323   144%   1156   132%      186     203     3363895
ufa     a   -m5 -mu10  1312   143%   1154   131%      177     195     3387619
ufa     a   -m5 -mu4   1342   146%   1187   135%      173     192     3519553
ufa     a   -mg -s     1630   177%   1161   132%      522      28     3889878
uharc   a              1381   150%   1183   135%      220      27     4081072
uharc   a -m1          1354   147%   1244   142%      122      29     4333271
uharc   a -m3          1514   165%   1125   128%      432      26     3801399
uharc   a -m3 -mm      1515   165%   1126   128%      433      26     3801399
uharc   a -m3 -md64    1501   163%   1221   139%      311      28     4184881
uharc   a -m3 -md2048  1515   165%   1126   128%      433      26     3801399
zzip    a              1085   118%   1030   117%       62      28     3584447
zzip    a   -mm        1085   118%   1030   117%       61      28     3584447
zzip    a   -lm        1085   118%   1030   117%       61      28     3584447
zzip    a   -a1        1085   118%   1030   117%       61      28     3584447
zzip    a   -a2        1080   117%   1021   116%       66      31     3543392
zzip    a   -a3        1076   117%   1014   115%       69      30     3517619
zzip    a   -a4        1085   118%   1015   116%       79      30     3517619
zzip    a   -a4 -b12   1029   112%    950   108%       88      31     3277976

The overall score is calculated by adding compression time, extraction time,
and the time it would take to transfer the compressed file over a 28,800 bps
link: (compressed_size)/3600 seconds, because 28800 bits_per_second is 3600
bytes_per_second. In both score columns, 100% marks the best (lowest) score.
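
The overall score can be sketched in a few lines of Python. This is not the
.BAT/math.exe calculation actually used for the tests, only an illustration of
the same arithmetic; the function and constant names are invented here.

```python
LINK_BITS_PER_SECOND = 28800
LINK_BYTES_PER_SECOND = LINK_BITS_PER_SECOND // 8  # 3600 bytes per second


def transfer_seconds(compressed_size):
    """Seconds needed to send compressed_size bytes over the link."""
    return compressed_size / LINK_BYTES_PER_SECOND


def overall_score(compress_s, extract_s, compressed_size):
    """Compression time + extraction time + transfer time, in seconds."""
    return compress_s + extract_s + transfer_seconds(compressed_size)


# Row "ppmd e -o7 -m56" from the table: 28 s, 30 s, 3095512 bytes.
score = overall_score(28, 30, 3095512)
print(f"{score:.1f} seconds")
```

For this row the sketch gives just under 918 seconds, against 917 in the table.
The table shows times in whole seconds, so differences of about a second are
to be expected.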

The Average Users' score is calculated as (compress_time/10) + extract_time +
the time it would take to transfer the compressed file over a 28,800 bps link.
Compression time is divided by 10 here because more than 90% of people will
never compress anything in their lives (with compression programs), but they
use compressed data almost _every_ time they use computers and/or the Internet.
That is why compression time matters much less to them.
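
As an illustration only, the Average Users' score and its percentage column
can be reproduced approximately as follows. The three rows are copied from the
table above. The function name is invented, and expressing each score as a
percentage of the lowest one is an assumption based on how the table appears
to be normalised.

```python
def users_score(compress_s, extract_s, compressed_size, link_bps=28800):
    """compress_time/10 + extract_time + transfer time, in seconds."""
    transfer_s = compressed_size / (link_bps / 8)
    return compress_s / 10 + extract_s + transfer_s


# (compress seconds, extract seconds, compressed size in bytes)
rows = {
    "dc e -b16300": (248, 64, 2829394),
    "ppmd e -o7 -m56": (28, 30, 3095512),
    "pkzip -exx": (217, 2, 4605928),
}

scores = {name: users_score(*row) for name, row in rows.items()}
best = min(scores.values())

# List the programs from best to worst, with the score relative to the best.
for name, seconds in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:18s} {seconds:7.0f} s  {100 * seconds / best:4.0f}%")
```

The results may differ from the table by a second or a percentage point,
since the table does not say how its figures were rounded.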


[[3]] Details
=============
The details are no longer included in this main text
(738 lines reporting 22796 results on 556 files in 5 sets),
but can be found in the FULL version, with TEXTS.DAT and *.BAT,
at http://geocities.com/SiliconValley/Bay/1995/artest17.zip
or http://artest1.tripod.com/artest17.zip


[[4]] Comments
==============
Links to download programs:
~~~~~~~~~~~~~~~~~~~~~~~~~~~
7-Zip  2.11   :W http://www.7-zip.com/dl/7zip211.exe                              493K
777 0.04b1    :W http://www.7-zip.com/dl/ufa/777004b1.zip                          72K
UFA 0.04b1    :W http://www.7-zip.com/dl/ufa/ufa004b1.zip                          64K
ArHanGeL 1.40 :a http://geocities.com/SiliconValley/Lab/6606/arh140.zip            50K
ERI32  4.6fre :e http://geocities.com/eri32/eri46fre.zip                           91K
Imp     1.1   :e http://www.winimp.com/imp110d.zip                                266K
Imp-win 1.12  :W http://www.winimp.com/imp112.exe                                 122K
PkZip   2.50  :a ftp://ftp.simtel.net/pub/simtelnet/msdos/arcers/pk250dos.exe     202K
RK     1.02a5 :W http://malcolmt.tripod.com/downloads/rk102a05.exe                191K
RAR32  2.71   :e ftp://ftp.netlab.sk/public/rarsoft/rar/rarx271.exe               257K
WinRAR 2.71   :W ftp://ftp.netlab.sk/public/rarsoft/rar/wrar271.exe               588K
PPMD var.F ,
PPmonstr v.F  :W ftp://ftp.simtel.net/pub/simtelnet/win95/compress/ppmdf.zip       97K
ACB 2.00c     :e ftp://ftp.simtel.net/pub/simtelnet/msdos/compress/acb_200c.zip    42K
BOA 0.58b     :e ftp://ftp.cdrom.com/.3/sac/pack/boa058.zip                        74K
DC 0.98b      :W ftp://ftp.cdrom.com/.3/sac/pack/dc124.zip                         55K
BA 1.00 beta  :e ftp://ftp.cdrom.com/.3/sac/pack/ba100b.zip                        60K
Bzip2 1.0.1   :W ftp://sourceware.cygnus.com/pub/bzip2/v100/bzip2-100-x86-win32.exe 68K
SZip 1.12a    :W http://www.compressconsult.com/szip/szip_112a_win32.zip           71K
ZZip 0.35a    :W http://www.via.ecp.fr/~damien/zzip/zzip-win32.zip                 28K

:a - any DOS  - DOS programs, will run under pure DOS or in a DOS box
:e - extender - DOS programs using DOS extenders like DOS/4GW or CWSDPMI
:W - windoze  - Windows95/98/NT/etc programs

If a direct link doesn't work, most probably a newer version of the program
has appeared at the same site: visit the web page, or read the whole directory
from the ftp server (i.e. try the same URL, but without the filename).
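
For those who script their downloads, dropping the filename from a URL can be
sketched with Python's standard urllib.parse module. The function name
directory_url is invented for this example, and the sketch only rewrites the
URL; it does not check that the directory can actually be listed.

```python
from urllib.parse import urlsplit, urlunsplit


def directory_url(url):
    """Return the same URL with the last path component (the file) removed."""
    parts = urlsplit(url)
    # Keep everything up to and including the last "/" of the path.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


print(directory_url("http://www.7-zip.com/dl/7zip211.exe"))
# -> http://www.7-zip.com/dl/
print(directory_url("ftp://ftp.cdrom.com/.3/sac/pack/boa058.zip"))
# -> ftp://ftp.cdrom.com/.3/sac/pack/
```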


Homepages:
~~~~~~~~~~
Arhangel     : http://geocities.com/SiliconValley/Lab/6606
Eri32        : http://geocities.com/eri32
      mirror : http://artest1.tripod.com
RK           : http://malcolmt.tripod.com
Imp,WinImp   : http://www.technelysium.com.au
      mirror : http://www.winimp.com
PkZip        : http://www.pkware.com
Ufa,777,7-Zip: http://www.7-zip.com
RAR,WinRAR   : http://www.rarsoft.com
BZip2        : http://sources.redhat.com/bzip2
SZip         : http://www.compressconsult.com/szip
ZZip         : http://www.via.ecp.fr/~damien/zzip


What's new:
~~~~~~~~~~~
All contents of this file.

407 Megabytes of plain (English) texts in 556 files in 5 sets,
including the four Canterbury Corpus Large Set files.
Non-English texts will probably be added in the future,
but don't expect the results to differ by more than 1%.

One file (pgwht04.txt) is an HTML file,
and one (E.TXT, originally E.COLI, the first file of the Large Set) is
pseudo-text.

19 archivers and file-to-file compressors,
known to be the best at plain-text compression (plus a few of the most popular
tools).

The .BAT files used for the tests are more compact and readable (see
TEXT_ALL\*.BAT inside artest17.zip), and the .BATs used for calculations are
also included this time.

A DOS prompt calculator with user-defined functions
(math.exe, used for ARTest) can be found at
ftp://ftp.simtel.net/pub/simtelnet/msdos/calculte/mathfc24.zip (26K)

Ultra Precision Command Timer 1.6 - Freeware (C) 1993 by Erik de Neve
(upct.exe, used for ARTest) can be found at
ftp://ftp.cdrom.com/.3/sac/utilmisc/upct16.zip (7K)

MultiEdit 7.00jP-386 was used for editing files with macro commands, blocks,
etc., and the standard fc.exe from any DOS/Windows package for comparing files.


WARNINGS:
~~~~~~~~~
RK 1.02a5 was unable to correctly decompress CHNBG10.TXT compressed with any
of -mx1, -mx2, -mx3
("This program has performed an illegal operation and will be shut down"),
and also MISCC10.TXT with -ft+ and any of -mx1, -mx2, -mx3, reporting

ERROR 303: CRC check failed.

BA 1.00beta can't decompress any file compressed with -mf , and reports
nothing like "CRC fails".

DC 0.98b failed to decompress 1DFRE10.dc , ANDES10.dc , and BTI0110.dc ,
saying "Corrupted block" (while the t(est) command reports "Test successful").

UFA and 777 can't handle files with the symbol ` (ASCII code 96) in their
names. It was replaced with _ in nine filenames.

ERI32 4.6 can't compress files larger than (free DPMI memory)/6 , i.e.
about 10 MB on a PC with 64 MB of RAM. The largest file (44 MB) was split into
5 chunks of 9000000 bytes each (the last chunk was 8894190 bytes).
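
The chunking for ERI32 can be illustrated with a short Python sketch. The text
does not say which tool did the actual splitting, so this is only one possible
way to do it. The function names and the ".000", ".001", ... part suffixes are
invented for the example, and the total size used below is simply the sum of
the five chunk sizes given above.

```python
def chunk_sizes(total_size, chunk_size=9_000_000):
    """Sizes of the parts a file of total_size bytes would be split into."""
    full, rest = divmod(total_size, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


def split_file(path, chunk_size=9_000_000):
    """Write path.000, path.001, ... and return the list of part names."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)  # at most chunk_size bytes per part
            if not data:
                break
            part = f"{path}.{index:03d}"
            with open(part, "wb") as dst:
                dst.write(data)
            parts.append(part)
            index += 1
    return parts


# Four full chunks plus the shorter last one, as described above.
total = 4 * 9_000_000 + 8_894_190
print(chunk_sizes(total))
```

Each part can then be compressed separately, and the extracted parts joined in
order to restore the original file.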


The LATEST RELEASE, and thirteen previous versions of these tests can be found
at http://geocities.com/SiliconValley/Bay/1995/ and http://artest1.tripod.com/



The FINAL PART
==============
>     [[5]] PLEASE read THIS before replying to this article
was removed from this text, but can be easily found at
http://geocities.com/SiliconValley/Bay/1995/artest10.html
http://artest1.tripod.com/artest10.html

Send your suggestions and comments to ratush@srsc-gw.sscc.ru
With best kind regards,
RAO Inc.
