The Carryless Rangecoder
------------------------

This coder is based on the carryless rangecoder made by Dmitry Subbotin.
He had made it as a C++ class which I believe is in the public domain.
I translated it to C by using structures, two for the coder and one for
the model. I also tried to keep simplicity in mind so that no knowledge
of how it actually works is needed.

I got inspired by the arithmetic coder version made by Fred Wheeler.
I thought it was very convenient to be able to have several models,
even several coders, in use at the same time.

This can be very useful when e.g. compressing Burrows & Wheeler
Transform output. I'm sure it can be useful in other cases too.
It is a lot faster than the old arithmetic coder and I have not been
able to observe any significant difference in compression result.
You can control the result some by changing the rescaling interval by
changing the maxFreq variable (when calling the ModelInit function).

How to use the rangecoder
-------------------------

You first have to initiate your models.
status = ModelInit(model_ptr, nr_of_syms, freq_table, freq_table_init, increment, maxFreq, ADAPT or STATIC)
where status is RC_OK or RC_ERROR, totFreq and maxFreq aren't allowed to be
greater than 1 << 16 or 65536.
The freq_table_init is simply an additional array with frequencies
to initiate freq_table with. If freq_table_init (pointer) is NULL are then
the freq_table's symbolcounts all initiated to 1.
Increment and maxFreq can with advantage be experimented with, e.g. the order 1
coder needs faster adaptation so I gave it a high increment, 32.
If you set the last parameter to STATIC, will the frequency table not be updated.

You can choose to compress/decompress to/from memory or a file.
StartEncode(encoder_ptr, file_ptr, NULL, 0)
or StartEncode(encoder_ptr, NULL, mem_ptr, mem_length).

Likewise StartDecode (decoder_ptr, file_ptr, NULL, 0)
or StartDecode (decoder_ptr, NULL, mem_ptr, mem_length).

Compressing to or decompressing from memory is the default if filepointer
and memorypointer are both set.

You then compress/decompress the data in a loop.
Compress:
int status = EncodeSymbol(encoder_ptr, model_ptr, symbol)
or decompress:
int symbol_or_status = DecodeSymbol(decoder_ptr, model_ptr).

After you hopefully have compressed all the data, you have to finally call
this function (only after compression):
FinishEncode (encoder_ptr).
--

15/10 - 2001
In this update have I optimized the symbol/cumFreq search in EncodeSymbol and
DecodeSymbol. Redundant calculations have been removed, especially in DecodeSymbol.
The lastSym and lastCumFreq variables are only set if the last walk wasn't from
the start or the end of the range.
Two programs, rcenc and rcdec, have been added that compress and decompress
files to files.
--

17/10 - 2001

Just bugfixes. ModelInit now return RC_ERROR if maxFreq or totFreq > 1 << 16 (= 65536).
--

23/10 - 2001

The number of iterations when searching for a symbol in DecodeSymbol is greatly reduced.
E.g. on text are the number of iterations roughly the same in DecodeSymbol and EncodeSymbol.
The decoding routine is much faster now, however not so simple anymore.

Counting the number of sums (iterations), average per symbol:
bib:   compression 15, decompression 16
book1: compression 14, decompression 15
geo:   compression 27, decompression 28

i.e. roughly the same as Fenwick's BIT and Moffat's Heap
but much faster on e.g. BWT+MTF output which usually needs 1-2 iterations per symbol.
--

29/10 - 2001

Minor enhancements. Added an order 1 coder and increment to the rc_model structure.
--

10/11 - 2001

Minor compatibility enhancements.
--

17/11 - 2001

Modified the order-1 coder to use more models and thus compress better.
It's now almost order-2 with an added trick for text-like data.
--

13/1 - 2002

Improved rcenc1 and rcdec1 coders.
--

If you find it useful, please send me an email!

2 October 2001, Mikael Lundqvist
mikael.lundqvist@spray.se or mlq@telia.com
