Given a fixed, arbitrary alphabet of s letters, {a_1, ..., a_s}, there exists a method to encode/transmit any string of length k in at most s log k + log(k!/(k_1! k_2! ... k_s!)) bits, where k_i is the number of occurrences of a_i within the string. (Note: we will furthermore show that log(k!/(k_1! k_2! ... k_s!)) ∼ -k ∑_i (k_i/k) log(k_i/k).)
This is the encoding/decoding scheme: first the numbers k_1, ..., k_s are communicated, which takes s log k bits. Now there are only k!/(k_1! k_2! ... k_s!) strings of length k with exactly these letter counts, so it suffices to transmit the index of our string in a (say, lexicographic) enumeration of them, which takes a further log(k!/(k_1! k_2! ... k_s!)) bits.
Now using Stirling's approximation (i.e. k! ∼ (2πk)^{1/2} (k/e)^k, so log k! ∼ k log k), we have log(k!/(k_1! ... k_s!)) ∼ k log k - ∑_i k_i log k_i = -∑_i k_i log(k_i/k) = -k ∑_i (k_i/k) log(k_i/k).
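As a concrete check, the two-part code can be made explicit: transmit the counts, then the rank of the string among all strings with those counts. A Python sketch (the helper names `multinomial` and `rank` are our own); the log of the number of such strings comes out close to k times the letter-frequency entropy, as the Stirling computation predicts.

```python
import math
from collections import Counter

def multinomial(counts):
    """Number of distinct strings with the given letter counts."""
    r = math.factorial(sum(counts.values()))
    for c in counts.values():
        r //= math.factorial(c)
    return r

def rank(s):
    """Index of s in the lexicographic enumeration of all strings
    having the same letter counts as s (the second part of the code)."""
    counts = Counter(s)
    idx = 0
    for ch in s:
        for smaller in sorted(counts):
            if smaller >= ch:
                break
            if counts[smaller] > 0:
                # count the strings that continue with `smaller` instead of ch
                counts[smaller] -= 1
                idx += multinomial(counts)
                counts[smaller] += 1
        counts[ch] -= 1
    return idx

# the index part of the code costs log of the number of candidate strings,
# which is ~ k * entropy of the letter frequencies, up to O(log k) terms:
s = 'a' * 30 + 'b' * 20 + 'c' * 10
k = len(s)
bits_exact = math.log2(multinomial(Counter(s)))
entropy = -sum((c / k) * math.log2(c / k) for c in Counter(s).values())
```

For example, `rank('aba')` is 1, since only 'aab' precedes it among the three arrangements of two a's and one b.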
The formula -∑_i p_i log p_i, where p_i = k_i/k, is also meaningful for encoding/decoding schemes based on encoding the alphabet (instead of the indexing method described above). Such a scheme encodes each letter a_i into a binary code word and then transmits the string as the concatenation of these code words. Since we want the code words to be self-delimiting (so that we do not waste a letter as a delimiter), we require them to be prefix-free.
We can show that such prefix-free code words correspond to paths to leaf nodes in a binary tree. Furthermore, the path to a parent of any used leaf node cannot itself be used, since it would be a prefix of the path to that leaf node. In the case that every leaf node is used by a code word, we call the resulting code complete. It can be verified that for any complete code {s_1, s_2, ..., s_n}, ∑_i 2^{-|s_i|} = 1. This gives us the intuition to prove that there exists a prefix-free code {s_1, s_2, ..., s_n} with lengths l_1, l_2, ..., l_n if and only if ∑_i 2^{-l_i} ≤ 1 (Kraft's inequality).
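A minimal sketch of both directions (function names are ours): the Kraft sum can be checked exactly, and from any lengths satisfying the inequality a prefix-free code can be built by handing out code words in order of increasing length.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """The sum over i of 2^{-l_i}, computed exactly."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def code_from_lengths(lengths):
    """Build a prefix-free code with the given lengths, assuming the
    Kraft sum is at most 1 (the 'if' direction of Kraft's inequality)."""
    assert kraft_sum(lengths) <= 1
    words, code, prev = [], 0, min(lengths)
    for l in sorted(lengths):
        code <<= l - prev          # descend to the next code-word length
        words.append(format(code, '0%db' % l))
        code += 1                  # move past the subtree just used
        prev = l
    return words

words = code_from_lengths([3, 1, 2, 3])
# a complete code: every leaf is used and the Kraft sum is exactly 1
assert words == ['0', '10', '110', '111']
assert kraft_sum([3, 1, 2, 3]) == 1
```

The example lengths give a complete code; lengths with Kraft sum strictly below 1 (say [2, 2, 2]) yield a prefix-free code that leaves one subtree unused.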
Normalized metric: a normalized metric is a distance measure d : U × U → [0, 1] satisfying the usual metric axioms.
We now define the notion of an optimal prefix-free code. Let A = {a_1, ..., a_n} be an alphabet where the probability of each letter a_i is p_i (i.e. a_i occurs p_i k times in a string of length k). A prefix-free code {s_1, ..., s_n}, where s_i encodes a_i, is called optimal with respect to A just in case it minimizes ∑_i p_i |s_i| among all prefix-free codes.
Let H(A) = -∑_i p_i log p_i. The Noiseless Coding Theorem states that the average length L = ∑_i p_i |s_i| of the optimal code {s_1, ..., s_n} satisfies H(A) ≤ L ≤ H(A) + 1.
Note that completeness does not guarantee that a prefix-free code achieves this optimality. However, Huffman codes, which are prefix-free, do. Also, while Huffman coding achieves the shortest average length among all prefix-free codes, other methods, devised under other assumptions, may result in shorter lengths.
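For instance, the standard greedy Huffman construction (sketched below; `huffman_lengths` is our own helper name) yields prefix-free code-word lengths whose average ∑_i p_i l_i lands in the interval [H(A), H(A)+1] promised by the theorem.

```python
import heapq
import math

def huffman_lengths(probs):
    """Code-word lengths of a Huffman code for the given probabilities
    (standard greedy construction; ties broken arbitrarily)."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, ids1 = heapq.heappop(heap)   # merge the two least likely
        p2, ids2 = heapq.heappop(heap)   # subtrees; every letter in them
        for i in ids1 + ids2:            # moves one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, ids1 + ids2))
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_lengths(probs)
H = -sum(p * math.log2(p) for p in probs)
L = sum(p * l for p, l in zip(probs, lengths))
assert H <= L <= H + 1   # here the probabilities are dyadic, so L equals H
```

With dyadic probabilities as above the Huffman lengths are exactly -log p_i and L = H(A); for non-dyadic probabilities L sits strictly inside the interval.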
The plain Kolmogorov complexity of a string x with respect to a partial computable function φ is C_φ(x) = min{ l(p) : φ(p) = x }, and ∞ if no such p exists.
l(x) stands for the length of x; here x can be a number or a string. In the case that x is a number, l(x) = log x + O(1) (the length of its binary representation).
We fix a universal Turing machine Φ which on input p deconstructs p into two parts ⟨i, j⟩ and then runs T_i(j). Let C(x) = C_Φ(x). By construction, C(x) ≤ C_{T_i}(x) + c_i for every machine T_i, where the constant c_i depends only on i (the invariance theorem).
For each n there are 2^n strings of length n, but there are only ∑_{0 ≤ i ≤ n-c-1} 2^i = 2^{n-c} - 1 descriptions of length up to n-c-1. (Note: the strings whose shortest descriptions have length ≥ n-c are the c-incompressible ones.) Hence for any n and c ≤ n, there are at least 2^n - 2^{n-c} + 1 c-incompressible strings of length n.
We generalize this observation to: for each fixed y, every finite set A of cardinality m has at least m(1 - 2^{-c}) + 1 elements x with C(x|y) ≥ log m - c.
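A quick numerical sanity check of these counting bounds, using plain arithmetic and no Kolmogorov machinery (the helper name is ours):

```python
def max_compressible(n, c):
    """Number of candidate descriptions of length < n - c: an upper bound
    on how many strings of length n can fail to be c-incompressible."""
    return 2 ** (n - c) - 1

n = 20
for c in (1, 5, 10):
    # at least 2^n - 2^{n-c} + 1 strings of length n are c-incompressible:
    lower = 2 ** n - max_compressible(n, c)
    assert lower == 2 ** n - 2 ** (n - c) + 1
    # equivalently, less than a 2^{-c} fraction of all length-n strings
    # can be compressed by c or more bits:
    assert max_compressible(n, c) / 2 ** n < 2 ** -c
```

The same count proves the generalized statement: fewer than 2^{log m - c} descriptions are shorter than log m - c bits, so at most m 2^{-c} - 1 elements of a set of cardinality m can have C(x|y) below log m - c.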
Let x = uvw be a string of length n. Suppose p is a shortest program for v, so l(p) = C(v). Now uvw can be reconstructed from the specification q = l(p) p l(u) u w (with the lengths self-delimitingly encoded) by, say, some machine T. Hence C(x) ≤ l(q) + O(1) = C(v) + (n - l(v)) + O(l(C(v)) + l(n)). Noting that l(C(v)) and l(n) are of the order of log n, we get C(x) ≤ C(v) + n - l(v) + O(log n). Suppose x is c-incompressible, that is, C(x) ≥ n - c. We then have n - c ≤ C(v) + n - l(v) + O(log n), so C(v) ≥ l(v) - O(log n).
Hence v is incompressible up to a term of O(log n). This cannot be improved: an incompressible string is necessarily very irregular, and if x had only irregular (incompressible) substrings, then x could be only of certain restricted forms, making x itself easier to specify (and hence compressible). Hence it is inevitable that some substrings of an incompressible x be compressible.
This c is determined by the index of the machine which does its computation twice: once with the input, and once with the output of that first computation as input. (The proof of this result should be straightforward from here.)
To see this, let n = 2^k for some k. Then C(1^n) ≤ log log n + O(1), since it suffices to describe k = log n, which takes about log log n bits. However, there are n strings 1^m with m ≤ n, hence there exists one such string with C(1^m) ≥ log n (see the section on incompressibility), even though 1^m is a prefix of 1^n.
One may try to overcome this problem by giving, for free, the length of the input string, that is, by considering instead the length-conditional complexity C(x | l(x)). However, even this measure is not monotonic on prefixes. To see this, we use strings whose lengths give away all the information there is to know about the string. Let s_n = n 0^{n - l(n)}, the binary representation of n followed by enough zeros to reach total length n. It is clear that for any n, s_n can be reconstructed completely from n = l(s_n), and hence C(s_n | l(s_n)) ≤ c, where c is determined by the index of the machine that reconstructs s_n from n. Now note that for any string u beginning with a 1 there exists a superstring of u that is s_n for some n (take n to be the number whose binary representation is u); since C(u | l(u)) can be nearly l(u) while the superstring has constant length-conditional complexity, we are done.
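The construction is easy to make concrete (a Python sketch; `superstring` is our own hypothetical helper name):

```python
def s(n):
    """s_n: the binary representation of n, padded with zeros to length n
    (defined once n is at least the length of its binary representation)."""
    b = format(n, 'b')
    assert n >= len(b)
    return b + '0' * (n - len(b))

# s_n is fully determined by its own length, so C(s_n | l(s_n)) = O(1):
assert all(len(s(n)) == n for n in range(1, 100))

def superstring(u):
    """An s_n having u as a prefix, for any u that starts with '1':
    take n to be the number whose binary representation is u."""
    return s(int(u, 2))

assert superstring('10110').startswith('10110')
```

For example, `superstring('101')` is s_5 = '10100': the prefix '101' may carry up to 3 bits of information given its length, yet the superstring is fully determined by its length alone.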
If A is meager and recursive, then for every constant c there are only finitely many x in A that are c-incompressible!
We can similarly show that elements of meager r.e. sets are highly compressible, as follows.
Let A be an r.e. set of elements of the form (x, y). (Intuitively, consider y the length of x; we keep this result as general as possible in order to introduce the randomness deficiency in the next section.) If for each natural number y the set A_y = {x | (x, y) ∈ A} is finite, then for some constant c depending only on A, for all x ∈ A_y we have C(x|y) ≤ l(d(A_y)) + c, where d(A_y) denotes the cardinality of A_y (proof omitted).
Let y be the length n of the string x in this result, and suppose d(A^{≤n}) ≤ p(n) for some polynomial p (that is, A is meager). Hence by the result, C(x|n) ≤ l(p(n)) + O(1) = O(log n). However, for all x of length at most n it is clear that C(x) ≤ C(x|n) + 2 l(n) + O(1). This gives us that any member x of length n of a meager r.e. set has C(x) = O(log n).
If δ(x|A) is large, this means that there is a description of x with the help of A that is considerably shorter than just giving x's "serial number" in A. Intuitively, a large δ(x|A) implies that x is not so random with respect to A. We may consider x to be random in the set A iff δ(x|A) = O(1).
An important case is when the probability distribution P corresponds to the uniform distribution L, that is, P(x) = 2^{-2 l(x)}. In this case, the second condition for a Martin-Löf test δ becomes: d({x : l(x) = n, δ(x) ≥ m}) ≤ 2^{n-m} for all n.
It turns out that for any P, all the P-tests can be effectively enumerated, and this allows the definition of a universal P-test: let δ_1, δ_2, ... be an enumeration of all P-tests; then δ_0(x|P) = max{δ_y(x) - y | y ≥ 1} is a universal P-test.
Interestingly, we can also show f(x) = l(x) - C(x|l(x)) - 1 to be a universal L-test, where L is the uniform distribution above.
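Since C is uncomputable, f cannot be evaluated exactly; but any real compressor gives a computable upper bound on C, hence a lower bound on the deficiency l(x) - C(x|l(x)) - 1. A purely illustrative sketch using zlib as the stand-in compressor (our own choice, not part of the theory):

```python
import os
import zlib

def deficiency_lower_bound(x: bytes) -> int:
    """Lower bound, in bits, on the universal L-test f(x) = l(x) - C(x|l(x)) - 1,
    using the zlib-compressed length as a computable upper bound on C.
    Only illustrative: zlib is a crude approximation of the shortest program."""
    return 8 * len(x) - 8 * len(zlib.compress(x, 9)) - 1

regular = b'ab' * 5000          # very regular: the test flags it as non-random
random_ish = os.urandom(10000)  # likely incompressible: small or negative bound
assert deficiency_lower_bound(regular) > deficiency_lower_bound(random_ish)
```

A large value certifies non-randomness; a small value proves nothing, since zlib may simply fail to find the regularity that a shortest program would exploit.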