I have been working on generalizations of the Kuznetsov-Tsybakov problem, which has two distinct connections with data compression - I thought it might be interesting here.
The first connection is the one in the title. In standard entropy coding we encode a symbol sequence into a bit sequence. By reversed entropy coding I mean the opposite direction: we have a message stored as a bit sequence, and we want to encode it as a symbol sequence having some chosen probability distribution. This can be done by just swapping the standard entropy encoder and decoder ... but then the decoder would need to know the probabilities that were used - surprisingly, it turns out we can drop this requirement here (at a cost paid by the encoder)!
Some applications of such "reversed entropy coding":
- hiding information in a sequence with chosen statistics for various steganographic problems, for example to create pictures resembling the QR-like codes below: the grayness of a pixel becomes the probability of using "1" there,
- storing information in a medium which enforces some statistical constraints. For example, imagine that a HDD stores information by magnetizing dots of circular shape - let us reduce the lattice constant sqrt(2) times, enforcing the constraint that no two neighboring nodes are simultaneously magnetized (no neighboring "1"s, as in Fibonacci coding) - it turns out we can increase the informational capacity of this medium by about 17% this way (this was the original motivation for asymmetric numeral systems).
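The ~17% figure can be sanity-checked numerically. A minimal sketch, assuming the standard hard-square model (no two horizontally or vertically adjacent "1"s on a square lattice) and a transfer matrix on strips of finite width - the chosen width and the free-boundary approximation are my simplifications:

```python
import numpy as np
from itertools import product

def hard_square_entropy(w):
    """Bits per site for a long strip of width w with no two
    horizontally or vertically adjacent 1s (free boundaries)."""
    # valid rows: no two adjacent 1s within the row
    rows = [r for r in product((0, 1), repeat=w)
            if all(not (a and b) for a, b in zip(r, r[1:]))]
    # two consecutive rows are compatible if they never share a 1 in a column
    T = np.array([[1.0 if all(not (a and b) for a, b in zip(r1, r2)) else 0.0
                   for r2 in rows] for r1 in rows])
    lam = max(abs(np.linalg.eigvals(T)))   # growth rate of #configurations
    return float(np.log2(lam)) / w

e = hard_square_entropy(10)   # approaches ~0.588 bits/site as w grows
print(e)
# Reducing the lattice constant sqrt(2) times halves the cell area, i.e.
# doubles the node count, so capacity per original cell is about 2*e:
print(2 * e)                  # ~1.18 bits instead of 1 -> the ~17% gain
```

For width 10 this gives roughly 0.60 bits/site, slightly above the infinite-lattice value because of the free boundaries.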
So let us see that we can do this reversed entropy coding with the decoder not knowing the probability distribution that was used. For strong constraints this is the so-called Kuznetsov-Tsybakov problem: some bits are fixed to 0 or 1 and we can only use the remaining ones. If the decoder knew the positions of the fixed bits, we could just use the remaining bits directly. However, it turns out that even without this knowledge, we can still approach the same capacity.
How to do it? We can use the freedom of choosing the exact encoding sequence: imagine M is a length-m bit sequence containing our message and F is a length-f bit sequence which can be chosen freely. We will also need some large pseudo-random (but deterministic) bijection T on length n=m+f bit sequences - the transition function.
Now the encoding of our message M is
T(M,F) - a length n=m+f bit sequence,
where F is chosen to fulfill the constraints - among the 2^f different ways (pseudo-random sequences) to encode our message, we choose the one best fitting our constraints (e.g. most resembling the picture).
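A toy sketch of this scheme (the sizes, the seed, and the scoring rule are my illustrative choices, not from the above): the shared pseudo-random bijection T is realized as a seeded permutation table on n-bit values, the encoder brute-forces all 2^f freedom values, and the decoder needs neither p nor F:

```python
import random

n, m, f = 16, 10, 6                         # n = m + f
rng = random.Random(42)                     # seed shared by both sides
T = list(range(1 << n)); rng.shuffle(T)     # pseudo-random bijection T
T_inv = [0] * (1 << n)
for i, y in enumerate(T):
    T_inv[y] = i

def encode(M, p):
    """Among the 2^f encodings of message M, pick the one whose
    fraction of 1s is closest to the target probability p."""
    best = min(range(1 << f),
               key=lambda F: abs(bin(T[(M << f) | F]).count('1') / n - p))
    return T[(M << f) | best]

def decode(x):
    """No knowledge of p or F needed: invert T, discard freedom bits."""
    return T_inv[x] >> f

M = 0b1011001110
x = encode(M, p=0.2)
assert decode(x) == M
print(bin(x), bin(x).count('1') / n)   # output biased toward few 1s
```

The same code works for any constraint: only the scoring inside `encode` changes, while `decode` stays oblivious to it.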
So imagine that we would like to encode a sequence with Pr(1)=p this way (reversed entropy coding). The probability that a completely random bit sequence accidentally has this distribution is
binomial(n,pn)/2^n ~ 2^(-n(1-h(p)))
where h(p) = -p lg(p) - (1-p) lg(1-p) is the Shannon entropy (lg = log2). So if we check more than 2^(n(1-h(p))) different random sequences encoding our message, almost certainly one of them will satisfy the constraint: we should use f > n(1-h(p)), which is exactly the capacity limit for entropy coding - but this time the decoder does not need to know the probability: it just decodes and discards the freedom bits.
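A quick numeric check of that count (n and p chosen arbitrarily for illustration):

```python
import math

n, p = 100, 0.25
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # Shannon entropy h(p)
exact = math.comb(n, round(p * n)) / 2 ** n           # binomial(n,pn)/2^n
approx = 2 ** (-n * (1 - h))
print(exact, approx)
# They agree up to a polynomial factor ~sqrt(2*pi*n*p*(1-p)) from Stirling,
# so the exponential rate -n*(1-h(p)) is the right one:
print(-math.log2(exact) / n, 1 - h)
```

The exact probability is always a bit below the 2^(-n(1-h(p))) estimate, but the per-bit exponents converge as n grows.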
This abstract scheme is impractical (it requires an exponential search), but we can approximate it well by growing a tree of possible encodings and letting only some number of the best partial encodings survive.
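The tree-growing idea can be sketched as a small beam search. Everything concrete below is an assumption for illustration: the fixed message/freedom interleaving, SHA-256 as the deterministic scrambler, and the 1-rate scoring. The key property preserved from the scheme is that the decoder recomputes everything from the output prefix alone, without knowing p:

```python
import hashlib

def prbit(prefix, i):
    """Deterministic pseudo-random bit derived from the output prefix;
    the decoder sees the same prefix, so it can recompute it."""
    s = ''.join(map(str, prefix)) + f":{i}"
    return hashlib.sha256(s.encode()).digest()[0] & 1

def beam_encode(msg_bits, p, beam=8):
    """One freedom bit after each message bit (a fixed pattern the decoder
    also knows). Message positions force the output bit (msg XOR prbit);
    freedom positions branch the tree; keep only the `beam` best paths,
    scored by distance of their 1-rate from the target p."""
    paths = [()]
    for i in range(2 * len(msg_bits)):
        if i % 2 == 0:   # message position: forced
            paths = [out + (msg_bits[i // 2] ^ prbit(out, i),) for out in paths]
        else:            # freedom position: branch on both output bits
            paths = [out + (o,) for out in paths for o in (0, 1)]
        paths.sort(key=lambda out: abs(sum(out) / len(out) - p))
        paths = paths[:beam]
    return paths[0]

def beam_decode(out):
    """Unscramble message positions, discard freedom positions."""
    return [out[i] ^ prbit(out[:i], i) for i in range(0, len(out), 2)]

msg = [1, 0, 1, 1, 0, 1, 0, 0]
x = beam_encode(msg, p=0.2)
assert beam_decode(x) == msg
print(x, sum(x) / len(x))   # 1-rate pulled toward 0.2
```

Real schemes would interleave fewer freedom bits (rate f/n instead of 1/2) and use a better score, but the survivor-tree structure is the same.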
The second connection with data compression is that this problem is in a sense dual to the rate-distortion problem, which is the basis of lossy compression.
So imagine that again we look for F such that T(M,F) best fits our constraints (resembles our data), but this time we fix M=0 and send F to the receiver - this F is shorter than the data, and allows the receiver to reconstruct a distorted version of it.
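This dual use can be sketched with the same permutation-table bijection as before (toy sizes and seed are my choices): fixing M=0 leaves 2^f codewords T(0,F); the sender transmits the f-bit F of the codeword closest to the data in Hamming distance:

```python
import random

n, f = 16, 8                                # rate f/n = 1/2
rng = random.Random(0)
T = list(range(1 << n)); rng.shuffle(T)     # shared pseudo-random bijection

def lossy_compress(x):
    """Send only F: index of the codeword T(0,F) closest to data x."""
    return min(range(1 << f), key=lambda F: bin(T[F] ^ x).count('1'))

def reconstruct(F):
    return T[F]            # receiver needs only T and the f-bit F

x = 0b1111000011110000
F = lossy_compress(x)                       # 8 bits instead of 16
x_hat = reconstruct(F)
print(bin(F), bin(x_hat), bin(x ^ x_hat).count('1'))  # distortion in bits
```

A random codebook like this is far from the rate-distortion bound at such tiny sizes, but it shows the duality: the same search over F serves either to hide M in constrained statistics or, with M fixed, to compress the constraints themselves.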
From this perspective, we can encode ("lossy compress") the pictures below using (1 - rate) bits per pixel, e.g. 1/8 bits per pixel for the bottom left one.
We can see the informational content of these pictures as:
"hidden information" + "constraints/visual information" = 1bit/pixel
Better explanation and pictures are in slides and preprint.
I have recently implemented a very basic version of these QR-like picture-resembling codes (without dithering or error correction); here are some examples:
I have also thought about standard entropy coding with the receiver not knowing the probabilities, for data compression - it would allow using better statistical models than the current ones, which can base predictions only on the already processed sequence - but I don't see how to do it. Has anybody seen something like that?
What other applications of this kind of considerations can you think of?
Can we do it smarter/faster than brute-force searching through thousands/millions of different encodings?