# Thread: Statistical implementation of Ziv-Lempel

1. ## Statistical implementation of Ziv-Lempel

Hello,

as a part of my diploma thesis I want to implement an LZ-based compression scheme. I need it to be statistical, i.e. it does not itself compress, but returns a probability for the next symbol. Also the implementation should work bitwise. (Since I am going to compress text, the compression scheme could internally work in a bytewise manner, but I need a prediction for every new bit.) 100 MB of memory should be sufficient.

There is a whole lot of different implementations both for LZ77 and LZ78. I dont know which is better. Could anyone suggest me a scheme, that is appropriate for my task? Maybe you can tell me the title of a paper for this scheme.

Thomas

2. It kinda sounds like what I recently mentioned in
http://encode.ru/forum/showpost.php?p=3831&postcount=27

http://en.wikipedia.org/wiki/Lempel-Ziv

LZ78 might be easier for the task, as you only have to
consider the available symbols, and you can control the
alphabet size, but the only well-known implementation is
LZW, and its compression is relatively bad.

So I guess it should be more interesting to implement such
a thing for some simple existing LZ77 - like
http://dfn.dl.sourceforge.net/source...lz-1.15.tar.gz
Of course, things like deflate and LZMA are more popular, but
it'd be probably too hard to deal with their sources,
and there's a lot of unnecessary (here) features - like
blocks in deflate.

Anyway, LZ77 just codes the references to substrings in
the already processed data, so its idea is really simple.
And you won't be able to use the existing library for
such task either way, so it might be better to just write

So basically, with LZ77 you'd have a set of literals +
all possible matches, and it'd be necessary to estimate
their code lengths (=log2 of probability) and sort by
first bit value, convert to probabilities, integrate and normalize.

Which is not very hard to implement in a direct way,
but then making it to run reasonably fast might be
a tough problem.

But still, even a direct implementation would be interesting
to compare with straight version of its LZ model, and estimate
the LZ's redundancy.

3. The following links might also be interesting for you:

http://en.wikipedia.org/wiki/Reduced_Offset_Lempel_Ziv

can read german or russian, the following might be more interesting:

http://de.wikipedia.org/wiki/Reduced_Offset_Lempel_Ziv
http://ru.wikipedia.org/wiki/ROLZ

Maybe you can use google translate.)

Finally,

is another implementation of Ilia.

Best,
Alibert

4. Better check out BALZ! BALZ is an improved QUAD and its source is in the Public Domain!

http://balz.sourceforge.net/

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•