Okay! Currently I'm playing with hashing and binary tree.
Check out some testing results (DEBUG build of lzpm 0.03)
<hash size, bits>-<number of hash bins>: <compressed size> <time>
TraktorDJStudio3.exe (29,124,024 bytes)
18-32: 5,950,354 bytes, 8.844 sec
17-64: 5,936,534 bytes, 13.047 sec
16-128: 5,929,646 bytes, 20.094 sec
world95.txt (2,988,578 bytes)
18-32: 626,067 bytes, 0.969 sec
17-64: 621,738 bytes, 1.250 sec
16-128: 619,408 bytes, 1.671 sec
worms2.tar (17,097,728 bytes)
18-32: 8,720,491 bytes, 15.937 sec
17-64: 8,718,574 bytes, 25.344 sec
16-128: 8,717,675 bytes, 43.125 sec
3200.txt (16,013,962 bytes)
18-32: 5,179,135 bytes, 7.968 sec
17-64: 5,136,275 bytes, 11.125 sec
16-128: 5,119,991 bytes, 16.437 sec
Note that current lzpm 0.02 uses 16-bit hash and 128 hash bins - now you can see how we can speed-up the compression!
Anyway, I'm seriously thinking about binary tree - it must be more efficient than hashing in this case.
The hashing has the worst case - when we match to each byte (compressed or not so well compressible data). In addition, with ROLZ we must perform a larger number of match searches compared to the LZ77. Binary tree with lzpm should work better...
Also I'm thinking to myself, how Malcolm perform the same match search in his ROLZ2/ROLZ3. Playing with WinRK I've found that:
The actual size of additional memory requirements for compression is depent upon:
+ Dictionary size
+ Largest Optimised Match (0 or 1 and larger)
All clear with dictionary size.
About Largest Optimised Match:
0 - disables optimizing (Optimal Parsing)
values of 1 and higher enables.
With Optimal Parsing enabled, ROLZ uses a larger amount of memory. Additionally to the significant speed impact...