SFC Test
option [ex]
13.077.423 B comp. 169,703 s. dec. 2,687 s.
option [e]
13.278.748 B comp. 68,060 s. dec. 2,705 s.
OK, a very special version of BALZ is here!
Briefly what's new:
+ Enlarged window size to 4 MB, block size to 32 MB
+ Improved match finder
+ Improved parsing. The default "e" mode uses greedy parsing. The optimized "ex" mode uses advanced lazy matching with two-byte lookahead. During parsing, the encoder checks some additional conditions, such as whether the current offset is in the Rep() state, whether the current offset is good enough, etc.
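To make the difference between the two modes concrete, here is a minimal sketch of greedy parsing versus lazy matching with up to two bytes of lookahead. It is not BALZ's parser: the brute-force `longest_match`, the `MIN_MATCH` value and the deferral rule are assumptions made for illustration, and the Rep() and "offset good enough" checks are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimum match length, chosen only for this illustration.
constexpr std::size_t MIN_MATCH = 3;

// Brute-force longest match at position i against every earlier position.
// A real encoder would use a hash-chain match finder instead.
std::size_t longest_match(const std::string& s, std::size_t i) {
    std::size_t best = 0;
    for (std::size_t j = 0; j < i; ++j) {
        std::size_t len = 0;
        while (i + len < s.size() && s[j + len] == s[i + len]) ++len;
        best = std::max(best, len);
    }
    return best;
}

// Counts emitted tokens (literals plus matches).
// lookahead = 0 is greedy parsing; lookahead = 1 or 2 is lazy matching:
// a match is deferred (a literal is emitted instead) when a sufficiently
// longer match starts at one of the next `lookahead` positions.
std::size_t count_tokens(const std::string& s, std::size_t lookahead) {
    assert(lookahead <= 2);
    std::size_t tokens = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t len = longest_match(s, i);
        bool defer = false;
        if (len >= MIN_MATCH) {
            for (std::size_t k = 1; k <= lookahead && i + k < s.size(); ++k) {
                // Assumed rule: the later match must beat the current one
                // by more than the extra literals it costs, minus one.
                if (longest_match(s, i + k) > len + (k - 1)) {
                    defer = true;
                    break;
                }
            }
        }
        if (len >= MIN_MATCH && !defer) {
            i += len;  // emit a match
        } else {
            i += 1;    // emit a literal
        }
        ++tokens;
    }
    return tokens;
}
```

On the input `abcXbcdeYabcde`, greedy parsing takes the 3-byte match `abc` and is then left with two unmatched literals, while the lazy variant emits `a` as a literal and takes the 4-byte match `bcde`, saving one token.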
Enjoy!
http://encode.ru/balz/index.htm
Thanks Ilia!
Mirror: Download
Ilia, I remain really interested in this BALZ LZ77 compressor. Getting below 100,000,000 bytes in the Maximum Compression MFC test is no small thing! I am curious to know whether you have added a delta pre-filter for BMP, TIFF, WAVE, etc.?
Nope, BALZ has exactly the same E8/E9 transformer as LZPM (an improved version of QUAD's), and nothing else. BALZ is simply much stronger on binary data, thanks to LZ77! I'm hoping that the new BALZ v1.04 will have MUCH higher compression on ALL test sets, including MFC, Squeeze Chart, Black_Fox's and of course yours!
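For readers unfamiliar with the E8/E9 transform: 0xE8 and 0xE9 are the x86 opcodes for CALL and JMP with a 32-bit relative operand, and the filter rewrites that operand as an absolute address so that calls to the same target from different places produce identical byte strings. The sketch below shows the basic idea only. It is an assumption about the general technique, not the LZPM/BALZ code, which is described above as an improved variant; the function name `e8e9_transform` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian 32-bit load/store helpers.
static std::uint32_t load32(const std::vector<std::uint8_t>& b, std::size_t i) {
    assert(i + 4 <= b.size());
    return std::uint32_t(b[i]) | (std::uint32_t(b[i + 1]) << 8) |
           (std::uint32_t(b[i + 2]) << 16) | (std::uint32_t(b[i + 3]) << 24);
}

static void store32(std::vector<std::uint8_t>& b, std::size_t i, std::uint32_t v) {
    assert(i + 4 <= b.size());
    for (int k = 0; k < 4; ++k) b[i + k] = static_cast<std::uint8_t>(v >> (8 * k));
}

// forward = true:  relative operand -> absolute address (before compression)
// forward = false: absolute address -> relative operand (after decompression)
// Opcode bytes are never modified and operand bytes are skipped in both
// directions, so the inverse pass visits exactly the same positions.
void e8e9_transform(std::vector<std::uint8_t>& buf, bool forward) {
    std::size_t i = 0;
    while (i + 5 <= buf.size()) {
        if ((buf[i] & 0xFE) == 0xE8) {  // 0xE8 (CALL) or 0xE9 (JMP)
            // Address of the next instruction; arithmetic wraps modulo 2^32.
            const std::uint32_t next = static_cast<std::uint32_t>(i + 5);
            const std::uint32_t v = load32(buf, i + 1);
            store32(buf, i + 1, forward ? v + next : v - next);
            i += 5;
        } else {
            ++i;
        }
    }
}
```

This simple version transforms every E8/E9 byte it meets, including ones that are not real instructions; practical filters usually add range checks to limit such false positives.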
Quick test...
BALZ [e]
A10.jpg > 843,382
AcroRd32.exe > 1,473,688
english.dic > 872,448
FlashMX.pdf > 3,751,136
FP.LOG > 895,287
MSO97.DLL > 1,916,423
ohs.doc > 844,279
rafale.bmp > 1,089,156
vcfiu.hlp > 731,261
world95.txt > 233,472
Total = 12,650,532 bytes
ENWIK8 > 30,279,021 bytes
Elapsed Time: 00:45:14.517 (2714.517 Seconds)
BALZ [ex]
A10.jpg > 843,382
AcroRd32.exe > 1,449,276
english.dic > 962,560
FlashMX.pdf > 3,738,823
FP.LOG > 855,849
MSO97.DLL > 1,885,008
ohs.doc > 836,783
rafale.bmp > 1,071,154
vcfiu.hlp > 698,529
world95.txt > 604,981
Total = 12,946,345 bytes
ENWIK8 > 29,230,841 bytes
Elapsed Time: 02:04:27.840 (7467.840 Seconds)
LovePimple
Re-check the results; there is something wrong!
Results are correct for my machine. It seems that BALZ still fails to work correctly on my old P3 @750MHz machine.
Here are the results from the same test on my AMD Sempron 2400+ machine...
BALZ [e]
A10.jpg > 843,382
AcroRd32.exe > 1,473,688
english.dic > 1,095,449
FlashMX.pdf > 3,751,136
FP.LOG > 895,287
MSO97.DLL > 1,916,423
ohs.doc > 844,279
rafale.bmp > 1,089,156
vcfiu.hlp > 731,261
world95.txt > 638,687
Total = 13,278,748 bytes
BALZ [ex]
A10.jpg > 843,382
AcroRd32.exe > 1,449,276
english.dic > 1,093,638
FlashMX.pdf > 3,738,823
FP.LOG > 855,849
MSO97.DLL > 1,885,008
ohs.doc > 836,783
rafale.bmp > 1,071,154
vcfiu.hlp > 698,529
world95.txt > 604,981
Total = 13,077,423 bytes
Compression of ENWIK8 is far too slow to keep retesting.
EDIT: Here are the results for the fastest [e] setting...
ENWIK8 > 30,279,021 bytes
Elapsed Time: 00:42:10.449 (2530.449 Seconds)
We had this problem before.
http://encode.ru/forums/index.php?ac...pic=649&page=0
In reply to LovePimple: Yep! It's a compiler-related problem... Anyway, BALZ is for modern PCs. The P3 is for museums. For example, some time ago I went to a PC center to get RAM for my sampler, which takes the same RAM type as old laptops. The sellers said that this type of RAM is from the P3 era and belongs in a museum. I finally found one chip and bought it at a very high price, because it is a museum-like, very rare RAM chip...
I just don't know what's wrong... And since it works on ALL other machines, I still think it's OK...
As always, you may play with the Visual Studio compile:
balz104cl.zip
Some testing results:
textures.tar (Textures from the Doom 3 game, 604,218,368 bytes)
PKZIP 2.50, -exx: 233,240,852 bytes
TOR 0.4, -5: 216,888,608 bytes
TOR 0.4, -11: 210,794,900 bytes
CABARC 1.00, -m LZX:21: 193,234,553 bytes
LZPM 0.15, 1: 187,609,193 bytes
LZPM 0.15, 9: 185,038,816 bytes
BALZ 1.04, e: 184,031,123 bytes
BALZ 1.04, ex: 183,003,496 bytes
In reply to encode: I don't agree. Just because something (or someone) is old, we should not dismiss them as "museum" pieces.
In reply to encode: Thank you!
BALZ 1.04 is now added to SFC and MFC tests.
http://www.maximumcompression.com/
However, the new version has lower compression even with a 4 MB dictionary. Note that BALZ v1.03 has MINMATCH=3, while BALZ v1.04 has MINMATCH=4. The newer version seems to lose too many short matches. Maybe BALZ v1.05 will have a 1 MB dictionary, MINMATCH=3, and improved LZ-output coding.
in order to make compression fast and good, you need to use separate hash tables for short strings. say, tornado uses 3 separate tables: for 2-byte, 3-byte and 4+-byte strings. the first two tables are rather large and addressed directly, without chains. lzma 4.43 used the same scheme, and current versions use a separate table for 4-byte strings and the last table only for 5+-byte strings. the same is true for rar. note that the size of a table should be much larger than the max. distance for that type of match. say, lzma uses one million entries for searching 4-byte strings while distances are probably limited to something like 50-200kb
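A minimal sketch of the three-table layout described above, assuming one unchained slot per 2-byte string, one unchained hashed slot per 3-byte string, and a chained hash for strings of 4+ bytes. The type names, table sizes and hash functions are illustrative assumptions, not the values used by Tornado, LZMA or RAR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative constants (assumptions, see the note above).
constexpr std::uint32_t NONE = 0xFFFFFFFFu;
constexpr unsigned H3_BITS = 18;
constexpr unsigned H4_BITS = 16;

struct Match { std::size_t len; std::uint32_t pos; };

struct ShortStringTables {
    const std::vector<std::uint8_t>& buf;
    std::vector<std::uint32_t> tab2;   // 2-byte strings: addressed directly, no chain
    std::vector<std::uint32_t> tab3;   // 3-byte strings: hashed, no chain
    std::vector<std::uint32_t> head4;  // 4+-byte strings: hashed, chained via prev4
    std::vector<std::uint32_t> prev4;

    explicit ShortStringTables(const std::vector<std::uint8_t>& b)
        : buf(b), tab2(1u << 16, NONE), tab3(1u << H3_BITS, NONE),
          head4(1u << H4_BITS, NONE), prev4(b.size(), NONE) {}

    std::uint32_t key2(std::size_t i) const { return buf[i] | (buf[i + 1] << 8); }
    std::uint32_t hash3(std::size_t i) const {
        std::uint32_t h = (std::uint32_t(buf[i]) << 10) ^
                          (std::uint32_t(buf[i + 1]) << 5) ^ buf[i + 2];
        return h & ((1u << H3_BITS) - 1);
    }
    std::uint32_t hash4(std::size_t i) const {
        std::uint32_t v = std::uint32_t(buf[i]) | (std::uint32_t(buf[i + 1]) << 8) |
                          (std::uint32_t(buf[i + 2]) << 16) | (std::uint32_t(buf[i + 3]) << 24);
        return (v * 2654435761u) >> (32 - H4_BITS);
    }

    // Common prefix length of the strings starting at a and b (a < b).
    std::size_t match_len(std::size_t a, std::size_t b) const {
        std::size_t len = 0;
        while (b + len < buf.size() && buf[a + len] == buf[b + len]) ++len;
        return len;
    }

    // Register position i in all three tables (needs 4 readable bytes at i).
    void insert(std::size_t i) {
        assert(i + 4 <= buf.size());
        const std::uint32_t p = static_cast<std::uint32_t>(i);
        tab2[key2(i)] = p;
        tab3[hash3(i)] = p;
        const std::uint32_t h = hash4(i);
        prev4[i] = head4[h];
        head4[h] = p;
    }

    // Best match for position i among the positions inserted so far.
    Match find(std::size_t i, std::size_t max_chain = 16) const {
        assert(i + 4 <= buf.size());
        Match best{0, NONE};
        // Long strings: walk the hash chain, newest first, verifying each candidate.
        std::uint32_t p = head4[hash4(i)];
        for (std::size_t n = 0; p != NONE && n < max_chain; ++n, p = prev4[p]) {
            const std::size_t len = match_len(p, i);
            if (len >= 4 && len > best.len) best = {len, p};
        }
        if (best.len) return best;
        // Short strings: a single probe per table, no chain to walk.
        for (std::uint32_t q : {tab3[hash3(i)], tab2[key2(i)]}) {
            if (q == NONE) continue;
            const std::size_t len = match_len(q, i);
            if (len >= 2 && len > best.len) best = {len, q};
        }
        return best;
    }
};
```

The short-string tables cost one memory access each, so they recover 2- and 3-byte matches that a 4-byte hash cannot see without lengthening the chain search.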
Yep, I will try to implement such multi-level hashing in BALZ.
I have just carefully tested this with BALZ. Well, it works! However, I won't hurry to add it.
<div class="jscript"><pre>
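// Short strings: one direct probe in the 3-byte table,
// which is stored after the HSIZE slots of the 4-byte table.
int pos=head[HSIZE+gethash3(i)];
if (pos) {
  // search for short string
}
// Longer strings: walk the hash chain for the 4-byte hash.
pos=head[gethash4(i)];
while (pos) {
  // do a hash chained search
  pos=prev[pos];
}
// ...
// Update both tables with the current position.
head[HSIZE+gethash3(i)]=i;
int h=gethash4(i);
prev[i]=head[h];
head[h]=i;
// ...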
</pre></div>
Even with a large HSIZE, the match finder does not find all short strings. At the same time, this scheme may search slightly deeper, since we limit the hash chain length to 8192 and a 4-byte hash is better than a 3-byte one. Will do more tests...
Another cool idea, which works, is to accept MINMATCH-length matches from Rep() (the most recent offset) only. In that case a MINMATCH-length match can be encoded WITHOUT an offset, and MINMATCH can freely be as low as 2.
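A minimal sketch of that rule, assuming matches shorter than a regular minimum are accepted only at the most recent offset, so the token carries a length but no offset. The names `Kind`, `Token`, `choose_token` and both thresholds are hypothetical, and the tie-breaking between a rep match and a regular match is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t MIN_MATCH = 4;      // shortest match coded with an explicit offset (assumed)
constexpr std::size_t REP_MIN_MATCH = 2;  // shortest match allowed at the recent offset (assumed)

enum class Kind { Literal, RepMatch, Match };
struct Token { Kind kind; std::size_t len; std::size_t offset; };

// Number of bytes at position i that repeat the bytes `offset` positions back.
std::size_t match_len_at(const std::vector<std::uint8_t>& buf, std::size_t i, std::size_t offset) {
    if (offset == 0 || offset > i) return 0;
    std::size_t len = 0;
    while (i + len < buf.size() && buf[i + len - offset] == buf[i + len]) ++len;
    return len;
}

// found_len / found_offset stand for whatever the regular match finder returned.
Token choose_token(const std::vector<std::uint8_t>& buf, std::size_t i,
                   std::size_t rep_offset, std::size_t found_len, std::size_t found_offset) {
    assert(i < buf.size());
    const std::size_t rep_len = match_len_at(buf, i, rep_offset);
    // A regular match must reach MIN_MATCH and beat the rep match.
    if (found_len >= MIN_MATCH && found_len > rep_len)
        return {Kind::Match, found_len, found_offset};
    // Short matches are allowed only at the recent offset; the decoder
    // already knows that offset, so it is not transmitted.
    if (rep_len >= REP_MIN_MATCH)
        return {Kind::RepMatch, rep_len, rep_offset};
    return {Kind::Literal, 1, 0};
}
```

The `offset` field of a `RepMatch` token is kept here only for inspection; an encoder following this idea would write just the match flag and the length.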
In reply to encode: HSIZE is the size of the 4-byte hash here
as i said before, with 3-byte strings whose offsets are limited to 4096, lzma used 64k entries
btw, Kadach wrote that it's better to check repdists first, before performing a search in the hashtables. look at lzma for implementation details
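A minimal sketch of the "check repdists first" ordering, assuming the hash-table search can be skipped entirely when a recent distance already gives a long enough match. This is not LZMA's code: the `GOOD_ENOUGH` cutoff, the type and function names, and the callback standing in for the hash search are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct Found { std::size_t len; std::size_t dist; };

// Hypothetical cutoff: a rep match this long makes the hash search unnecessary.
constexpr std::size_t GOOD_ENOUGH = 32;

// Match length at position i for a given backward distance.
std::size_t len_at_distance(const std::vector<std::uint8_t>& buf, std::size_t i, std::size_t dist) {
    if (dist == 0 || dist > i) return 0;
    std::size_t len = 0;
    while (i + len < buf.size() && buf[i + len - dist] == buf[i + len]) ++len;
    return len;
}

// Tries the recent distances first; only falls back to the (more expensive)
// hash-table search when none of them gives a long enough match.
Found find_rep_first(const std::vector<std::uint8_t>& buf, std::size_t i,
                     const std::vector<std::size_t>& rep_dists,
                     const std::function<Found(std::size_t)>& hash_search) {
    assert(i < buf.size());
    Found best{0, 0};
    for (std::size_t d : rep_dists) {
        const std::size_t len = len_at_distance(buf, i, d);
        if (len > best.len) best = {len, d};
    }
    if (best.len >= GOOD_ENOUGH) return best;  // hash tables are never touched
    const Found h = hash_search(i);
    return h.len > best.len ? h : best;
}
```

Each rep check is a direct comparison at a known distance, so it costs far less than a chain walk, and a rep match is also cheaper to encode than a match with a new offset.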
In reply to Bulat Ziganshin: ...and of the 3-byte hash as well...
In reply to Bulat Ziganshin: Will look again at Kadach.
New BALZ v1.05 comes out! What's new:
+ New match finder: HC5 - i.e. hash chains with 3-5-byte hashing!
+ Slightly improved parsing
All in all, the new version is *MUCH* faster and has higher compression; in some cases the compression improvement is really huge!
It will be released within one or two weeks...
Still, S&S parsing rules! I have compared various parsing schemes with S&S many times, and S&S is the best: even with a smaller dictionary it achieves higher compression than, say, lazy matching with 2-byte lookahead. Maybe I should combine the new match finder (HC5) with such parsing...
Tested the idea of combining the HC5 match finder with S&S parsing...
Well, even this match finder is extremely slow with S&S parsing. That means that with this kind of parsing we should use a binary tree or something similar. Maybe we could build a tree, as some LZW implementations do, and traverse that structure instead of searching the buffer directly.
Anyway, for now, let's assume that BALZ is a fast LZ77 encoder. The new BALZ v1.05 with the "e" option is indeed fast enough.
As long as it's still '*MUCH* faster' than previous versions!
I can't get the performance of S&S parsing out of my head. For example, BALZ with a 1 MB window and S&S parsing may beat BALZ with a 4 MB window and lazy matching with 2-byte lookahead. That said, with S&S parsing the encoder is dead slow (at least 18x slower than my special lazy matching). Well, at least I can see how much "air" the current scheme leaves. Note that on some files a large dictionary does make sense: even an S&S-based encoder with a smaller dictionary may not compete with its larger-dictionary brother that uses a much simpler parsing strategy. Anyway, S&S is still far from optimal; as I said, in some cases, like 'canterbury.tar', lazy matching provides significantly higher compression than S&S. I tested LZMA with optimal and simple parsing schemes, and I can see how much 'real' optimal parsing may help. With the same settings (dictionary size, match finder and, most importantly, a simple parsing strategy), LZMA and BALZ are close together, as expected, since they both use LZ77. In conclusion, I will release what I currently have, and then we will see... something... Anyway, BALZ v1.05 is something special, believe me...