
Thread: BALZ v1.06 is here!

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts

    Exclamation BALZ v1.06 is here!

    OK, on Victory Day, let me introduce the best compressor I have ever made. With this new BALZ I combined all my knowledge to make a fast and efficient compressor. Just check it out!

    http://encode.ru/balz/index.htm


  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thumbs up

    Thanks Ilia!

    Mirror: Download

  3. #3
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test...

    Test Files: MC SFC

    BALZ [e]

    A10.jpg > 836,451
    AcroRd32.exe > 1,460,086
    english.dic > 861,622
    FlashMX.pdf > 3,762,896
    FP.LOG > 672,319
    MSO97.DLL > 1,885,369
    ohs.doc > 830,267
    rafale.bmp > 1,048,825
    vcfiu.hlp > 689,667
    world95.txt > 608,341

    Total = 12,655,843 bytes


    BALZ [ex]

    A10.jpg > 836,451
    AcroRd32.exe > 1,452,757
    english.dic > 797,618
    FlashMX.pdf > 3,758,128
    FP.LOG > 581,503
    MSO97.DLL > 1,879,986
    ohs.doc > 824,992
    rafale.bmp > 1,019,511
    vcfiu.hlp > 668,959
    world95.txt > 588,466

    Total = 12,408,371 bytes


    Test File: ENWIK8

    Compression

    BALZ [e]

    Compressed Size: 28,674,640 bytes

    Elapsed Time: 430.035 Seconds

    0000 Days 00 Hours 07 Minutes 10.035 Seconds



    BALZ [ex]

    Compressed Size: 28,234,913 bytes

    Elapsed Time: 681.141 Seconds

    0000 Days 00 Hours 11 Minutes 21.141 Seconds



    Decompression

    BALZ [e]

    Elapsed Time: 21.425 Seconds


    BALZ [ex]

    Elapsed Time: 21.183 Seconds

  4. #4
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts

    Smile

    Thanks LovePimple!

    Results on ENWIK9:

    BALZ v1.06, e: 249,378,397 bytes
    BALZ v1.06, ex: 245,288,229 bytes


  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    LTCB results http://cs.fit.edu/~mmahoney/compression/text.html#2453

    I've been busy with other work, so I am falling behind on data compression stuff. BTW, I like the new forum.

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts

    Talking

    Thank you!

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts

    Cool

    That's not all, folks! I made a new improvement - I enlarged the ROLZ model from 64 MB to 128 MB! The new BALZ v1.07 has even higher compression - especially on large files and text - although it's slower. I also increased the gap between the "e" and "ex" modes - "e" is now notably faster. (Judging by Matt's results, the gap between these two modes was too small - the compression times were too close.)
    OK, some testing results:

    fp.log: 573,485 bytes
    english.dic: 784,590 bytes
    world95.txt: 576,362 bytes
    rafale.bmp: 1,009,371 bytes

    ENWIK8: 27,423,485 bytes
    ENWIK9: 237,606,519 bytes


  8. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts

    Question

    Also, I have an idea to make the "e" mode really *FAST* - by disabling all parsing optimizations. What do you think?

  9. #9
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts

    Lightbulb

    For example, BALZ v1.07 with various parsing schemes:

    0-Greedy Parsing
    1-Advanced Lazy Matching with 1-byte lookahead
    2-Advanced Lazy Matching with 2-byte lookahead
    3-Advanced Flexible Parsing

    world95.txt:
    0: 639,369 bytes
    1: 605,878 bytes
    2: 595,229 bytes
    3: 576,362 bytes

    fp.log:
    0: 702,537 bytes
    1: 675,286 bytes
    2: 671,477 bytes
    3: 573,485 bytes

    acrord32.exe:
    0: 1,481,176 bytes
    1: 1,465,609 bytes
    2: 1,460,975 bytes
    3: 1,453,415 bytes

    reaktor.exe:
    0: 2,099,718 bytes
    1: 2,070,973 bytes
    2: 2,069,256 bytes
    3: 2,030,253 bytes

    As you can see, parsing matters most with text files. Maybe greedy parsing leaves too much air in the files. So, for the default mode it seems reasonable to keep either Greedy or Advanced Lazy Matching with 1-byte lookahead. Your opinion?
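
    (An illustration of lazy matching with 1-byte lookahead - a self-contained toy, not BALZ's actual code; the match finder is a naive stand-in and MIN_MATCH is an assumed constant:)

    #include <cstdio>
    #include <string>

    // Naive stand-in for a real match finder: scan all earlier positions
    // for the longest match starting at 'pos' (overlapping matches allowed).
    static size_t longest_match(const std::string& s, size_t pos) {
        size_t best = 0;
        for (size_t i = 0; i < pos; i++) {
            size_t len = 0;
            while (pos + len < s.size() && s[i + len] == s[pos + len]) len++;
            if (len > best) best = len;
        }
        return best;
    }

    // Lazy matching with 1-byte lookahead: before accepting a match at
    // 'pos', peek at pos+1; if the match there is longer, emit a literal
    // instead and let the better match win on the next step.
    int main() {
        const std::string s = "abcabcabcdabcd";
        const size_t MIN_MATCH = 3;
        size_t pos = 0;
        while (pos < s.size()) {
            size_t len  = longest_match(s, pos);
            size_t next = (pos + 1 < s.size()) ? longest_match(s, pos + 1) : 0;
            if (len >= MIN_MATCH && next <= len) {
                std::printf("match  pos=%zu len=%zu\n", pos, len);
                pos += len;
            } else {
                std::printf("literal '%c'\n", s[pos]);
                pos++;
            }
        }
        return 0;
    }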

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Some results with timings:

    3200.txt (16,013,962 bytes):
    0: 5,274,712 bytes, 7 sec.
    1: 5,042,148 bytes, 13 sec.
    2: 4,988,928 bytes, 18 sec.
    3: 4,889,024 bytes, 27 sec.


  11. #11
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 150 Times in 18 Posts
    Quote Originally Posted by encode
    As you can see, parsing matters most with text files.


    I don't know about the speed differences, but 1 looks good for a fast default mode.

  12. #12
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Christian View Post


    I don't know about the speed differences, but 1 looks good for a fast default mode.
    Look at my post above:
    Quote Originally Posted by encode
    Some results with timings:

    3200.txt (16,013,962 bytes):
    0: 5,274,712 bytes, 7 sec.
    1: 5,042,148 bytes, 13 sec.
    2: 4,988,928 bytes, 18 sec.
    3: 4,889,024 bytes, 27 sec.

  13. #13
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 150 Times in 18 Posts
    Option 1 still looks good, although the speed hit over 0 is big.
    Last edited by Christian; 10th May 2008 at 17:25.

  14. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    E2160 @ 9x360=3.24, DDR2-800 5-5-5-18 @ 900
    http://shelwien.googlepages.com/balz.htm
    1.06 looks much better, but I'd still prefer rar.

    Also, I was thinking about it while testing... and now I've
    lost the last reason to use LZ at all.
    Btw, that "last reason" was program distribution and the like,
    on which I based my test metric (compression time + time of
    downloading at 512 kbps + decompression time x 10).
    But now that I think about it, I wouldn't feel much inconvenience
    if rar's decoding speed became 10x slower - guess that's
    because rar's algorithm was almost the same back when I was
    using it on 386s.

    Then, anyway, I don't really care that much about program
    installation time, and I mostly use archivers... well, for
    archiving. And that's where the low speed and data dependence
    of LZ optimal parsing are completely bad... I mean, why should I
    asymmetrically compress my DVD images when I'd never need
    to unpack most of them... while there are symmetrical methods
    with better and faster compression?
    Last edited by Shelwien; 10th May 2008 at 18:17.

  15. #15
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts

    Talking

    I still prefer LZ-based algorithms. Something like the new BALZ may be considered a PPM approximation, and at some points it comes close to PPM* or PPMDet, since it uses many context-based tricks - all of the LZ output is encoded via fairly complex order-1 context models, offset selection is based on context, etc.
    Anyway, the choice of compressor is up to each user...
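
    (An illustration of the order-1 idea - a sketch only, not BALZ's actual model: one adaptive probability per (previous byte, bit-tree node) pair, with the shift-based update used by FPAQ0P-style counters.)

    #include <cstdint>

    // Order-1 bitwise model sketch: one adaptive probability per
    // (previous byte, bit-tree node). A byte is coded MSB-first; the
    // node index starts at 1 and becomes node*2+bit after each bit,
    // so nodes 1..255 identify every partial bit string.
    struct Order1Model {
        std::uint16_t p[256][256]; // P(bit=1) scaled to [0..4096)

        Order1Model() {
            for (int c = 0; c < 256; c++)
                for (int n = 0; n < 256; n++)
                    p[c][n] = 2048; // start at 1/2
        }
        int predict(int ctx, int node) const { return p[ctx][node]; }
        void update(int ctx, int node, int bit) { // FPAQ0P-style shift update
            if (bit) p[ctx][node] += (4096 - p[ctx][node]) >> 5;
            else     p[ctx][node] -=  p[ctx][node]         >> 5;
        }
    };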

    Another results with timings:

    ENWIK8:
    0: 29,693,191 bytes, 55 sec.
    1: 28,272,252 bytes, 86 sec.
    2: 27,934,454 bytes, 108 sec.
    3: 27,423,485 bytes, 180 sec.

    Still the "1" should be the most efficient in terms of compression ratio vs. compression time. Like you see the compression ratio with this mode is notable better than with "0" (Greedy, Unoptimized parsing);


  16. #16
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Forget about users - even 7-zip still doesn't get that much attention.
    But what use do _you_ have for LZ compressors?
    Guess you could try replacing zlib in game data compression and the like...
    But then again, it's totally dumb to compress textures and music with LZ.

  17. #17
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts

    Talking

    BTW, my LZ modifications are quite efficient on textures - even QUAD performs better than WinRAR's PPMD:
    http://quad.sourceforge.net/
    Of course, the new BALZ performs even better...

  18. #18
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Of course ppmd isn't any better in that area.
    But you could compare, e.g., with bmf.
    And what I wanted to say is that 2D LZ algorithms are absolutely impractical.
    Just look at motion prediction...

  19. #19
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts

    Wink

    Of course specialized models are supreme...
    But if we're talking in "general purpose" terms...

  20. #20
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    1. The point is that specialized LZ-like models are ineffective, though not impossible.
    2. "Purpose" is what I'm getting at. What do you want to have ideally?
    ppmd-level compression with memcpy-like decoding speed?

    Btw, how do you know which LZ compressor is better?
    As for me, I actually don't know.
    Well, it's obvious that one which is faster at both encoding and decoding
    and compresses better is superior, but that's rare.

  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts

    Exclamation

    Ideally, compression should be fast enough, and we should optionally be able to choose a compression mode - favoring either speed or ratio. Memory usage should not be too high - lower is better; less than 200 MB is preferred. Decompression should be as fast as possible - in practice, faster than or equal to a plain order-0 arithmetic coder a la FPAQ0.

    BTW, I have an idea to remove the LZ layer from BALZ and add an SR (Symbol Ranking) or PPMDet variation - to see what we get. The results will be posted tomorrow - I'm just curious to see how LZ fares against SR-like stuff. At least with SR we don't need any parsing, so compression might be faster, but decompression slower.
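
    (For reference, the core of an SR coder is just a small rank list per context. A minimal sketch of one context's 3-symbol rank list with an escape for unranked symbols - a classic SR scheme, not the code actually tested here:)

    #include <cstdint>

    // Minimal symbol-ranking core: one context's rank list holds the
    // last 3 distinct symbols seen, most recent first. The coder emits
    // the rank (0..2) or an escape plus a literal, each under its own
    // adaptive model - no parsing needed, unlike LZ.
    struct SRContext {
        std::uint8_t r[3] = {0, 0, 0};

        // Rank of c: 0..2, or 3 meaning "escape".
        int rank(std::uint8_t c) const {
            if (c == r[0]) return 0;
            if (c == r[1]) return 1;
            if (c == r[2]) return 2;
            return 3;
        }
        // Promote c to rank 0, shifting the others down.
        void update(std::uint8_t c) {
            if (c == r[0]) return;
            if (c != r[1]) r[2] = r[1]; // old rank 1 drops to rank 2
            r[1] = r[0];
            r[0] = c;
        }
    };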

  22. #22
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Cool. Let's hope that'd become the first step away from LZ.

    Decompression would be faster if you'd add support for rank0 runs.
    And SR in general is more complex than CM, because you're supposed to
    extrapolate the symbol ranking (which is ideally done by the same model as in CM),
    and then you need another model to encode the result.
    Also, as I said, CM can benefit from optimal parsing (in fact, it provides a much wider choice: full updates, partial updates, update exclusion, etc.).

  23. #23
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Btw, just remembered something.
    In your LZ, are you masking out the symbol following the match?
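
    (For context, an illustration of what that masking buys, as a hypothetical symbol-wise sketch - with a bitwise model, as discussed later in the thread, exclusion is less direct. If matches are always maximal, the literal right after a match cannot equal the byte that would have extended it, so that symbol can be excluded from the literal model:)

    #include <cstdint>

    // Hypothetical symbol-wise literal coder with one symbol excluded.
    // 'excluded' is the byte that would have extended the previous match;
    // if matches are always maximal it cannot occur here, so it gets no
    // code space and every remaining symbol becomes slightly cheaper.
    struct LiteralCoder {
        std::uint32_t freq[256];
        LiteralCoder() { for (int i = 0; i < 256; i++) freq[i] = 1; }

        // Interval of symbol c (caller guarantees c != excluded); a real
        // implementation would pass cumLo/cumHi/total to a range coder.
        void interval(int c, int excluded, std::uint32_t& cumLo,
                      std::uint32_t& cumHi, std::uint32_t& total) const {
            cumLo = total = 0;
            for (int i = 0; i < 256; i++) {
                if (i == excluded) continue; // masked symbol gets no range
                if (i < c) cumLo += freq[i];
                total += freq[i];
            }
            cumHi = cumLo + freq[c];
        }
    };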

  24. #24
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Shelwien View Post
    Btw, just remembered something.
    In your LZ, are you masking out the symbol following the match?
    In LZPX/LZPXJ - Yes, I do! In other compressors - No, I don't!

    Well, my idea with low-order SR completely failed... The results are far behind its LZ brother. At least I tested the idea!

  25. #25
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Why not, then? Is it possible to encode a match shorter than the real match at the given offset?

  26. #26
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Shelwien View Post
    Why not, then? Is it possible to encode a match shorter than the real match at the given offset?
    Yes. The optimizer tries to build the best match/literal sequence. At each step we have a number of match lengths (up to the length found) plus the best index for each match length - if we find a better combination, we may even keep a shorter match at the current index.
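
    (A sketch of that kind of optimizer as dynamic programming over positions - the price functions and constants are toy stand-ins for the real context-dependent costs:)

    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <vector>

    // Toy price functions (in bits).
    static double cost_literal() { return 6.0; }
    static double cost_match(std::size_t len) {
        return 16.0 + std::log2((double)len);
    }

    // best[i] = cheapest cost (in bits) to encode the first i bytes.
    // Trying every length up to the longest match found is exactly what
    // lets a shorter-than-maximal match win at some position.
    std::vector<double> optimal_parse(std::size_t n,
        const std::vector<std::size_t>& max_len) // longest match at each pos
    {
        const std::size_t MIN_MATCH = 3;
        const double INF = std::numeric_limits<double>::infinity();
        std::vector<double> best(n + 1, INF);
        best[0] = 0.0;
        for (std::size_t i = 0; i < n; i++) {
            if (best[i] == INF) continue;
            if (best[i] + cost_literal() < best[i + 1])
                best[i + 1] = best[i] + cost_literal();
            for (std::size_t len = MIN_MATCH;
                 len <= max_len[i] && i + len <= n; len++)
                if (best[i] + cost_match(len) < best[i + len])
                    best[i + len] = best[i] + cost_match(len);
        }
        return best; // backtracking over the choices gives the sequence
    }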

  27. #27
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Still, I doubt there's any sense in encoding an incomplete match and then an unmasked literal, right?
    Also, there are other kinds of masking... like ofs2!=ofs1+len1 in the sequence <ofs1;len1><ofs2;len2>.

    Edit: Also, I wonder how frequent these incomplete matches are with your optimizer...
    You know, it's possible to do partial masking too... like significantly decreasing the symbol's probability while still not making it zero.
    Last edited by Shelwien; 11th May 2008 at 22:02.

  28. #28
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Shelwien View Post
    Still, I doubt there's any sense in encoding an incomplete match and then an unmasked literal, right?
    Also, there are other kinds of masking... like ofs2!=ofs1+len1 in the sequence
    <ofs1;len1><ofs2;len2>.
    How can we mask a symbol with a bitwise model? Can you provide a more detailed explanation? I don't quite understand your last words.

  29. #29
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    1. Do you prefer your bitwise model over smaller redundancy?
    Though of course it's possible... at least you can cut off a bit
    from the most probable symbol or something.
    Or just properly recalculate the probabilities.

    2. Which last words? I mean that there are some obvious rules...
    Like that two matches for the same data are never better than a single match.

    Edit: But actually, I guess you just have to add the suffix of the previous match to the SSE models for these bits.
    Also, I think you should use unary encoding at least for some symbols... which calls for masking too.
    It would make processing faster (both encoding and decoding) and could improve compression, especially with SSE.
    Last edited by Shelwien; 11th May 2008 at 22:22.
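
    (An illustration of the unary idea, with assumed constants - code_bit here only updates the model; a real coder would also emit the bit arithmetically:)

    #include <cstdint>

    // Unary coding sketch: a value v in [0..16) is sent as v '1' bits
    // plus a terminating '0', each bit position under its own adaptive
    // probability - small frequent values get short, well-predicted
    // codes, and SSE could be applied per bit.
    struct UnaryCoder {
        static const int MAX_VAL = 16;
        std::uint16_t p[MAX_VAL]; // P(continue) per bit position

        UnaryCoder() { for (int i = 0; i < MAX_VAL; i++) p[i] = 2048; }

        // Placeholder: updates the model only; a real coder would also
        // emit the bit arithmetically using probability p[i].
        void code_bit(int bit, int i) {
            if (bit) p[i] += (4096 - p[i]) >> 5;
            else     p[i] -=  p[i]         >> 5;
        }
        void encode(int v) {
            for (int i = 0; i < v; i++) code_bit(1, i);
            if (v < MAX_VAL - 1) code_bit(0, v); // max value needs no '0'
        }
    };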

  30. #30
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Shelwien View Post
    2. Which last words? I mean that there are some obvious rules...
    Like that two matches for the same data are never better than a single match.
    Quote Originally Posted by Shelwien View Post
    Also, there are other kinds of masking... like ofs2!=ofs1+len1 in the sequence <ofs1;len1><ofs2;len2>.
    Argh, I understand now. Well, squeezing the last bit out of the output is not what LZ coders are about. Adding tricks like exclusions may make the decoder too expensive - for just a tiny compression benefit. Don't forget about asymmetry!
    BALZ's bit model is very special - I tried many models, including PAQ1-styled, FPAQ0-styled, and FPAQ0P-styled ones, but stayed with a special set of FPAQ0P models with a mixer - a la FPAQ0M - since it turned out the best overall...
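
    (For reference, a crude sketch of that kind of setup: two FPAQ0P-style shift counters over the same data, blended by a simple adaptive weight. The real FPAQ0M-style mixer is more elaborate; the rates and the weight step here are toy values.)

    #include <cstdint>

    // FPAQ0P-style counter: probability in [0..4096), shift-based update.
    struct Counter {
        std::uint16_t p = 2048;
        void update(int bit, int rate) {
            if (bit) p += (4096 - p) >> rate;
            else     p -=  p         >> rate;
        }
    };

    // Two counters over the same bit stream, one fast-adapting and one
    // slow-adapting, blended by a crude adaptive weight.
    struct MixedModel {
        Counter fast, slow;
        int w = 2048; // weight of 'fast' in [0..4096]

        int predict() const {
            return (fast.p * w + slow.p * (4096 - w)) >> 12;
        }
        void update(int bit) {
            // Nudge the weight toward whichever counter was less wrong.
            int ef = bit ? 4096 - fast.p : fast.p;
            int es = bit ? 4096 - slow.p : slow.p;
            if (ef < es && w <= 4096 - 32) w += 32;
            if (es < ef && w >= 32)        w -= 32;
            fast.update(bit, 3);
            slow.update(bit, 6);
        }
    };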


