Page 1 of 6 123 ... LastLast
Results 1 to 30 of 179

Thread: RZM - a dull ROLZ compression engine

  1. #1
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 150 Times in 18 Posts
    Hi everyone!

    I'm pleased to release another compression utility - a ROLZ based compression engine. RZM is very alpha and is more or less a proof of concept. So let the bug reports come in. Compression is slow, I'm sorry - but decompression is pretty fast. Therefore, it's highly asymmetric. RZM is the result of my experiments with near-optimal parsing and will most likely not be developed any further.
    Its scheme reduces offsets with the previous byte - 64k per context. The symbol coder is only o1. The sliding window is 64MB - better safe than sorry. Generally speaking, ROLZ has one major drawback: long-distance-matches are nearly impossible because offsets get discared too fast (even with 64k offsets per context). Btw, RZM has got a simple exe-filter. So, it's more or less pure compression 'power'. Before this one gets lost on my hard disk, I thought I share it with you.

    Download RZM 0.02

    @Schnaader: I'm sorry that there hasn't been any mail. But I'm very, very busy right now. I'll contact you regarding precomp if some free time comes up again.

  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Chris!

    Mirror: Download

  3. #3
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test...

    A10.jpg > 836,229
    AcroRd32.exe > 1,276,453
    english.dic > 676,478
    FlashMX.pdf > 3,694,219
    FP.LOG > 537,828
    MSO97.DLL > 1,714,316
    ohs.doc > 794,281
    rafale.bmp > 949,920
    vcfiu.hlp > 600,403
    world95.txt > 544,074

    Total = 11,624,201 bytes

    Nice!

  4. #4
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Another quick test...

    Test machine: Intel PIII (Coppermine) @750 MHz, 512 MB RAM, Windows 2000 Pro SP4

    Test File: ENWIK8 (100,000,000 bytes)

    Timed with AcuTimer v1.2


    ENWIK8 > 24,930,240 bytes
    Elapsed Time: 00:14:50.858 (890.858 Seconds)

  5. #5
    Member
    Join Date
    Jan 2008
    Posts
    33
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello Christian

    RZM is really amazing!!

  6. #6
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Very Nice! RZM is fantastic!

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,946
    Thanks
    343
    Thanked 326 Times in 129 Posts
    Damn, Chris, you again done something impossible - crazy work!

    Can you describe how do you encode LZ output in detail.

    Do you use flags or single alphabet for match lengths/literals?
    How do you encode offsets?


  8. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,946
    Thanks
    343
    Thanked 326 Times in 129 Posts
    And the main question is - how do you implemented an optimal parsing with ROLZ - as far as I understand you update offsets per each byte, not per phrase.

  9. #9
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 150 Times in 18 Posts
    @all:
    Thanks! Heres an updated version. I improved the parsing a tiny, tiny bit. It should be compatible with the old version - not that it matters.

    RZM 0.03

    Quote Originally Posted by encode
    Do you use flags or single alphabet for match lengths/literals?
    I use a flag for literal/match decision. Combining lengths&literals in one alphabet is a poor choice, IMO. When using a flag, you can use different contexts for the flag, length and literals.

    Quote Originally Posted by encode
    Can you describe how do you encode LZ output in detail.
    The flag, lengths and distances are using the previous byte as context. The symbol coder uses the previous byte and a small state as context.
    Im quite sure, that the contexts can be improved further to improve compression (use some length-info in the offset context, ...), but I somehow lost interest.

    Quote Originally Posted by encode
    How do you encode offsets?
    I dont understand this question. I simply code the distance with some bucket-coder.
    <div class=""jscript""><pre>distance(prev,x)=(root[prev]-x-1)&(KEY_MAX-1)</pre>[/QUOTE]

    AFAIR, I already posted something about bucket/bin-coding. In a nutshell, it is like gamma coding with improvements for small distances (extra buckets).

    [QUOTE]<div class=""quoting"">Quoting: encode</div>as far as I understand you update offsets per each byte, not per phrase.</div>

    Yes, you do. But why is this a problem? For a given index i, the offset table offsets[win[i-1]][...] is always the same, no matter how you reach offset i.

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,946
    Thanks
    343
    Thanked 326 Times in 129 Posts
    Thank you!
    Quote Originally Posted by Christian
    The symbol coder uses the previous byte and a small state as context.
    Its interesting. Small state what is? In other words how do you get an additional context?

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,946
    Thanks
    343
    Thanked 326 Times in 129 Posts
    Quote Originally Posted by Christian
    But why is this a problem? For a given index i, the offset table offsets[win[i-1]][...] is always the same, no matter how you reach offset i.
    As you know all my ROLZ implementations add an offset per phrase. At least in this case decoder is faster.

  12. #12
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 150 Times in 18 Posts
    Quote Originally Posted by Christian
    use some length-info in the offset context
    I just remembered why I didnt do this. It improves compression, Im quite sure, but then, the offset cost gets variable during cost optimization - which slows down the parsing a bit. I think it wasnt worth it.

    Quote Originally Posted by encode
    As you know all my ROLZ implementations add an offset per phrase.
    I see. This complicates parsing a lot.

    Quote Originally Posted by encode
    Its interesting. Small state what is?
    This is some magical number, which alters based on the coding decision and the context i. Thus, it can be easily propagated during parsing.
    <div class=""jscript""><pre>state=get_state(st,i,decisi on)[/code]

  13. #13
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,946
    Thanks
    343
    Thanked 326 Times in 129 Posts
    Quote Originally Posted by Christian
    This is some magical number, which alters based on the coding decision and the context i. Thus, it can be easily propagated during parsing.
    Am I right?
    i - whole previous byte (order-1 context)
    decision - 0/1 - i.e. literal or match
    st - ?? (previous state?)

    And how do you mix that?
    <div class=""jscript""><pre>
    (st<<9)+(decision<<+i
    [/code]

    Thanks in advance!

  14. #14
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    RZM 0.2
    ENWIK9 -> 216399070 Bytes

  15. #15
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    RZM vs 7ZIP
    7zip 4.42 [LZMA]
    ENWIK8 -> 24,996,113
    ENWIK9 -> 213,490,979
    RZM 0.02
    ENWIK8-> 24,930,240
    ENWIK9->216399070

  16. #16
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 150 Times in 18 Posts
    Last one for today. Decompression speed is improved a bit.
    RZM 0.04

    Quote Originally Posted by encode
    Am I right?
    Yes, you are - but i is the current file offset. Sorry, it shouldve looked like this.
    <div class=""jscript""><pre>new_state=get_state(state,i ,decision)[/code]

    Regarding the state update. You can remember in that tiny state, what youve done before. Or you can use (i & 3) as part of a new state. Or you can analyse win(i-n,i-1) which is known to the decoder, too. You see, there are many, many possibilities. It all comes down to how your coders are designed in detail.
    Finally, I almost forgot. Or you can just use stronger coders.

  17. #17
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Chris!

    Mirror: Download

  18. #18
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Looking awesome! This is another good job story after CCM. ROLZ is interesting thing for Ilia, Nania and at last me. But, AFAIK none of us didn't merge a good parsing method with ROLZ (at least me, of course ). So, can you give us more detail?

    - ROLZ match context? Order-1 or order-2?
    - What about the detail of ROLZ parsing?
    - Do you use LZMA like state decision?

    At last, here is the test results with my test file: valley.cmb (19,776,230 bytes)
    Code:
    7-Zip (LZMA-Ultra) -> 7,507,360 bytes 
    PAQ9a (-9) -> 7,528,983 bytes 
    WinRK (ROLZ3-Best) -> 7,552,372 bytes 
    RZM 0.04 -> 8,157,401 bytes 
    WinRAR (Best) -> 8,511,238 bytes 
    LZPM 0.15 (-9) -> 9,367,718 bytes 
    BIT 0.1 -> 9,741,439 bytes 
    ZIP (Max) -> 9,887,176 bytes 
    Quad (-x) -> 10,017,449 bytes
    BIT Archiver homepage: www.osmanturan.com

  19. #19
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Thanks Chris! It looks like you find just some spare minutes a day and each few months there's new compressor using classic algorithm with ground-breaking results

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,946
    Thanks
    343
    Thanked 326 Times in 129 Posts
    This guy is a genius!

  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,946
    Thanks
    343
    Thanked 326 Times in 129 Posts
    Quote Originally Posted by osmanturan
    my test file: valley.cmb (19,776,230 bytes)
    By the way, can you upload this file. Im curious about my new LZ77s performance on this file!

  22. #22
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Ive posted the reason why Im using this file:
    Quote Originally Posted by osmanturan
    I believe most of people in this forum should test at least a game map for their compressor (espacially a BSP map). Why? Because, a well used bsp file consists several data types at the same time. For example, a regular Quake 3 bsp map file consists such things:

    - A huge string for holding map entity decleration
    - A huge string which is broken into fixed sized (64 characters) strings (holds material paths)
    - 32 bits integer for some indexing
    - 3D floating-point vectors
    - A huge bitstream for visibility for each map area (actually called as cluster)
    - A huge 3D grid which holds quantized 3D vectors for local dynamic lighting
    - A huge chunk which holds several 256x256 raw RGB images (lightmaps)
    Ive just uploaded the file at my site. The link: http://www.osmanturan.com/valley.7z

    Im curious about your test results, too!
    BIT Archiver homepage: www.osmanturan.com

  23. #23
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,946
    Thanks
    343
    Thanked 326 Times in 129 Posts
    Quote Originally Posted by osmanturan
    The page cannot be found

  24. #24
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I can't figure out what's happening in the foolish server or internet explorer When I renamed it "test.zip" in FTP, there is no problem. You can download the file: http://www.osmanturan.com/test.zip
    Do not forget to change it's extension! It's actually a 7-zip archive (extension: 7z)
    BIT Archiver homepage: www.osmanturan.com

  25. #25
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,946
    Thanks
    343
    Thanked 326 Times in 129 Posts
    Got it!

    OK, my new LZ77 with basic configuration compresses valley.cmb down to:
    8,904,441 bytes


  26. #26
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Better than LZPM 0.15 (-9)! Cool
    BIT Archiver homepage: www.osmanturan.com

  27. #27
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 150 Times in 18 Posts
    Thanks again! I have not expected such positive feedback. I mean, RZM is slow and ratio is nothing new - Malcolms ROLZ seems to be even stronger. Additioally, RZM is missing serious data filters or a CM-backend.

    Quote Originally Posted by osmanturan
    - ROLZ match context? Order-1 or order-2?
    Order 1.

    Quote Originally Posted by osmanturan
    - Do you use LZMA like state decision?
    No. State update is based on the content of the sliding window at the current position, too. But you shouldnt overestimate the impact of an additional state context - at least for RZM.

    Quote Originally Posted by osmanturan
    - What about the detail of ROLZ parsing?
    Parsing is not much different from LZ77 parsing. I do forward parsing - it makes state propagation and string searching easier. I dont know what you want to know exactly?

  28. #28
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Christian
    I dont know what you want to know exactly?
    Can you share us a psuedo code for making it clear? I think, ilia wonders what is happenning in your parsing, too

    Quote Originally Posted by Christian
    Malcolms ROLZ seems to be even stronger
    My current implementation is:
    - Order-1 ROLZ match context
    - Match length based flexible parsing (same as Quads)
    - 32768 ROLZ offset table (Ive used Deflate distance coding schema with Order-0 model which exactly fits )
    - Segmented match length coding with a small state context. I have notice most of files match length avarage around 8, and some of them 16
    - Literals encoded with an approximation of neural network context mixer which mixes Order 0-1-2 model by updating weight in linear space (Actually not a neural network)
    - 16 MB non-sliding dictionary. Because I want to implement a multithreading task manager which manages of compressing each 16 MB block (currrently not implemented)

    Why Im giving these informations? Because, my experience about CM back-end doesnt really gain as expected for ROLZ. You have to know, my current unreleased implementation is not powerful as yours So, popular question becomes parsing. As we see, parsing is really a matter for ROLZ
    BIT Archiver homepage: www.osmanturan.com

  29. #29
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Christian
    But you shouldnt over estimate the impact of an additional state context - at least for RZM.
    So, this can be true for my implementation, too Because, my match length encoding based on the same thing as state: position & 3 This gives me about %1-2 gain on calgary corpus.
    BIT Archiver homepage: www.osmanturan.com

  30. #30
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by osmanturan
    - Literals encoded with an approximation of neural network context mixer which mixes Order 0-1-2 model by updating weight in linear space (Actually not a neural network)
    I wonder how you do that. If you ripped off Matts NN leaving stretch & squash you are doing nonsense. Could you post a formula for weight updating?

    BTW: Christian, which is your weighting approach? I have tried several different and found gradient descent / NN (gradient descent with a nonlinear transform) to be very robust (and efficient). The NN method requires no divisions, but needs more table lookups.
    Shelwiens method didnt work for me, at least not as good as both mentioned above.
    Newtons method for root search seems to be inappropriate, since the gradient of a weighted sum doesnt have a zero analytically (only at w->infinity), so it diverges.
    At the moment im trying some modification of the simplex algorithm (nelder&mead) to minimize coding cost.
    To increase robustness one might change the optimization metric, e.g. minimizing an averaged coding error.
    Somehow Im out of luck here...

    ps: gr??e von deutschland nach deutschland
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

Page 1 of 6 123 ... LastLast

Similar Threads

  1. BALZ - An Open-Source ROLZ-based compressor
    By encode in forum Data Compression
    Replies: 60
    Last Post: 6th March 2015, 17:47
  2. RZM doesn't like to share cpu
    By SvenBent in forum Data Compression
    Replies: 2
    Last Post: 19th July 2008, 22:35
  3. ROLZ explanation?
    By Trixter in forum Data Compression
    Replies: 5
    Last Post: 10th June 2008, 18:24
  4. RZM compressor
    By encode in forum Data Compression
    Replies: 2
    Last Post: 24th May 2008, 14:59
  5. A small article on ROLZ (Russian)
    By encode in forum Forum Archive
    Replies: 21
    Last Post: 29th April 2007, 15:18

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •