Page 4 of 6
Results 91 to 120 of 179

Thread: RZM - a dull ROLZ compression engine

  1. #91
    Member
    Join Date
    Jun 2009
    Location
    Cracov, Poland
    Posts
    711
    Christian
    the original S&S (Storer & Szymanski) scheme does cost optimization backward and path construction forward. igor pavlov in 7-zip uses the variation you described.
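
The backward-cost, forward-path split mentioned above can be sketched as a small dynamic program. This is only an illustration under assumptions, not code from RZM or 7-zip: `optimal_parse`, the `choices` callback, `demo_choices` and the fixed bit prices are all hypothetical, and a real coder would price tokens adaptively.

```python
def optimal_parse(n, choices):
    """Cheapest parse of n symbols.

    choices(i) yields (length, price) pairs for the tokens that may start
    at position i (hypothetical interface, fixed prices assumed).
    """
    INF = float("inf")
    cost = [INF] * (n + 1)   # cost[i] = cheapest price to code positions i..n-1
    best = [None] * n        # best[i] = token length chosen at position i
    cost[n] = 0

    # Backward pass: cost optimization from the end of the block to the start.
    for i in range(n - 1, -1, -1):
        for length, price in choices(i):
            if i + length <= n and price + cost[i + length] < cost[i]:
                cost[i] = price + cost[i + length]
                best[i] = length

    # Forward pass: path construction by following the stored decisions.
    path, i = [], 0
    while i < n:
        path.append((i, best[i]))
        i += best[i]
    return cost[0], path


def demo_choices(i):
    yield 1, 9               # a literal is always possible (9 bits, assumed)
    if i == 0:
        yield 3, 12          # short match at position 0
    if i == 1:
        yield 5, 12          # longer match at position 1


print(optimal_parse(6, demo_choices))  # (21, [(0, 1), (1, 5)])
```

In the demo a greedy parser would take the match at position 0 and then pay for three literals; the backward pass sees that a literal followed by the longer match is cheaper.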

  2. #92
    Programmer
    Join Date
    Feb 2007
    Location
    Aachen, Germany
    Posts
    397
    Of course. I was referring to the "bit optimal" description in the post I linked several pages ago.

    Quote Originally Posted by donkey7
    igor pavlov in 7-zip uses the variation you described.
    I'm quite sure that LZX and Malcolm's ROLZ do it forward, too. Otherwise it doesn't make much sense (e.g. for extended syntax, ...).

  3. #93
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    353
    Quote Originally Posted by Christian
    Thanks to the great feedback, I seriously consider adding filters. But still, they'll only improve (sometimes hurt) compression on pure RGB, WAVE or similar data structures.
    Would it be possible to use a multithreaded brute-force approach?
    One thread would encode without the filter, the other thread with the filter. Then compare and use the smaller output.

    That's kind of the idea I'm using my compression batch for: going through all the different ways in parallel, utilizing all my cores, and finding the smallest output.
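
A minimal sketch of this brute-force selection, under loud assumptions: `zlib` stands in for RZM (which has no Python binding that I know of), the byte-wise delta filter is a generic one, and `compress_best`, `delta_filter` and `delta_unfilter` are hypothetical names. A real archiver would also have to record which filter won so the decoder can undo it.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def delta_filter(data, stride=1):
    """Replace each byte by its difference to the byte `stride` positions back."""
    head = bytes(data[:stride])
    return head + bytes((data[i] - data[i - stride]) & 0xFF
                        for i in range(stride, len(data)))


def delta_unfilter(data, stride=1):
    """Inverse of delta_filter."""
    out = bytearray(data)
    for i in range(stride, len(out)):
        out[i] = (out[i] + out[i - stride]) & 0xFF
    return bytes(out)


# Candidate pre-filters; "none" passes the data through unchanged.
CANDIDATES = {"none": lambda d: d, "delta": delta_filter}


def compress_best(data):
    """Compress once per candidate filter in parallel, keep the smallest result."""
    def attempt(name):
        return name, zlib.compress(CANDIDATES[name](data), 9)

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(attempt, CANDIDATES))
    return min(results, key=lambda r: len(r[1]))


ramp = bytes(i & 0xFF for i in range(4096))   # smooth, sample-like data
name, blob = compress_best(ramp)
print(name, len(blob))
```

The cost is one full encode per candidate, which is the trade-off against detecting the data type up front.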

  4. #94
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    353
    Just a small test I did:

    Directory of the installed Office 2003 Danish version, stored inside a 7-zip container (store method)


    Org - 133.600.881 bytes
    7zip - 42.801.176 bytes
    RZM - 40.182.244 bytes
    CCMx - 38.672.064 bytes

  5. #95
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    353
    Hmm seems to be a 2gb filesize limit with RZM

  6. #96
    Guest
    Thanks Christian for another good compressor.

    4 GB - CSS Game
    Core2 T5500, 2 GB RAM

    Ratio      Comp.       Decomp.      Archiver
    34.417%     341kb/s     2097kb/s    WinRK 3.0.3 Rolz3 Normal
    34.427%     892kb/s     9262kb/s    WinArc 0.50a -mx -ld=1gb -mc-rep
    35.043%    1425kb/s    18095kb/s    7-zip 4.56 Ultra -d110m
    36.209%     674kb/s     6364kb/s    RLZ v0.6c
    37.541%     919kb/s     2115kb/s    WinRK 3.0.3 Rolz3 Fastest
    41.060%    1966kb/s    15327kb/s    WinRar v3.71 Best

  7. #97
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    353
    Is it just me, or does RZM not work with big files (2gb?)?
    The files are corrupted after decompression.

    The same seems to go for precomp.

  8. #98
    Guest
    I just store-split-7z my testset for RLZ.

  9. #99
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,722
    Christian, why don't you support the usual features - 4gb+ files, stdin/out, linux? rzm seems to be really useful and these features will help people

  10. #100
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    353
    #bulat.
    You might want to look into rep. It doesn't seem to like 4GB+ files.

  11. #101
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,722
    Quote Originally Posted by SvenBent
    You might want to look into rep.
    thank you for the report. i will probably update all my programs to use a single cmdline/io core that supports linux, large files, stdin/out and a gzip-compatible cmdline

  12. #102
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    353
    # Bulat
    Delta 1.0 seems to work fine with big files


    original: 5.946.890.240 bytes
    delta: 5.947.099.436 bytes

    So the size seems right, but I haven't tested for proper decoding yet.

  13. #103
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,722
    Quote Originally Posted by SvenBent
    Delta 1.0 seems to work fine with big files
    it should: its driver is newer, borrowed from tor0.3. i suspect that the only "incompatibility" of rep with large files is improper printing of their sizes and processing speeds

  14. #104
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    353
    #0
    My test with Rep and 5.53gb files resulted in a 0-byte file after decompression.

    i will try again right now

  15. #105
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,722
    SvenBent, thanks. anyway, i will publish rep with the new driver. i got the idea of publishing a universal driver that any compression algorithm author can use for his own creation. such a driver can implement all the features i mentioned in the lzturbo thread, allowing developers to focus on core compression. almost all current standalone compressors are poor in their drivers, and such an effort should help developers produce really useful utilities without switching to tiresome coding. this driver can also provide MT support to cut off lzturbo's base

    actually, freearc was intended for this purpose but it has now grown into a much larger project. tornado already contains a rather sophisticated driver that may be used as a base for such an effort
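
A rough sketch of what such a codec-agnostic driver could look like, assuming the codec only ever sees a read callback and a write callback. Nothing here is taken from tornado or freearc: `run_driver`, `toy_codec` and the block size are hypothetical, and `zlib` merely stands in for a real compression core.

```python
import io
import zlib


def run_driver(codec, source, sink, block_size=1 << 16):
    """Own all I/O and hand the codec two callbacks.

    `source` and `sink` are any binary streams (files, pipes, stdin/stdout),
    so the codec never needs to know file names or total sizes.
    """
    def read(n=block_size):
        return source.read(n)

    def write(chunk):
        sink.write(chunk)

    codec(read, write)


def toy_codec(read, write):
    """Stand-in compression core: streams blocks through zlib."""
    comp = zlib.compressobj()
    while True:
        block = read()
        if not block:          # empty read means end of input
            break
        write(comp.compress(block))
    write(comp.flush())


src = io.BytesIO(b"example data " * 10000)
dst = io.BytesIO()
run_driver(toy_codec, src, dst)
print(len(dst.getvalue()))
```

Passing `sys.stdin.buffer` and `sys.stdout.buffer` instead of the `BytesIO` objects would give stdin/stdout operation without touching the codec.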

  16. #106
    Member
    Join Date
    Oct 2007
    Location
    Germany, Hamburg
    Posts
    393
    Couldn't you create a new thread to discuss things like this, which have nothing to do with rzm anymore?

  17. #107
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    353
    Just tested two files.
    rep seems OK. Must have been a fault in my prior test.


    test1 (same as the first test)

    Org: 5.946.890.240 bytes
    REP: 5.909.111.554 bytes
    CRC matches


    test2:
    Org: 4.300.154.880 bytes
    REP: 4.257.603.507 bytes
    CRC untested

    I withdraw my claim regarding REP and lack of 4gb+ file support

  18. #108
    Programmer
    Join Date
    Feb 2007
    Location
    Aachen, Germany
    Posts
    397
    Hi everyone!

    I'm back from my short trip to Sardinia - it was pretty nice, but quite cold. Sorry for the lack of answers. I'll try to catch up.

    Quote Originally Posted by SvenBent
    Would it be possible to use a multithreaded brute-force approach?
    One thread would encode without the filter, the other thread with the filter. Then compare and use the smaller output.
    Of course, but I prefer to do a good detection. Otherwise nearly all string-searching data structures have to be doubled - which is ugly.

    Quote Originally Posted by SvenBent
    Hmm seems to be a 2gb filesize limit with RZM
    Yes. I want to improve the naked compression core before adding all this unnecessary stuff. But stdin/stdout and big files will be added, of course.

    Quote Originally Posted by Zonder
    Thanks Christian for another good compressor.
    Thanks!

    There is a new version of RZM, too. It works quite well, but it is not very polished, leaving room for further improvements (speed and ratio). I made some major changes to the core and extended the syntax. It handles strange files like "valley_cmb, proteins, ..." in a better way now.

    There are still some deficiencies in the syntax which I want to address - e.g. long distance matches are still missing. The good news is that I figured out several ways to merge them with ROLZ. The bad news is that LZ77 is better suited for LDMs. Filters will be added at a later point.

    [removed]

    Have fun with this new version.

  19. #109
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,722
    Quote Originally Posted by Christian
    I want to improve the naked compression core before adding all this unnecessary stuff
    i don't understand - what is the problem with large files for a stream compressor? also, what do you think about using the standard driver i plan to provide? your code will just need to call read/write callbacks and process compression setting options, everything else will be handled there

    what's more interesting, i have an idea for making compression multithreaded without losing ratio. do i understand correctly that rzm indexing is much faster than string searching?

    Quote Originally Posted by Christian
    long distance matches are still missing. The good news is that I figured out several ways to merge them with ROLZ
    why not just integrate a rep-like engine? it should make it possible to find 16+ byte matches with a small memory footprint
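
To make the "small memory footprint" point concrete, here is a sketch of one generic way to find long repeats: store only one anchor per `block` bytes of history, but probe at every position. This is not REP's actual algorithm; `find_long_matches` and its parameters are hypothetical, and a dict keyed by the raw bytes stands in for a real hash table. The sketch only extends matches forward, so a reported match may start a few bytes later than the true repeat.

```python
def find_long_matches(data, block=8, min_len=16):
    """Return (position, distance, length) for long repeats in `data`.

    Only positions that are multiples of `block` are stored, so the table
    holds roughly len(data) / block entries.
    """
    anchors = {}            # block contents -> most recent aligned position
    matches = []
    covered_until = 0       # skip probing inside a match already reported
    for pos in range(len(data) - block + 1):
        key = data[pos:pos + block]
        if pos >= covered_until:
            src = anchors.get(key)
            if src is not None:
                length = block
                while (pos + length < len(data)
                       and data[src + length] == data[pos + length]):
                    length += 1
                if length >= min_len:
                    matches.append((pos, pos - src, length))
                    covered_until = pos + length
        if pos % block == 0:
            anchors[key] = pos   # index after probing, so src < pos always
    return matches


sample = bytes(range(100)) + bytes(range(200, 250)) + bytes(range(100))
print(find_long_matches(sample))  # [(150, 150, 100)]
```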

  20. #110
    MOC Test ->150.036.048 comp. 285,394 sec. dec. 47,121 s.
    Enwik8->24.342.076

  21. #111
    @Christian
    I wanted to know why you have not added a delta filter for images and audio?

  22. #112
    Programmer
    Join Date
    Feb 2007
    Location
    Aachen, Germany
    Posts
    397
    Quote Originally Posted by Bulat Ziganshin
    i don't understand - what is the problem with large files for a stream compressor?
    It's just some overflows in the match-finder, ... But I'm changing all this stuff every now and then while altering the syntax. So, I do not want to do things twice.
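
The 2gb limit reported earlier in the thread is what one would expect if file positions were held in signed 32-bit integers. Whether RZM's match-finder actually does this is an assumption; the post only says "some overflows". A minimal illustration of the wrap-around (`as_int32` is a hypothetical helper):

```python
import ctypes


def as_int32(value):
    """Reinterpret an integer the way a signed 32-bit C variable would store it."""
    return ctypes.c_int32(value).value


LIMIT = 2 ** 31                # 2 GiB
print(as_int32(LIMIT - 1))     # 2147483647, last position that still fits
print(as_int32(LIMIT))         # -2147483648, the position has wrapped negative
```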

    Quote Originally Posted by Bulat Ziganshin
    also, what do you think about using the standard driver i plan to provide? your code will just need to call read/write callbacks and process compression setting options, everything else will be handled there
    Honestly, I'm not so fond of this. I think it's a great idea - but it depends on how much existing code has to be changed in order to make it work. Additionally, I'll most probably add precomp support - so, I don't know if the framework will fit.

    Quote Originally Posted by Bulat Ziganshin
    i have an idea for making compression multithreaded without losing ratio. do i understand correctly that rzm indexing is much faster than string searching
    Yep. String-searching eats most time - maybe 60-90% (heavily depending on the data).

    Quote Originally Posted by Bulat Ziganshin
    why not just integrate a rep-like engine? it should make it possible to find 16+ byte matches with a small memory footprint
    I don't know, maybe. Btw., funny story: I wrote such a tool for a friend once. He wrote a BWT based compressor whose string sorting stage had some serious worst case behaviour - the tool was a workaround for this.


    Quote Originally Posted by Nania Francesco Antonio
    I wanted to know why you have not added a delta filter for images and audio?
    Because I've been on a short vacation. But still, I do content based data detection - this needs more tuning and time. Please try the new version - maybe it's better on the MOC testset.

    RZM 0.07c

  23. #113
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks Chris!

    Mirror: Download

  24. #114
    Programmer
    Join Date
    Feb 2007
    Location
    Aachen, Germany
    Posts
    397
    Thanks for the mirror, LovePimple!

  25. #115
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,722
    Quote Originally Posted by Christian
    It's just some overflows in the match-finder, ... But I'm changing all this stuff every now and then while altering the syntax. So, I do not want to do things twice.
    but this prohibits using rzm for real compression

    Quote Originally Posted by Christian
    what do you think about using the standard driver
    even more - why don't you just use your own driver from CCM? i don't understand why this driver should be program-specific - i personally just copy the same code from project to project

    Quote Originally Posted by Christian
    String-searching eats most time - maybe 60-90% (heavily depending on the data).
    i thought that it eats even more time: it should be very easy to add a string to the match finder indexes. the idea is obvious: imagine that you have N cores. split the data into N chunks and run two processes in parallel - the first process compresses the first chunk of data as usual while the second just indexes it into a separate hash table. when the second process has finished, start two new threads - one compresses the second chunk of data while the other makes a copy of the hash table and continues indexing, and so on

    moreover, these processes may try to share indexing structures. for rolz-1, the main table (which stores 64k entries for each context byte) can probably be shared
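
The chunked scheme above can be sketched with a toy LZ77-style parser. Everything here is an assumption made for illustration: a dict keyed by 4-byte strings replaces the real match-finder, the parse is greedy, and `index_range`, `parse_chunk`, `compress_pipelined` and `compress_sequential` are hypothetical names. The main thread plays the fast indexer, and each chunk's parser starts from a private copy of the history built so far. Python threads will not show a real speed-up for this pure-Python work; the point is only that the output is identical to the sequential parse.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_MATCH = 4


def index_range(table, data, start, end):
    """Cheap step: record the latest position of every 4-byte string."""
    for pos in range(start, end):
        if pos + MIN_MATCH <= len(data):
            table[data[pos:pos + MIN_MATCH]] = pos
    return table


def parse_chunk(table, data, start, end):
    """Expensive step: greedy parse of data[start:end] against `table`."""
    tokens, pos = [], start
    while pos < end:
        length = 1
        src = table.get(data[pos:pos + MIN_MATCH]) if pos + MIN_MATCH <= end else None
        if src is not None:
            length = MIN_MATCH
            while pos + length < end and data[src + length] == data[pos + length]:
                length += 1
            tokens.append(("match", pos - src, length))
        else:
            tokens.append(("lit", data[pos]))
        index_range(table, data, pos, pos + length)
        pos += length
    return tokens


def compress_sequential(data, chunk_size):
    table, tokens = {}, []
    for start in range(0, len(data), chunk_size):
        tokens += parse_chunk(table, data, start, min(start + chunk_size, len(data)))
    return tokens


def compress_pipelined(data, chunk_size):
    history, futures = {}, []
    with ThreadPoolExecutor() as pool:
        for start in range(0, len(data), chunk_size):
            end = min(start + chunk_size, len(data))
            # The parser gets a snapshot; the indexer keeps running ahead.
            futures.append(pool.submit(parse_chunk, dict(history), data, start, end))
            index_range(history, data, start, end)
        return [tok for f in futures for tok in f.result()]
```

The copy of the table per chunk is the memory price of this scheme, which is why sharing the indexing structures is the natural next question.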


    also, you mentioned problems with too-long distances. can't this be solved by using a "segmented" table, i.e. instead of saving exactly 64k entries for each byte you may have, say, 16k segments of 1024 entries each and realloc them between chars dynamically, depending on current usage stats?
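
One possible reading of the segmented table, as a sketch: a fixed pool of equal-sized segments that contexts borrow from as they fill up. The reallocation policy is not specified in the post, so the one used here (recycle the oldest segment of the context that currently holds the most segments) is purely an assumption, as are the names `SegmentedTable`, `add` and `candidates`.

```python
from collections import deque


class SegmentedTable:
    """Fixed pool of equal-sized segments shared by all contexts."""

    def __init__(self, total_segments, segment_size):
        self.free = total_segments      # segments not yet handed out (>= 1 assumed)
        self.segment_size = segment_size
        self.segments = {}              # context -> deque of lists, oldest first

    def _acquire(self, ctx):
        if self.free > 0:
            self.free -= 1
        else:
            # Assumed policy: take the oldest segment away from the context
            # that currently owns the most segments.
            victim = max(self.segments, key=lambda c: len(self.segments[c]))
            self.segments[victim].popleft()
        self.segments.setdefault(ctx, deque()).append([])

    def add(self, ctx, pos):
        owned = self.segments.get(ctx)
        if not owned or len(owned[-1]) == self.segment_size:
            self._acquire(ctx)
        self.segments[ctx][-1].append(pos)

    def candidates(self, ctx):
        """Stored positions for `ctx`, newest first."""
        return [p for seg in reversed(self.segments.get(ctx, ()))
                for p in reversed(seg)]


table = SegmentedTable(total_segments=4, segment_size=2)
for pos in range(8):
    table.add(0, pos)        # context 0 fills the whole pool
table.add(1, 100)            # context 1 takes over context 0's oldest segment
print(table.candidates(0))   # [7, 6, 5, 4, 3, 2]
print(table.candidates(1))   # [100]
```

With the figures from the post this would be `SegmentedTable(16 * 1024, 1024)`, the same 16M entries as 256 contexts of 64k each.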

  26. #116
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Quick test...

    A10.jpg > 836,117
    AcroRd32.exe > 1,236,155
    english.dic > 608,859
    FlashMX.pdf > 3,678,457
    FP.LOG > 505,744
    MSO97.DLL > 1,646,797
    ohs.doc > 784,108
    rafale.bmp > 920,106
    vcfiu.hlp > 579,879
    world95.txt > 525,901

    Total = 11,322,123 bytes


    ENWIK8 > 24,334,580 bytes

  27. #117
    MOC Test ->149.669.630 comp. 302,440 sec. dec. 53,634 s.

  28. #118
    Programmer
    Join Date
    Feb 2007
    Location
    Aachen, Germany
    Posts
    397
    Thanks for all the suggestions, Bulat. Other thoughts are always good.

    Quote Originally Posted by Bulat Ziganshin
    but this prohibits using rzm for real compression
    I know. Still, I'm doing this for fun. So, I do the fun things first - the actual algorithm. But I'll add the other stuff later.

    Quote Originally Posted by Bulat Ziganshin
    why don't you just use your own driver from CCM?
    Actually, I do. But as always, you find things which can be improved. And CCM was my first compressor - the whole filtering was half-assed - excuse the wording. This time I planned filtering from the beginning - I just didn't do it yet.

    Quote Originally Posted by Bulat Ziganshin
    i thought that it eats even more time...
    It really depends on the data. On already compressed data string-searching is pretty fast. Your idea seems to be good, but there is at least one better approach for my ROLZ. I use 16M binary trees. You can just distribute the trees by their context over several threads. This way, there'd be only some syncing with the parser. Since the threads work on different trees, you don't even need additional data structures (maybe ~64k for match-results). Only a good distribution has to be selected for each block - assuming block based optimal parsing. But this could be done by a fast data analysis. Still, I don't plan on adding threading anytime soon.
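
A toy sketch of distributing per-context structures over threads. It is heavily simplified and not RZM's code: an order-1 context (the previous byte) replaces the real context that selects one of 16M trees, plain position lists replace the binary trees, and the "fast data analysis" is assumed to be a context histogram followed by greedy balancing. `plan_distribution` and `build_index` are hypothetical names.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def plan_distribution(data, workers):
    """Histogram the contexts of a block and balance them over the workers."""
    counts = Counter(data[:-1])             # context of position p is data[p - 1]
    loads = [0] * workers
    owner = {}
    for ctx, n in counts.most_common():     # largest contexts first
        w = loads.index(min(loads))         # give it to the least loaded worker
        owner[ctx] = w
        loads[w] += n
    return owner, loads


def build_index(data, workers=4):
    owner, _ = plan_distribution(data, workers)

    def work(w):
        # Each worker touches only the contexts it owns, so no locking is needed.
        trees = defaultdict(list)
        for pos in range(1, len(data)):
            ctx = data[pos - 1]
            if owner[ctx] == w:
                trees[ctx].append(pos)
        return trees

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, range(workers)))

    merged = {}
    for part in parts:
        merged.update(part)                 # context sets are disjoint by design
    return merged


block = b"abracadabra" * 50
index = build_index(block, workers=3)
print(sorted(len(v) for v in index.values()))
```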

    Quote Originally Posted by Bulat Ziganshin
    also, you mentioned problems with too-long distances. can't this be solved by using a "segmented" table, i.e. instead of saving exactly 64k entries for each byte you may have, say, 16k segments of 1024 entries each and realloc them between chars dynamically, depending on current usage stats?
    Actually, the problem is most prominent on already compressed data (2x the same file) because each context gets discarded nearly equally fast. In this case segmentation does not help. In other cases it might help. But it would double the memory requirements for the binary trees.
    Anyway, I already figured out several workarounds for this, but I have to try them out.

  29. #119
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    353
    Quote Originally Posted by Christian
    I know. Still, I'm doing this for fun. So, I do the fun things first - the actual algorithm.
    Now that's just selfish

  30. #120
    Programmer
    Join Date
    Feb 2007
    Location
    Aachen, Germany
    Posts
    397
    Quote Originally Posted by SvenBent
    Now that's just selfish
    Hehe... You know, you have to set priorities.
