
Thread: SQUID - a fast LZP-based file compressor

  #1 encode (The Founder):

    Merry Christmas and Happy Holidays!

    Let me introduce the initial release of my brand new LZP-based file compressor called SQUID! It was designed for modern CPUs (6+ MB of L3 cache) and with reasonable compression/decompression speed in mind. Honestly, it could be called LZPX2 - it is very similar to the original LZPX with some major improvements.
    Initially it was a testbed for my new fast order-1 coder (part of an unreleased PPMX version), but later I decided to release it as a simple LZP, since I liked the results.

    So, there are three compression levels:
    c0 - Order-1 coder only (disables string matching)
    c1 - LZP with Greedy Parsing
    c2 - LZP with Flexible Parsing
    The decoder is the same for all compression levels.
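
    For anyone new to LZP, here is a minimal sketch of the idea behind these levels (names and parameters are illustrative, not SQUID's actual code): a context hash predicts a single candidate match position, so only a match flag/length or a literal ever has to be coded - never an offset.
    Code:
    /* Minimal LZP encoder loop sketch (illustrative only, not SQUID's code).
       The order-4 context hash predicts one candidate position, so the encoder
       codes only literals and match lengths - no offsets. */
    #include <stdint.h>
    #include <stddef.h>
    
    #define HASH_BITS 20
    #define MIN_MATCH 4
    
    static uint32_t hash4(const uint8_t *p)      /* hash of the 4 preceding bytes */
    {
        uint32_t x = (uint32_t)p[-4] | ((uint32_t)p[-3] << 8) |
                     ((uint32_t)p[-2] << 16) | ((uint32_t)p[-1] << 24);
        return (x * 2654435761u) >> (32 - HASH_BITS);
    }
    
    void lzp_encode_sketch(const uint8_t *buf, size_t n)
    {
        static uint32_t table[1 << HASH_BITS];   /* last position seen per context */
        size_t pos = 4;                          /* first 4 bytes go out as raw literals */
        while (pos < n) {
            uint32_t h = hash4(buf + pos);
            size_t cand = table[h];              /* predicted match position */
            table[h] = (uint32_t)pos;
            size_t len = 0;
            while (cand && pos + len < n && buf[cand + len] == buf[pos + len])
                len++;
            if (len >= MIN_MATCH) {
                /* code match flag + length with the order-1 coder */
                pos += len;
            } else {
                /* code a "no match" flag (if a candidate existed) + the literal */
                pos++;
            }
        }
    }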

    Homepage:
    http://compressme.net/#downloads



    LTCB results for SQUID v0.01 on my machine (i7-4790 @ 4.1 GHz, 32 GB @ 1866 MHz, MSI RAMDisk):
    Code:
    Z:\>squid c enwik9
    Compressing enwik9:
    1000000000 -> 280681928 in 20.30s
    
    Z:\>squid c2 enwik9
    Compressing enwik9:
    1000000000 -> 275793169 in 29.05s
    
    Z:\>squid d enwik9.sqd e9
    Decompressing enwik9.sqd:
    275793169 -> 1000000000 in 17.30s
    
    Z:\>squid c enwik8
    Compressing enwik8:
    100000000 -> 32164585 in 2.34s
    
    Z:\>squid c2 enwik8
    Compressing enwik8:
    100000000 -> 31697171 in 3.08s
    
    Z:\>squid d enwik8.sqd e8
    Decompressing enwik8.sqd:
    31697171 -> 100000000 in 2.03s


  #2 RamiroCruzo:
    Code:
    >squid c0 UI.sb UI.sb.c0
    Compressing UI.sb:
    86347072 -> 81343216 in 26.97s
    
    
    >squid d UI.sb.c0 UI.sb.c0.unp
    Decompressing UI.sb.c0:
    81343216 -> 86347072 in 19.86s
    
    
    >squid c1 UI.sb UI.sb.c1
    Compressing UI.sb:
    86347072 -> 25672949 in 7.53s
    
    
    >squid d UI.sb.c1 UI.sb.c1.unp
    Decompressing UI.sb.c1:
    25672949 -> 86347072 in 4.21s
    
    
    >squid c2 UI.sb UI.sb.c2
    Compressing UI.sb:
    86347072 -> 21851577 in 14.77s
    
    
    >squid d UI.sb.c2 UI.sb.c2.unp
    Decompressing UI.sb.c2:
    21851577 -> 86347072 in 6.92s


  #3 RamiroCruzo:
    Had a bit of free time, so I tested it a bit more. I think decompression could be sped up by reading into a buffer, and the same goes for writing the compression output...

    Code:
    Compressing Abyss.bsm_tmp:
    114647933 -> 67046526 in 13.15s
    Process Time     : 13.109s
    Clock Time       : 13.187s
    Working Set      : 71 MB
    Pagefile         : 69 MB
    IO Read          : 109 MB (in 4 reads)
    IO Write         : 63 MB (in 16373 writes)
    Decompressing Abyss.bsm_tmp.c0:
    67046526 -> 114647933 in 15.46s
    Process Time     : 14.843s
    Clock Time       : 15.509s
    Working Set      : 71 MB
    Pagefile         : 69 MB
    IO Read          : 63 MB (in 16369 reads)
    IO Write         : 109 MB (in 7 writes)
    Compressing Abyss.bsm_tmp:
    114647933 -> 46416205 in 12.31s
    Process Time     : 12.312s
    Clock Time       : 12.337s
    Working Set      : 71 MB
    Pagefile         : 69 MB
    IO Read          : 109 MB (in 4 reads)
    IO Write         : 44 MB (in 11337 writes)
    Decompressing Abyss.bsm_tmp.c1:
    46416205 -> 114647933 in 9.80s
    Process Time     : 9.124s
    Clock Time       : 9.871s
    Working Set      : 71 MB
    Pagefile         : 69 MB
    IO Read          : 44 MB (in 11333 reads)
    IO Write         : 109 MB (in 7 writes)
    Compressing Abyss.bsm_tmp:
    114647933 -> 45796296 in 16.86s
    Process Time     : 16.842s
    Clock Time       : 16.891s
    Working Set      : 71 MB
    Pagefile         : 69 MB
    IO Read          : 109 MB (in 4 reads)
    IO Write         : 43 MB (in 11185 writes)
    Decompressing Abyss.bsm_tmp.c2:
    45796296 -> 114647933 in 10.41s
    Process Time     : 9.936s
    Clock Time       : 10.430s
    Working Set      : 71 MB
    Pagefile         : 69 MB
    IO Read          : 43 MB (in 11181 reads)
    IO Write         : 109 MB (in 7 writes)
    Press any key to continue . . .


  #4 encode:
    I have done many experiments these past weeks.

    Implemented all the LZ variants - LZP/ROLZ/LZRW2/LZ77 - within the same literal/match coder (including decoders).
    The LZP does its job, especially in terms of compression speed vs. compression ratio, though at the cost of decompression speed. The decompression speed suffers mainly because we code a much smaller number of matches compared to, say, LZ77 - i.e. it's not mainly because of the mandatory hash table. LZRW2 has to maintain the same hash table as well, but its decompression speed is notably faster than LZP's, since it finds more matches, and with a match we quickly decode a few bytes at once. I'm talking about LZs with an arithmetic backend here.
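
    To make the decode-cost point concrete, here is a rough sketch of an LZP decode loop (my illustration, not SQUID's decoder; the decode_*() calls are placeholders for a hypothetical arithmetic-coder interface): the table has to be hashed and updated at every token, and since LZP emits fewer and shorter matches than LZ77, more of the output is produced one literal at a time.
    Code:
    /* Illustrative LZP decode loop (not SQUID's actual decoder).
       decode_match_flag/len/literal() stand for the entropy coder interface. */
    #include <stdint.h>
    #include <stddef.h>
    
    extern int      decode_match_flag(void);         /* hypothetical coder calls */
    extern size_t   decode_match_len(void);
    extern uint8_t  decode_literal(void);
    extern uint32_t hash_context(const uint8_t *p);  /* same hash as the encoder */
    
    void lzp_decode_sketch(uint8_t *out, size_t out_size, uint32_t *table)
    {
        size_t pos = 4;                          /* first 4 bytes were sent raw */
        while (pos < out_size) {
            uint32_t h = hash_context(out + pos);
            size_t cand = table[h];              /* predicted match position */
            table[h] = (uint32_t)pos;
            if (cand && decode_match_flag()) {
                size_t len = decode_match_len();
                for (size_t i = 0; i < len; i++) /* byte-by-byte: source may overlap */
                    out[pos + i] = out[cand + i];
                pos += len;
            } else {
                out[pos++] = decode_literal();   /* one output byte per token */
            }
        }
    }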

    Anyway, I'd like to keep SQUID as an LZP-based coder. I've made a few major improvements to it - improved literal coding (at last I now handle the literal-after-match case) and improved parsing.
    Although, I have some problems with LZP's Optimal Parsing. First of all, I have found that adding strings to the dictionary only at literal/match boundaries is mandatory - adding strings at every input byte ruins both compression ratio and speed. This limits parsing to Flexible Parsing only, I presume. And now I have a few variants - one that provides the maximum count of matches (best on finn_lst/english.dic) and one with more literals and longer matches (better for the ENWIKs, but much worse on finn_lst/english.dic). I'm looking for a better solution.

    In addition, just like classic PPM, LZP suffers from the optimal model order problem. For some files an order-3 hash table is optimal, for others order-4 or order-5. In SQUID v0.01 I use an order-4 -> order-2 hash table - i.e. if the order-4 context is not confirmed (hash collision), we fall back to order-2. I have obtained nice results with order-8 -> order-4 -> order-2 hash tables, at the cost of compression and, sadly and unacceptably, decompression speed.
    For a single model order I have found that order-3 is good enough - it is fast for both compression and decompression, and with my current parsing it gives 286,xxx,xxx on ENWIK9 (faster than the current SQUID v0.01 c1).
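
    For illustration, a confirmed order-4 -> order-2 lookup could be sketched like this (my own layout, not SQUID's actual tables): each order-4 cell stores a small check value of the full context, and on a collision the coder falls back to a direct-indexed order-2 table.
    Code:
    /* Illustrative order-4 -> order-2 LZP lookup with context confirmation.
       Not SQUID's data layout - just one way to detect order-4 hash collisions. */
    #include <stdint.h>
    #include <stddef.h>
    
    #define HASH_BITS 20
    
    struct cell {
        uint32_t pos;     /* last position seen in this context */
        uint32_t check;   /* fingerprint of the full order-4 context */
    };
    
    static struct cell tab4[1 << HASH_BITS];     /* hashed order-4 contexts */
    static struct cell tab2[1 << 16];            /* order-2 contexts, direct-indexed */
    
    static size_t predict(const uint8_t *buf, size_t pos)
    {
        uint32_t c4 = (uint32_t)buf[pos-4] | ((uint32_t)buf[pos-3] << 8) |
                      ((uint32_t)buf[pos-2] << 16) | ((uint32_t)buf[pos-1] << 24);
        uint32_t h4 = (c4 * 2654435761u) >> (32 - HASH_BITS);
        struct cell *c = &tab4[h4];
        size_t cand = (c->check == c4) ? c->pos : 0;   /* 0 = order-4 not confirmed */
        c->pos = (uint32_t)pos;
        c->check = c4;
        if (cand)
            return cand;
    
        uint32_t c2 = (uint32_t)buf[pos-2] | ((uint32_t)buf[pos-1] << 8);
        struct cell *d = &tab2[c2];              /* fall back to order-2 */
        cand = d->pos;
        d->pos = (uint32_t)pos;
        return cand;
    }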

    The research goes on...



  #5 encode:
    SQUID v0.02:
    • Now it's an order-5 LZP (memory usage is the same)
    • Improved literal/match length coding (same order-1 model) + heavily optimized Range Coder
    http://compressme.net/#downloads

    Quick ENWIK9 results on my machine:
    Code:
    Z:\>squid c enwik9
    Compressing enwik9:
    1000000000 -> 273086097 in 19.08s
    
    Z:\>squid c2 enwik9
    Compressing enwik9:
    1000000000 -> 268331777 in 27.54s
    
    Z:\>squid d enwik9.sqd e9
    Decompressing enwik9.sqd:
    268331777 -> 1000000000 in 15.73s


  #6 joerg:
    First test:
    Compressing 64 GBytes to 9.3 GBytes within 2 hours.
    Compression is more than 20% better than zip and needs less than half the time of zip.

    Very, very good!
    Will it be any faster with a bigger read buffer?

  #7 Member (Argentina):
    Quote Originally Posted by joerg:
    First test:
    Compressing 64 GBytes to 9.3 GBytes within 2 hours.
    Compression is more than 20% better than zip and needs less than half the time of zip.

    Very, very good!
    Will it be any faster with a bigger read buffer?
    Sounds like a strong contender to zstd... This is very interesting

  #8 encode:
    Quote Originally Posted by joerg:
    Will it be any faster with a bigger read buffer?
    Nope. It could even be slower. For example, a 32 MB buffer is notably faster than 64 MB on my SSD. But 32 MB is too small IMO, since LZP can act as a dedupe (like SREP) - it is able to find matches at any distance within the buffer (with a large enough hash table).
    As I already mentioned, I've implemented ALL the LZ variants (LZ77, LZRW, ROLZ, LZP) within the same literal/match encoder, and I was unable to beat my LZP in terms of compression speed vs. ratio! And like I said earlier, this all comes at the cost of decompression speed (LZ77 is at the very least 2X faster at decompression).


  #9 Bulat Ziganshin (Programmer):
    if you have implemented all the variants, why not include them all? it's interesting to compare, and it may be useful for different types of data. And of course OSS with a liberal license would be even better - it could serve as a learning bench for the LZ* family.

    imho, BCM already belongs in this pack - it's a small, comprehensible example of good compression with BWT+CM


  #10 encode:
    The lack of spare time + I'm too picky regarding my software.
    I confess, I had an idea to add ROLZ to SQUID - making it a hybrid: LZP for fast compression and ROLZ for the higher/slower compression modes.

    As to OSS:
    LZ4X / CRUSH = LZ77
    QUAD / BALZ = ROLZ
    LZPX / LZPXJ = LZP

    Yep, these are a bit old, but the main experiment with SQUID is a PPMd-type order-1 coder (byte-oriented, with frequency sorting and other tricks). The bit-oriented coder a la the one used in BALZ is better in terms of compression, but it is also notably slower...
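
    Roughly what I mean by frequency sorting, as a sketch under my own assumptions (not the actual PPMX/SQUID coder): each order-1 context keeps its symbols ranked by count, so the most probable bytes sit in the first slots and the byte-oriented coder's symbol search stays short.
    Code:
    /* Illustrative frequency-sorted order-1 symbol ranking (not SQUID's coder).
       Frequent symbols bubble toward the front of their context's table, so a
       hit is usually found after a few steps of the linear search. */
    #include <stdint.h>
    
    struct o1ctx {
        uint8_t  sym[256];    /* symbols, most frequent first */
        uint16_t cnt[256];    /* matching counts */
    };
    
    static struct o1ctx ctx[256];    /* one table per previous byte */
    
    static void o1_init(void)
    {
        for (int p = 0; p < 256; p++)
            for (int s = 0; s < 256; s++) {
                ctx[p].sym[s] = (uint8_t)s;
                ctx[p].cnt[s] = 0;
            }
    }
    
    /* Find the rank of symbol c in context prev, bump its count and
       re-sort locally; the coder would then encode this rank. */
    static int o1_rank_and_update(uint8_t prev, uint8_t c)
    {
        struct o1ctx *t = &ctx[prev];
        int i = 0;
        while (t->sym[i] != c)
            i++;
        int rank = i;
        t->cnt[i] += 32;
        if (t->cnt[i] > 60000)                       /* rescale to avoid overflow */
            for (int j = 0; j < 256; j++)
                t->cnt[j] -= t->cnt[j] >> 1;
        while (i > 0 && t->cnt[i] > t->cnt[i-1]) {   /* keep sorted by count */
            uint8_t  ts = t->sym[i]; t->sym[i] = t->sym[i-1]; t->sym[i-1] = ts;
            uint16_t tc = t->cnt[i]; t->cnt[i] = t->cnt[i-1]; t->cnt[i-1] = tc;
            i--;
        }
        return rank;
    }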

  #11 Bulat Ziganshin:
    i think my name became a bit more recognized once i published Tornado - a large package covering a wide range of compression techniques, rather than multiple independent small projects

    Indeed, you have developed almost every type of algo possible. But maybe combining them into a single pack would drive "sales"? One thing about your algos is that they all tend to be very small, thus being an ideal teaching basis.

  #12 Member (Japan):
    How about coding the match flag + character + length like zmolly? Maybe better results if you mix PPMX order-4/2/1/0/-1 with that style.

  #13 encode:
    SQUID v0.03:
    • It's an order-4 LZP - same 64 MB buffer and 20-bit hash
    • Replaced byte-oriented encoder with bit-oriented one (slower, but higher compression)
    • Added Optimal Parsing - which actually measures the literal and match price - this really helps on some weird files like english.dic and finn_lst
    • Added repeated offsets. This is unusual for LZP; to the best of my knowledge, it was first used in the LZP of UHARC by Uwe Herklotz

    Enjoy new release!

    http://compressme.net/#downloads

    Code:
    Z:\>squid c enwik9
    Compressing enwik9:
    1000000000 -> 274672154 in 26.9 sec
    
    Z:\>squid cx enwik9
    Compressing enwik9:
    1000000000 -> 269347430 in 48.8 sec
    
    Z:\>squid d enwik9.sqd e9
    Decompressing enwik9.sqd:
    269347430 -> 1000000000 in 22.6 sec


    And actually, I'm preparing myself for the LZPM resurrection (ROLZ with Optimal Parsing)
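
    On the Optimal Parsing point above, a simplified sketch of what "measuring the literal and match price" can mean (my illustration, greatly simplified and not SQUID's cost model; the prob_*() calls are hypothetical stand-ins for the adaptive models): estimate the bit cost of both choices from the model probabilities and take the match only when it is genuinely cheaper than coding the same bytes as literals.
    Code:
    /* Illustrative price-based literal-vs-match decision (not SQUID's cost model).
       Prices are estimated bits, -log2(p), taken from the current adaptive models;
       the prob_*() functions are hypothetical stand-ins for those models. */
    #include <math.h>
    #include <stdint.h>
    #include <stddef.h>
    
    extern double prob_literal(uint8_t prev, uint8_t c);  /* order-1 literal model */
    extern double prob_match_flag(void);                  /* P(match) in this state */
    extern double prob_match_len(size_t len);             /* match length model */
    
    static double price_bits(double p) { return -log2(p); }
    
    /* Nonzero if coding len bytes as one match is cheaper than as literals. */
    static int take_match(const uint8_t *buf, size_t pos, size_t len)
    {
        double lit_cost = price_bits(1.0 - prob_match_flag());  /* "no match" flag */
        for (size_t i = 0; i < len; i++)
            lit_cost += price_bits(prob_literal(buf[pos + i - 1], buf[pos + i]));
    
        double match_cost = price_bits(prob_match_flag())
                          + price_bits(prob_match_len(len));
        return match_cost < lit_cost;
    }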


  #14 Nania Francesco (Tester):
    Ilia, I wanted to tell you that I did a test with the files "MSO97.DLL" (option c) and "vcfiu.hlp" (option cx), and the decompressed output is not the same!

  #15 encode:
    I retested SQUID on both files with all compression modes - the decompression is bit-to-bit identical! Can anyone else confirm any issues?

  #16 Mauro Vezzosi:
    In my "c" and "cx" tests, all decompressed Maximum Compression and Silesia files are identical to the original.


  #17 svpv:
    SQUID's decompression speed is 2 times slower than that of lzma. If you want a decent decompression speed for practical purposes, you simply can't do certain things in the decoder. For example, you cannot hash the context for each position and update the ROLZ table, like in the older zlite compressor by RichSelian - that's already slower than lzma.

    Christian Martelock got it about right with RAZOR. It's a bit too slow on compression, but otherwise it beats lzma.

  #18 encode:
    Please read the entire thread, and especially this!

  #19 svpv:
    I tried my own ROLZ encoder, and it works! Making it work was hard though, as I'm not experienced in this kind of stuff...

    So I cannot confirm that updating the ROLZ table only at literal/match boundaries improves compression. At least with the naive greedy parsing that I'm using, updating the table for each byte slightly improves compression. But it's much slower at decompression. The next best thing, I thought, was to put just one position in the middle of the match. So I made a few experiments, and it seems that putting it at the second-to-last byte of each match works best. It's cheap (can be executed out of order) and it reclaims more than half of the difference. Here's the actual decoder:


    uint16_t *ro16 = ro16buf;
    for (size_t i = 0; i < nsym; i++) {
        uint16_t sym = sym16buf[i];
        if (sym < 256) {                        /* literal */
            block[pos] = sym;
            rolz1dec_putpos(&dec, block, pos);
            pos++;
        }
        else {                                  /* match: sym encodes the length */
            size_t mlen = sym - 256 + MINMATCH;
            size_t roff = *ro16++;              /* ROLZ offset into the bucket */
            size_t mpos = rolz1dec_getmpos(&dec, block, pos, roff);
            copymatch(block + pos, block + mpos, mlen);
            rolz1dec_putpos(&dec, block, pos + mlen - 2);  /* second-to-last byte */
            pos += mlen;
        }
    }


    where


    struct rolz1dec {
        uint16_t heads[256];                    /* per-context ring-buffer heads */
        uint32_t buckets[256][4096];            /* recent positions, keyed by previous byte */
    };

    static inline void rolz1dec_putpos(struct rolz1dec *dec,
                                       const uint8_t *ibuf, size_t ipos)
    {
        uint8_t cc = ibuf[ipos-1];              /* order-1 context = previous byte */
        size_t head = dec->heads[cc];
        dec->heads[cc] = (head + 1) & 4095;

        uint32_t *bucket = dec->buckets[cc];
        bucket[head] = ipos;
    }

    static inline size_t rolz1dec_getmpos(struct rolz1dec *dec,
                                          const uint8_t *ibuf, size_t ipos, size_t roff)
    {
        uint8_t cc = ibuf[ipos-1];
        size_t head = dec->heads[cc];
        dec->heads[cc] = (head + 1) & 4095;

        uint32_t *bucket = dec->buckets[cc];
        size_t ix = (head - roff - 1) & 4095;   /* walk back roff+1 entries */
        size_t mpos = bucket[ix];               /* match position for this offset */
        bucket[head] = ipos;                    /* also record the current position */
        return mpos;
    }


