Thread: Fast compression with limited memory

  1. #1
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts

    Fast compression with limited memory

    Are there any good approaches for doing fast compression with limited memory (<10 MB or so)?

    I'm moving mostly text-based data (text files, source code, web pages) between old machines over a slow LAN (a few MB/s). The 1 MB variant of ppmd compresses well, but it sometimes can barely keep up with the link speed.

    My thoughts are:
    1) ROLZ with a hash table. Instead of a different offset table for every order-2 (or 3) context, I can use a smaller hash of the context. I can encode into 10-bit symbols (5-bit index, 5-bit match length, <256 reserved for literals) and use an entropy coder on the symbols.

    2) Simple PPM, o1 + o2 + o4 with a match model. I can use a hash table for o2 and o4, and escape in the order o4 -> o2 -> o1 as needed. If the match model indicates a match (exactly the same as in CM4), the expected character's probability in the current context is temporarily raised by some weight (based on the number of matched bytes); a rough sketch of this boost is at the end of this post. A custom allocator and restarting the model like ppmd does would keep it within the memory limit. I would probably try 2 MB for o2, 4 MB for o4 and 4 MB for the match model history.

    Both 1 and 2 should compress better than gzip, and 1 would be faster than 2. Has anyone tried something like this before with zip-like (or better) compression, or does anyone have comments?
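    For idea 2, the match-model boost could look roughly like this (a sketch only; the struct layout, weight and cap are made-up illustration values, not from any existing code):
    Code:
    #include <cstdint>
    
    // Sketch of boosting the match model's predicted byte inside one context's
    // frequency table before handing it to the entropy coder.
    struct ContextCounts
    {
        uint16_t freq[256]; // adaptive byte counts for the active context
    };
    
    // expected_byte / match_len come from the match model: the byte that follows
    // the matched history, and how many bytes have matched so far (0 = no match).
    void boosted_freqs(const ContextCounts& ctx, int expected_byte, int match_len,
                       uint32_t out[256], uint32_t& total)
    {
        total = 0;
        for (int i = 0; i < 256; ++i)
        {
            out[i] = ctx.freq[i] + 1; // +1 keeps every byte codable
            total += out[i];
        }
        if (match_len > 0)
        {
            uint32_t bonus = (uint32_t)match_len * 8; // weight grows with match length
            if (bonus > 1024) bonus = 1024;           // cap the boost
            out[expected_byte] += bonus;
            total += bonus;
        }
    }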

  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I think that a Schindler Transform of order 5 (ST5) should be doable within those memory constraints (look at the libbsc sources). Decompression is somewhat slower than compression with ST5, but if you're only decompressing once it shouldn't matter much.

    Otherwise, you can look at zling and what it does. It's faster and better than gzip on textual data and doesn't use a huge amount of memory.
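    For anyone unfamiliar with the transform, here is a minimal forward-only sketch, assuming the usual description of ST-k (rotations sorted stably by their first k bytes only, as in BWT but order-limited; the inverse, which libbsc implements, is the slower direction):
    Code:
    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <vector>
    
    // Forward ST-k: sort all rotations stably by their first k bytes only,
    // then output the byte preceding each rotation (the BWT "last column").
    // A real coder also has to transmit the index of the original rotation
    // so the (slower) inverse transform can recover the input.
    std::string st_forward(const std::string& in, int k)
    {
        const int n = (int)in.size();
        std::vector<int> idx(n);
        for (int i = 0; i < n; ++i) idx[i] = i;
    
        std::stable_sort(idx.begin(), idx.end(), [&](int a, int b) {
            for (int j = 0; j < k; ++j)
            {
                uint8_t ca = (uint8_t)in[(a + j) % n];
                uint8_t cb = (uint8_t)in[(b + j) % n];
                if (ca != cb) return ca < cb;
            }
            return false; // equal k-byte prefixes keep their original order
        });
    
        std::string out(n, '\0');
        for (int i = 0; i < n; ++i)
            out[i] = in[(idx[i] + n - 1) % n];
        return out;
    }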
    Last edited by Piotr Tarsa; 3rd February 2014 at 18:08.

  3. #3
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I haven't looked much at the Schindler transform because of patents.

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    try bsc first. it's the most efficient text compressor i know in the middle-class (i.e. not zip nor zpaq)

  5. #5
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    zling (https://github.com/richox/libzling) uses a 16 MB block size and a 10 MB dictionary for compression, but you can easily reduce the memory usage (e.g. 4 MB block + 5 MB dictionary) and it will compress even faster while still giving good compression.

    I just ran a simple test on enwik8 (100 MB) on my laptop (cygwin):

    blocksize+dictionary: 16MB+10MB = 32190KB, 3.4s
    blocksize+dictionary: 1MB+10MB = 34214KB, 2.9s
    gzip: 36518KB, 6.7s

  6. #6
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    I should avoid anything with a patent... I compress on one end and decompress on the other end without storing anything (it's all in memory), so reasonably quick compression and decompression are both needed (i.e. no block transforms with indeterminate speeds).

    Quote Originally Posted by Bulat Ziganshin View Post
    try bsc first. it's the most efficient text compressor i know in the middle-class (i.e. not zip nor zpaq)
    It is fast on large files, but strangely slow on small files. The qlfc coder in it looks interesting; I'll bookmark it and study it later.

    Quote Originally Posted by RichSelian View Post
    It sounds like, if I understand zlite (the smaller one) correctly, you do something similar to my idea 1: ROLZ matches, last-character repeats, and unmatched literals are packed into a sequence of 10-bit symbols.

    zlite uses order-3 ROLZ, so the tables are more specific (and smaller to search). It looks like zling uses exponential-length encoding for match indexes and lengths (the head/index part coded with a second entropy coder), each order-1 ROLZ bucket has 4096 possible entries, and you have a hash table (suffix list) to accelerate searching. Did I understand it correctly?

    I just wrote an o0 + o2 Huffman coder (once a context has been seen 8 times, an o2 entry is created) + LZP (0-255 = literal, 256-512 = match length + 256). I think some smart way to pack symbols will be the easiest approach.
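    Roughly, the LZP part works like this sketch (a simplification; the table size, hash and minimum match length are placeholders, not the actual code):
    Code:
    #include <cstdint>
    #include <vector>
    
    // Sketch of an LZP front end: one predicted position per hashed context.
    // Symbols: 0-255 = literal byte, 256 + len = match of length len at the
    // predicted position. Table size, hash and limits are placeholders.
    const int LZP_TABLE_BITS = 16;
    const int LZP_MIN_MATCH  = 4;
    const int LZP_MAX_MATCH  = 255;
    
    static uint32_t lzp_hash(const uint8_t* buf, uint32_t pos)
    {
        uint32_t ctx = buf[pos - 1] | (buf[pos - 2] << 8) | (buf[pos - 3] << 16);
        return (ctx * 2654435761u) >> (32 - LZP_TABLE_BITS);
    }
    
    // Turn a buffer into the symbol sequence that feeds the Huffman coders.
    std::vector<int> lzp_encode(const uint8_t* buf, uint32_t size)
    {
        std::vector<uint32_t> table(1u << LZP_TABLE_BITS, 0); // 0 = empty slot
        std::vector<int> syms;
    
        uint32_t pos = 0;
        while (pos < 3 && pos < size) syms.push_back(buf[pos++]); // need 3 bytes of context
    
        while (pos < size)
        {
            uint32_t h = lzp_hash(buf, pos);
            uint32_t pred = table[h]; // last position seen with this context
            table[h] = pos;
    
            int len = 0;
            if (pred != 0)
                while (pos + len < size && len < LZP_MAX_MATCH &&
                       buf[pred + len] == buf[pos + len])
                    ++len;
    
            if (len >= LZP_MIN_MATCH)
            {
                syms.push_back(256 + len); // match symbol
                pos += len;
            }
            else
            {
                syms.push_back(buf[pos]);  // literal
                ++pos;
            }
        }
        return syms;
    }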
    Last edited by cade; 3rd February 2014 at 21:26.

  7. #7
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Quote Originally Posted by cade View Post
    i.e. no block transforms with indeterminate speeds
    A low-order Schindler Transform shouldn't fluctuate much: e.g. in the libbsc implementation, ST does a more or less constant number of operations per symbol, so the only thing that would cause speed differences is the cache hit ratio, I think.

    Quote Originally Posted by cade View Post
    It is fast on large files, but strangely slow on small files.
    The ST coders use auxiliary tables whose size depends on the order of the transform, in addition to tables proportional to the block size. For very small blocks, the time spent handling those fixed-size tables can dominate. So for very small files you should use a very low order, e.g. ST3, and for big enough files use ST4 or ST5.

    But zling is probably the better option, since it's already much faster and better than gzip on textual files.

    zling is BSD-licensed and I don't think it is covered by any patents, so you shouldn't worry about using it, IMO.

  8. The Following User Says Thank You to Piotr Tarsa For This Useful Post:

    cade (3rd February 2014)

  9. #8
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    It's interesting that bsc takes about 10 seconds on 100 MB, but about 1 second on just 1.5 MB with the default settings.

    I made some progress on idea #1 in the first post:
    Block size: 2 MB (the ROLZ dictionary is the already-processed part of the block)
    Actual memory is (1 + 2) * 2 MB, because the input/output block is uint8 and the processed block is uint16
    16k ROLZ tables with 16 entries each, hashed order-3 context

    Including the I/O buffer and stack, the total is 8.7 MB.

    enwik8: 35,368k in 2.9 seconds (0.28 s in Huffman coding, 2.32 s running ROLZ on the blocks), and it unpacks in 1.4 seconds (0.69 s in Huffman decoding with a small index table plus a full one, 0.5 s unpacking ROLZ). I/O and other delays are why the totals are higher than the sum of the individual parts. Note that my numbers relative to zling look better than they really are, because I'm on a desktop (the zling results above were from a laptop). Searching is just checking the 16 entries of the ROLZ table for that context (the stored hash is checked against the 3 bytes of the real context plus the next two bytes for MIN_LENGTH).
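    In code the search is roughly this (a simplified sketch, not the actual source; the entry layout and hash are placeholders):
    Code:
    #include <cstdint>
    
    // Simplified sketch of the search: each of the 16 entries stores a position
    // plus a small check hash of the 3 context bytes and the next 2 bytes
    // (MIN_LENGTH), so most non-matches are rejected without touching the data.
    // Assumes pos has at least 3 bytes of history and 2 bytes ahead, and that
    // stored positions are always earlier than pos (0 = empty slot).
    const int TABLE_SIZE = 16;
    
    struct RolzEntry
    {
        uint32_t pos;
        uint32_t check;
    };
    
    static uint32_t make_check(const uint8_t* buf, uint32_t pos)
    {
        uint32_t ctx  = buf[pos - 3] | (buf[pos - 2] << 8) | (buf[pos - 1] << 16);
        uint32_t next = buf[pos] | (buf[pos + 1] << 8);
        return (ctx * 2654435761u) ^ (next * 40503u);
    }
    
    // Check the 16 entries of one table; return the best length and its index.
    int search_table(const RolzEntry* table, const uint8_t* buf,
                     uint32_t pos, uint32_t end, int& best_idx)
    {
        uint32_t check = make_check(buf, pos);
        int best_len = 0;
        best_idx = -1;
        for (int i = 0; i < TABLE_SIZE; ++i)
        {
            if (table[i].pos == 0 || table[i].check != check)
                continue; // cheap filter before comparing any bytes
            uint32_t cand = table[i].pos;
            int len = 0;
            while (pos + len < end && buf[cand + len] == buf[pos + len])
                ++len;
            if (len > best_len) { best_len = len; best_idx = i; }
        }
        return best_len;
    }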

    The symbol packing in zlite is similar to what I was doing with bits; this is how I use it:
    Code:
    // Match lengths 2..47, 16k hashed ROLZ tables with 16 entries each.
    const int MIN_LENGTH = 2;
    const int MAX_LENGTH = 47;
    
    const int ROLZ_NUM_TABLES_BITS = 14;
    const int ROLZ_TABLE_SIZE_BITS = 4;
    
    const int ROLZ_NUM_TABLES = 1 << ROLZ_NUM_TABLES_BITS;
    const int ROLZ_NUM_TABLES_MASK = ROLZ_NUM_TABLES - 1;
    
    const int ROLZ_TABLE_SIZE = 1 << ROLZ_TABLE_SIZE_BITS;
    const int ROLZ_TABLE_SIZE_MASK = ROLZ_TABLE_SIZE - 1;
    
    const int NUM_SYMBOLS = 1024;
    const int MAX_REACH = 256 + ((MAX_LENGTH - MIN_LENGTH) * ROLZ_TABLE_SIZE); // Must be less than NUM_SYMBOLS
    
    // Pack (table index, match length) into one symbol; 0-255 are literals.
    int match_sym(int idx, int len)
    {
        return 256 + ((len - MIN_LENGTH) << ROLZ_TABLE_SIZE_BITS) + idx;
    }
    
    // Recover the table index from a match symbol.
    int match_unsym_idx(int sym)
    {
        sym -= 256;
        return sym & ROLZ_TABLE_SIZE_MASK;
    }
    
    // Recover the match length from a match symbol.
    int match_unsym_len(int sym)
    {
        sym -= 256;
        sym >>= ROLZ_TABLE_SIZE_BITS;
        return sym + MIN_LENGTH;
    }
    I'm not sure what the overhead of Polar coding vs. Huffman is; more experiments are needed. Every block gets a new Huffman tree (every 2 MB), which I currently store as RLE-encoded frequencies. I should replace that with just the code lengths for each symbol, even though it's only the difference between storing 1024 uint16 and 1024 uint8 for every 2 MB.
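    Storing only the lengths works if the codes are canonical: both sides can rebuild identical codes from the 1024 lengths alone. A rough sketch (the length limit is a placeholder):
    Code:
    #include <cstdint>
    
    // Rebuild canonical Huffman codes from code lengths only: shorter codes
    // come first, ties are broken by symbol index, so the decoder can derive
    // exactly the same codes from the stored 1024 uint8 lengths.
    const int NUM_SYMBOLS_TOTAL = 1024;
    const int MAX_CODE_LEN = 15; // placeholder limit
    
    void build_canonical_codes(const uint8_t lengths[NUM_SYMBOLS_TOTAL],
                               uint16_t codes[NUM_SYMBOLS_TOTAL])
    {
        // Count how many codes of each length exist (length 0 = unused symbol).
        int count[MAX_CODE_LEN + 1] = {0};
        for (int s = 0; s < NUM_SYMBOLS_TOTAL; ++s)
            ++count[lengths[s]];
        count[0] = 0;
    
        // Compute the first code value for each length.
        uint16_t next[MAX_CODE_LEN + 1] = {0};
        uint16_t code = 0;
        for (int len = 1; len <= MAX_CODE_LEN; ++len)
        {
            code = (uint16_t)((code + count[len - 1]) << 1);
            next[len] = code;
        }
    
        // Assign codes in symbol order within each length.
        for (int s = 0; s < NUM_SYMBOLS_TOTAL; ++s)
            if (lengths[s] != 0)
                codes[s] = next[lengths[s]]++;
    }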
    Last edited by cade; 3rd February 2014 at 22:34. Reason: Fixed a shift
