
Thread: Does this compression method already exist?

  1. #1
    Programmer
    Join Date
    May 2008
    Location
    denmark
    Posts
    94
    Thanks
    0
    Thanked 2 Times in 2 Posts
    I was wondering if this compression method already exists:

    If we have a sequence of bytes, we translate each byte into an offset pointing backwards at the nearest occurrence of the same byte (kind of like in normal LZ compression).

    For example, the following sequence:

    ...
    ...
    48
    69
    0
    132
    69
    69
    132
    48

    could translate into:

    ...
    ...
    129
    6627
    9
    150
    3
    1
    3
    7

    In the result, frequent bytes turn into small numbers which can be stored in fewer bits - just like in Huffman coding. Infrequent bytes can turn into numbers larger than 255.

    Compared to Huffman, you avoid building a Huffman tree, and the method also adapts to changes in frequency throughout the data.

    I've tested it in practice, with mixed results on different kinds of data...

    For now, I just wanted to know whether it already exists.
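    The transform described above can be sketched in a few lines. One detail the post leaves open is how the very first occurrence of a byte gets a distance; treating every byte value as implicitly occurring once just before the data is my own assumption here, not something the post specifies.

```python
def encode(data: bytes) -> list[int]:
    # Assumption (not in the post): pretend every byte value b occurred
    # once at virtual position -(256 - b), so even the first real
    # occurrence has a well-defined backward distance.
    last_seen = {b: -(256 - b) for b in range(256)}
    out = []
    for i, b in enumerate(data):
        out.append(i - last_seen[b])   # distance to the nearest earlier b
        last_seen[b] = i
    return out

def decode(dists: list[int]) -> bytes:
    # Each byte value has exactly one "most recent position", so the
    # map can be inverted: last position -> byte value.
    pos_to_byte = {-(256 - b): b for b in range(256)}
    out = bytearray()
    for i, d in enumerate(dists):
        b = pos_to_byte.pop(i - d)     # i - d is where this byte last occurred
        pos_to_byte[i] = b
        out.append(b)
    return bytes(out)
```

    Both directions are a single O(n) scan, which matches the encoding-cost claim made later in the thread.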

  2. #2
    Programmer
    Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    move-to-front will be better, anyway

  3. #3
    Programmer
    Join Date
    May 2008
    Location
    denmark
    Posts
    94
    Thanks
    0
    Thanked 2 Times in 2 Posts
    But isn't move-to-front slow at encoding?

    This implementation is just O(n) for encoding.
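    For comparison, a plain list-based move-to-front coder looks like the sketch below. Each symbol costs O(alphabet size) in the worst case, which is still linear in the input for a fixed 256-symbol alphabet, but with a larger constant than the simple distance scan above.

```python
def mtf_encode(data: bytes) -> list[int]:
    table = list(range(256))      # recency list, front = most recently seen
    out = []
    for b in data:
        i = table.index(b)        # O(256) scan in the worst case
        out.append(i)
        table.pop(i)              # move the byte to the front
        table.insert(0, b)
    return out

def mtf_decode(ranks: list[int]) -> bytes:
    table = list(range(256))
    out = bytearray()
    for i in ranks:
        b = table.pop(i)
        table.insert(0, b)
        out.append(b)
    return bytes(out)
```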

  4. #4
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Very interesting. That is LZ77 with the match length fixed at 1.
    If you remove the duplicates then that is move to front encoding, which has a shorter encoding. Then if you do a Burrows-Wheeler transform first, you get very good compression (although I don't use MTF in BBB. I use a fast adaptive order 0 model and arithmetic coding).

    Also, symbol ranking compressors use MTF (srank, sr2), but you keep a separate list in each order 3 or 4 context, and the list is short, only the last 3 distinct values. The other bytes are modeled in a lower order context. This keeps it fast, and also it doesn't help compression much to have a longer list. In sr2 I model the other values in an order 1 context and arithmetic code. Compression speed is about like zip -9 but compression is 5-20% better for text. However, decompression time is about the same as compression.
    http://cs.fit.edu/~mmahoney/compression/#sr2

    srank uses a trick to speed up MTF. First it models the last 3 values in an order 3 context, then it models the remaining bytes in an order 0 context in a pseudo-MTF queue. Instead of moving to the front, it swaps with a randomly picked byte about half way to the front. To make this fast, it maintains an index mapping bytes to queue location. srank uses a fixed Huffman code and is about twice as fast as sr2, although the compression is not very good. Of course srank is very old and it was designed when speed depended more on number of instructions (especially multiplication) instead of memory speed and cache locality.
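    The swap trick described above can be sketched roughly as follows. This is my own reconstruction from the description, not srank's actual code, and it uses a deterministic halfway swap instead of srank's randomly picked position so that the decoder can mirror the encoder exactly.

```python
def swap_rank_encode(data: bytes) -> list[int]:
    queue = list(range(256))
    index = {b: i for i, b in enumerate(queue)}   # byte -> queue position
    out = []
    for b in data:
        i = index[b]
        out.append(i)
        # Instead of an O(n) move-to-front, swap with the slot halfway
        # to the front: O(1) per symbol thanks to the index map, and
        # frequent bytes still drift toward small positions.
        j = i // 2
        queue[i], queue[j] = queue[j], queue[i]
        index[queue[i]], index[queue[j]] = i, j
    return out

def swap_rank_decode(codes: list[int]) -> bytes:
    queue = list(range(256))
    out = bytearray()
    for i in codes:
        b = queue[i]
        out.append(b)
        j = i // 2                # mirror the encoder's deterministic swap
        queue[i], queue[j] = queue[j], queue[i]
    return bytes(out)
```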

  5. #5
    Programmer
    giorgiotani
    Join Date
    May 2008
    Location
    Italy
    Posts
    166
    Thanks
    3
    Thanked 2 Times in 2 Posts
    Which kinds of data did you find compress better, and which worse, after that coding?
    Analog data compression comes to mind; the method seems well suited to being applied after a differential filtering of the input (simplest example: each byte is coded as the difference from the previous byte, if samples are coded as bytes), since on data from analog sources (images, audio...) differencing tends to leave quite repetitive patterns, as neighboring values are correlated with each other.
    IMHO this coding would do a better job on this type of input after a differential filtering stage (just an idea, I haven't tested it)
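    The differential filtering suggested here, in its simplest byte-wise form, can be sketched like this (differences taken modulo 256 so the filter is exactly invertible):

```python
def delta_encode(data: bytes) -> bytes:
    prev = 0
    out = bytearray()
    for b in data:
        out.append((b - prev) & 0xFF)  # difference from previous byte, mod 256
        prev = b
    return bytes(out)

def delta_decode(deltas: bytes) -> bytes:
    prev = 0
    out = bytearray()
    for d in deltas:
        prev = (prev + d) & 0xFF       # running sum undoes the differencing
        out.append(prev)
    return bytes(out)
```

    On smooth analog-style input (a slow ramp, audio samples) the deltas cluster near zero, so the few remaining distinct values recur at short distances, which the backward-distance transform then maps to small numbers.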

