Results 1 to 4 of 4

Thread: What is the state of the art in "lossless data compression"?

  1. #1
    Member
    Join Date
    Apr 2018
    Location
    GUANGZHOU
    Posts
    5
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Lightbulb What is the state of the art in "lossless data compression"?

    I have done some basic research in this field. It seems like dictionary-based compressors are still dominating. Am I right? If you can help me with some useful paper or links, it will be a great time saving! Thanks!

    Here is my understanding:
    1. LZ family, dictionary-based compression. We either build a global dictionary or use a sliding window. When the string comes in, the longest repetition will be matched. The id or position of the word in the dictionary will be stored.

    2. Entropy Encoder: dictionary-based method have a disadvantage that is minimal code length. The id or position requires some bits, which means for short(1-3bytes) words, the profit could not cover the cost. We usually use LZ family + entropy coder. We have Huffman code. Current fashion is ANS.

    3. PPM/Context based. We may have a sliding windows, but instead of just recording previous words, a *context tree* is built. We predict next word based on previous knowledge.

    4. BWT based + run-length code. BWT can effectively change the distribution of the source. The same chars are likely to be placed together. There is a change to apply run-length code or whatsoever.

    5. Re Pair/Byte Repair algorithm. We pack the highest freq words into one recursively util there is no adjacent repetition.

    6. other methods, please specify. Is there any other interesting method?

    Currently, I am designing a compressor for database engine like leveldb/rocksdb. Therefore, I need to do random access on compressed data. LZ family with a global dictionary seems most reasonable. My question is whether byte repair algorithm have a chance? Byte re-pairing is prefect for this scenario. I have not implemented it. Is it good in practice?

  2. #2
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    GLZA uses byte pairing (lz78) and it's usually stronger than BWT if not similar.

  3. #3
    Member
    Join Date
    Apr 2018
    Location
    GUANGZHOU
    Posts
    5
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Thanks for the reply! Are we talking the same "Byte Pairing"? I know "Byte re-Pairing from "N. J. Larsson and A. Moffat, "Offline Dictionary-Based Compression". Proc. IEEE, 88(11)1722-1732, November 2000". It is not LZ78, correct me if I am wrong.

  4. #4
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    I believe we are talking about the same byte pairing algorithm: https://en.wikipedia.org/wiki/Grammar-based_code
    This is what GLZA, sequitur, and re-pair are based off of.
    Lz78 is pretty broad, it is any system which can precompute and replace all the context into a compressed static code which references a static dictionary such that random access can occur anywhere in the compressed stream.
    Hopefully this helps.

  5. The Following User Says Thank You to Lucas For This Useful Post:

    JIMLYN (8th April 2018)

Similar Threads

  1. new compressor LZNA = "LZ-nibbled-ANS" - "Oodle 1.45"
    By joerg in forum Data Compression
    Replies: 22
    Last Post: 19th February 2018, 05:50
  2. Sac: (State-of-the-Art) Lossless Audio Compression
    By Sebastian in forum Data Compression
    Replies: 38
    Last Post: 25th April 2016, 16:01
  3. Replies: 7
    Last Post: 4th January 2016, 15:06
  4. "bi-criteria" data compression
    By Garen in forum Data Compression
    Replies: 17
    Last Post: 9th December 2014, 15:28
  5. "Extreme" compression of DNS domains
    By nickety in forum Data Compression
    Replies: 20
    Last Post: 22nd October 2011, 01:20

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •