Thread: LZJody

  1. #1
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts

    LZJody

    I just came across a couple of compression libraries I hadn't heard of and that, AFAIK, haven't been discussed here. One of them is LZJody.

    Copied from the README:

    This code compresses and decompresses a data stream using a combination of compression techniques that are optimized for compressing disk image data.

    Compression methods used by this program include:

    • Run-length encoding (RLE), packing long repetitions of a single byte value into a short value:length pair
    • Lempel-Ziv (dictionary-based) compression with optimized searching
    • Sequential increment compression, where 8-, 16-, and 32-bit values that increment by 1 are converted to a pair consisting of an initial value and a count
    • Byte plane transformation, putting bytes at specific intervals together so that some forms of otherwise incompressible data can be compressed. This is tried when a block fails to compress, to see whether rearranging it produces a compressible pattern.

    Interestingly, you can only feed it 4096 bytes at a time. It wouldn't be hard to create a container format, similar to snappy-framed, that strings together multiple ≤4096-byte blocks for compressing larger pieces of data.
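
    Something like this could serve as the framing layer. This is a rough sketch of my own, not anything from the lzjody source: the 3-byte per-block header and the compress_block() placeholder are assumptions, and the placeholder would need to be wired to the real per-block compressor.
    Code:
    /* Snappy-framed-style container sketch: split input into <=4096-byte
     * blocks, try to compress each one, and prefix each payload with a
     * small header so a decoder can walk the stream. */
    #include <stdint.h>
    #include <stdio.h>
    
    #define BLOCK_MAX 4096
    
    /* Placeholder: return compressed size, or 0 if the block didn't shrink.
     * Replace with a call into the real per-block compressor. */
    static size_t compress_block(const uint8_t *in, size_t in_len, uint8_t *out)
    {
        (void)in; (void)in_len; (void)out;
        return 0;                       /* "didn't compress": store the block raw */
    }
    
    /* Assumed frame layout per block:
     *   byte 0      flags (bit 0: 1 = compressed payload, 0 = stored raw)
     *   bytes 1..2  payload length, little-endian
     *   payload     that many bytes */
    static void write_frame(FILE *f, const uint8_t *in, size_t in_len)
    {
        uint8_t out[BLOCK_MAX + 64];    /* slack for incompressible blocks */
        size_t clen = compress_block(in, in_len, out);
        int packed = (clen != 0 && clen < in_len);
        size_t plen = packed ? clen : in_len;
    
        uint8_t hdr[3] = { (uint8_t)packed, (uint8_t)(plen & 0xff), (uint8_t)(plen >> 8) };
        fwrite(hdr, 1, sizeof hdr, f);
        fwrite(packed ? out : in, 1, plen, f);
    }
    
    /* Split an arbitrarily large buffer into <=4096-byte frames. */
    void compress_stream(FILE *f, const uint8_t *data, size_t len)
    {
        for (size_t off = 0; off < len; off += BLOCK_MAX) {
            size_t chunk = (len - off < BLOCK_MAX) ? len - off : BLOCK_MAX;
            write_frame(f, data + off, chunk);
        }
    }
    
    int main(void)
    {
        uint8_t buf[10000];
        for (size_t i = 0; i < sizeof buf; i++)
            buf[i] = (uint8_t)i;
        compress_stream(stdout, buf, sizeof buf);   /* writes 3 frames: 4096 + 4096 + 1808 bytes */
        return 0;
    }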

    It is licensed under the GPLv2.

  2. #2
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    I just tested the "incompressible" file provided; it is easily compressed with any method other than Huffman coding, so it isn't actually incompressible. That said, its entropy is a tad under 7 bits per byte, which is quite high, so I guess it would classify as somewhat incompressible.
    Upon closer inspection, over 26% of this file consists of the byte values 92-101 (10 different bytes). Another 40% of the file consists of 27 other byte values, and the remaining 34% is a mix of all other possible byte values at a near-equal but very low frequency.

    I'm not quite sure what file would actually be an appropriate test to demonstrate the benefit of his byte plane transform, as this isn't a very good "incompressible" file, considering it compresses to about 35% of its original size using CM (fp8v3). :/
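
    For anyone who wants to reproduce the figures, a quick order-0 entropy sketch (plain C, nothing specific to lzjody or fp8): build a histogram of the 256 byte values and compute H = -sum p*log2(p).
    Code:
    /* Zero-order byte entropy of a file. A result just under 8 bits/byte
     * means the data looks nearly random to an order-0 model, even though
     * higher-order models (LZ, CM) may still compress it well.
     * Compile with: cc entropy.c -lm */
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    
    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }
    
        uint64_t count[256] = {0}, total = 0;
        int c;
        while ((c = fgetc(f)) != EOF) { count[c]++; total++; }
        fclose(f);
    
        double h = 0.0;
        for (int i = 0; i < 256; i++) {
            if (!count[i]) continue;
            double p = (double)count[i] / (double)total;
            h -= p * log2(p);
        }
        printf("%llu bytes, %.3f bits/byte\n", (unsigned long long)total, h);
        return 0;
    }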
    Last edited by Lucas; 2nd September 2015 at 07:34.

  3. #3
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    test with enwik8:
    Code:
    e: 100000000 -> 92340743, 11s
    d: 100000000 <- 92340743, 4s

  4. #4
    Member
    Join Date
    Nov 2018
    Location
    North Carolina, USA
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I know I'm three years late to this thread, but I thought I'd explain a bit. I chose the "incompressible" data block because it was in a disk image and the algorithm didn't compress it at all, so I could use it to easily test behavior when compression of a block failed. lzjody is just my attempt at playing with compression ideas; I am not a data compression expert by any means.

    The more interesting bit that drove me to respond is the question about the utility of the byte plane transform. The idea came about because I noticed a pattern in several data blocks where 32-bit pieces of data would do something like increment one of the bytes but not the other three; I realized that 3/4 of such data would compress very efficiently with RLE if the identical bytes could be transformed to sit side by side. The byte plane transform on a full block of such 32-bit incremental data (regardless of which particular byte increments) allows 3/4 of it to be compressed as RLE and 1/4 as seq8 if the incrementing byte sequence increments by one.
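
    To make that concrete, here's a simplified illustration of the transform on such a block (a sketch, not the actual code from lzjody): de-interleave the 32-bit records into four byte planes, so the three constant planes become long single-value runs for RLE and the incrementing plane becomes a plain 0,1,2,... sequence for seq8.
    Code:
    /* Byte plane transform illustration (simplified sketch, not lzjody's
     * actual implementation). A block of 32-bit records where only one
     * byte changes, e.g. 0x00AA0000, 0x01AA0000, 0x02AA0000, ...,
     * interleaves poorly for RLE. Gathering byte 0 of every record, then
     * byte 1 of every record, and so on, turns three of the planes into
     * long runs of one value and the fourth into 0,1,2,... */
    #include <stdint.h>
    #include <stdio.h>
    
    #define BLOCK  4096
    #define PLANES 4                      /* treat the block as 32-bit records */
    
    /* Gather every PLANES-th byte together: out = plane0 | plane1 | ... */
    void byte_plane_transform(const uint8_t *in, uint8_t *out, size_t len)
    {
        size_t per_plane = len / PLANES, k = 0;
        for (size_t p = 0; p < PLANES; p++)
            for (size_t i = 0; i < per_plane; i++)
                out[k++] = in[i * PLANES + p];
    }
    
    int main(void)
    {
        uint8_t in[BLOCK], out[BLOCK];
    
        /* Synthetic "incrementing 32-bit records": only byte 0 changes. */
        for (size_t i = 0; i < BLOCK / PLANES; i++) {
            in[i * 4 + 0] = (uint8_t)i;   /* incrementing byte -> seq8-style start/count */
            in[i * 4 + 1] = 0xAA;         /* constant bytes -> RLE runs */
            in[i * 4 + 2] = 0x00;
            in[i * 4 + 3] = 0x00;
        }
    
        byte_plane_transform(in, out, BLOCK);
    
        /* After the transform, out[0..1023] is 0,1,2,...,255,0,1,... and
         * out[1024..4095] is three 1024-byte runs of a single value each. */
        printf("plane1 byte: 0x%02X, plane2 byte: 0x%02X, plane3 byte: 0x%02X\n",
               out[1024], out[2048], out[3072]);
        return 0;
    }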
