Results 1 to 7 of 7

Thread: LZ4, BWT, RLE?

  1. #1
    Member
    Join Date
    May 2016
    Location
    Italy
    Posts
    8
    Thanks
    4
    Thanked 1 Time in 1 Post

    Exclamation LZ4, BWT, RLE?

    Hi there...
    This is my first time here on encode.ru and seems cool!

    I wanted to ask if you guys think it might be a good idea to create this kind of C++ Application: It uses 3 different types of Algorithms before giving the output to the user in this way: LZ4, BWT, RLE:

    Is it possible to use LZ4 to create an initial step and then apply BWT to sort data... In the end RLE on data previously sorted by bwt?

  2. #2
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    bwt+rle can compress data 2-3x. this combination is pretty bad in terms of speed/compression ratio, compared to other alternatives (such as bsc of zstd), but if you want to make compressor for learning purposes, it will at least compress. i expect that lz4 preprocessing will decrease overall compression ratio. but yoy can try to tune it increasing minimum match length to 32 or so

    for example, best practical bwt-based compressor BSC runs data through lzp+bwt+rle+reversed mtf+special entropy coding. with only bwt+rle you leave ~50% of possible compression on the table, while compression time will be about the same (bwt is the slowest part of this algo sequence)

  3. The Following User Says Thank You to Bulat Ziganshin For This Useful Post:

    alberto98fx (20th June 2016)

  4. #3
    Member
    Join Date
    Mar 2015
    Location
    Germany
    Posts
    57
    Thanks
    34
    Thanked 38 Times in 15 Posts
    I made a quick test for you using bce (my bwt based compressor) so you can make your own thoughts:

    Code:
    Program      | Time | Size
    bce          | 20.1 | 20.878.504
    lz4 -1 | bce | 12.7 | 35.930.444
    lz4 -9 | bce |  9.4 | 35.439.320
    drt | bce    | 14.4 | 20.672.568
    BCE is <2 times slower than bwt+rle but compresses better.
    DRT is the dictionary preprocessor of Alexander Rasushnyak.

  5. The Following User Says Thank You to Christoph Diegelmann For This Useful Post:

    alberto98fx (20th June 2016)

  6. #4
    Member
    Join Date
    May 2016
    Location
    Italy
    Posts
    8
    Thanks
    4
    Thanked 1 Time in 1 Post

    Lightbulb Idea

    What about instead of BWT using PPM?

  7. #5
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    ppm usually employs some sort of arithmetic coding so it's full compressor by itself. pre-lz4 will make compression much worse, post-rle is useless

    but what's your goal? it should be obvious that you can't beat existing compressors developed by professionals, or even go any close. if you have reasonable C++ experience, i invite you to participate in improving BSC. i know how it should be done, but prefer to collaborate with people who can do part of work

  8. #6
    Member
    Join Date
    May 2016
    Location
    Italy
    Posts
    8
    Thanks
    4
    Thanked 1 Time in 1 Post
    I'm so sorry for the late! It is for educational purpose of course! And yes... Seems a great idea!

  9. #7
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    I'm also sorry for late answer

    The idea is to make a faster compressor based on the BSC by replacing its last step. It can be done in the following steps:

    1) replace the bsc_coder_encode_block/bsc_coder_decode_block with functions executing SRC forward/backward transformation from OpenBWT

    2) add huffman/fse coders from FSE package processing SRC output in 16-64 KB blocks

    3) run an RLE encoding prior to SRC and somethat compress run lengths too

    4) speed up the SRC procedure using mtf_shelwien2 code

    I can provide you support in this work. If you are going to do it, i suggest you to clone BSC in GitHub and start to work there. In the plan above, after the first step the program will not compress at all but you can check that it can correctly extract the data. Second step will add compression, and steps 3-4 will ensure unprecedented speed/compression ratio, especially on GPU-occupied boxes. First two steps should be easy to implement - you just need to call the existing procedures.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •