
Thread: No more encoding-overhead in Run Length Encoding? Read about Mespotine-RLE here... :)

#31 Matt Mahoney:
    The header identifies which bytes are compressible. A compressible byte is always followed by a run length (even if 1) so there is no need for an escape byte.

    Anyway, not sure if recursive byte pair encoding would compress 1 GB of zero bytes smaller than 23 bytes. It seems like in general you could encode a run of n bytes using 8 + log(n) bits (for 2^30 zero bytes that is 8 + 30 = 38 bits), so 5 bytes should be possible.

    fpaq0 compresses 1 GB of zero bytes to 20 bytes. A second pass compresses it to 10 bytes. More passes makes it bigger.
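
    For illustration, a minimal sketch of the scheme in the first paragraph as I read it (not the paper's reference code): a 256-entry table stands in for the header, marked bytes always get a length byte, and unmarked bytes are copied verbatim. Long runs are simply restarted after 255 here; the real long-run coding appears later in the thread.

    Code:
    /* Header-driven RLE sketch: comp[b] != 0 means byte value b is
     * "compressible" and is ALWAYS followed by a length byte, even
     * for a run of 1, so no escape byte is needed. */
    #include <stdio.h>

    void encode(const unsigned char *in, size_t n,
                const unsigned char comp[256], FILE *out)
    {
        size_t i = 0;
        while (i < n) {
            unsigned char b = in[i];
            if (!comp[b]) {             /* uncompressible: copy as-is */
                fputc(b, out);
                i++;
                continue;
            }
            size_t run = 1;             /* measure the run, cap at 255 */
            while (i + run < n && in[i + run] == b && run < 255)
                run++;
            fputc(b, out);              /* value byte ...               */
            fputc((int)run, out);       /* ... then its length (1..255) */
            i += run;
        }
    }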

#32 Gonzalo:
    10 bytes, huh? Cool... let's see if we can reach those 5. Without a custom-made program I think it's not possible; we need to take headers into account.

    Quote Originally Posted by Matt Mahoney:
    Anyway, not sure if recursive byte pair encoding would compress 1GB of zero bytes smaller than 23 bytes
    Actually, it reaches 50 bytes at the 4th pass. Then it starts increasing. The 23 bytes is after ccm.

#33 Steve:
    Quote Originally Posted by Matt Mahoney:
    The header identifies which bytes are compressible. A compressible byte is always followed by a run length (even if 1) so there is no need for an escape byte.


    Ah ok, thanks Matt for clarifying this. Then there can at least be some special cases where this algorithm performs better than normal RLE.
    (though I would really guess normal RLE does a way better job - perhaps the author should add some benchmarks to his paper)



    "A compressible byte is always followed by a run length (even if 1)"

    This adds a lot of unnecessary extra bytes. More than 50% of all occurrences of the byte/char would have to be in runs of length >1 for it to pay off.

    Normal RLE is just more flexible. With an escape char/byte you encode only the occurrences that really save you bytes (adding some overhead just there) - without adding overhead everywhere.
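
    To make the break-even point concrete: under the value-plus-length coding sketched above, each run chunk of up to 255 costs 2 bytes, so a run of r saves r - 2 bytes, and a value is only worth marking when its total count exceeds twice its number of runs, i.e. its average run length is above 2. A first-pass sketch under that assumption (my illustration, not the paper's exact method):

    Code:
    /* First pass: decide which byte values actually profit from RLE.
     * Each run chunk of up to 255 costs 2 bytes (value + length), so a
     * chunk of r saves r - 2 bytes; mark a value only if its total
     * savings are positive. */
    #include <stdio.h>

    void find_compressible(const unsigned char *in, size_t n,
                           unsigned char comp[256])
    {
        long saved[256] = {0};
        size_t i = 0;
        while (i < n) {
            size_t run = 1;                 /* measure the whole run */
            while (i + run < n && in[i + run] == in[i])
                run++;
            long chunks = (long)((run + 254) / 255);
            saved[in[i]] += (long)run - 2 * chunks;
            i += run;
        }
        for (int b = 0; b < 256; b++)
            comp[b] = saved[b] > 0;         /* the 256-bit CompBitList */
    }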


#34 Bulat Ziganshin:
    Quote Originally Posted by Gonzalo:
    I saw something strange while running the test against ~6.7 GB of data.
    The test set is a shar archive containing a LiberKey installation.

    Code:
      Char   (0) saves 231572627 bytes
      Char (204) saves 23288754 bytes
      7180648320 -> 6925786971
    I wonder if it is true that only two characters made gains in such a big file...
    The proposed RLE compressor is useless for chars that occur mostly as singles, which is the most typical case. So you should rather wonder why there are chars that still get a benefit.

#35 jibz:
    Some test results from the example programs posted earlier (+ marks the smallest result):

    Code:
    Filename        Original Mespotine  PackBits Tsukiyama
    ------------------------------------------------------
    calgary\bib      111.261+  111.293   113.628   113.528
    calgary\book1    768.771+  768.803   785.883   785.072
    calgary\book2    610.856+  610.888   623.610   622.783
    calgary\geo      102.400   102.432   102.226+  102.256
    calgary\news     377.109   375.115+  376.084   376.305
    calgary\obj1      21.504    18.631+   19.390    19.369
    calgary\obj2     246.814+  246.846   248.356   248.211
    calgary\paper1    53.161+   53.193    54.029    53.934
    calgary\paper2    82.199+   82.231    83.826    83.661
    calgary\paper3    46.526+   46.558    47.404    47.303
    calgary\paper4    13.286+   13.318    13.573    13.554
    calgary\paper5    11.954+   11.986    12.235    12.227
    calgary\paper6    38.105+   38.137    38.921    38.893
    calgary\pic      513.216   100.975+  109.524   108.274
    calgary\progc     39.611+   39.643    39.975    40.099
    calgary\progl     71.646    69.233    68.622+   69.231
    calgary\progp     49.379    47.245    46.033+   46.109
    calgary\trans     93.695    92.179+   92.492    92.522
    
    Filename             Original    Mespotine     PackBits    Tsukiyama
    --------------------------------------------------------------------
    silesia\dickens    10.192.446+  10.192.478   10.422.651   10.410.427
    silesia\mozilla    51.220.480   47.839.740+  48.785.288   48.771.984
    silesia\mr          9.970.564    9.967.870    7.304.657    7.244.984+
    silesia\nci        33.553.445   32.191.250+  38.218.365   38.504.498
    silesia\ooffice     6.152.192    6.033.040+   6.060.136    6.077.203
    silesia\osdb       10.085.684+  10.085.716   10.335.930   10.337.870
    silesia\reymont     6.627.202+   6.627.234    6.730.810    6.708.956
    silesia\samba      21.606.400   20.107.208+  20.138.170   20.150.529
    silesia\sao         7.251.944+   7.251.976    7.332.554    7.300.187
    silesia\webster    41.458.703+  41.458.735   42.094.958   41.962.355
    silesia\x-ray       8.474.240+   8.474.272    8.543.060    8.488.706
    silesia\xml         5.345.280    5.324.800+   5.414.858    5.401.844


#36 mespotine:
    First of all, thank you for your kind comments.


    @steve
    That's right, in some cases normal RLE, PB and Tsukiyama are better, but they always have the overhead problem.
    But you can combine, for example, PB with MeRLE-Basic (I wrote about this possibility in the conclusion of the paper) like this:


    -1 to -127: the next 1 to 127 appearances of the current(!) run-value are to be encoded using RLE
    0 to 127: the next 1 to 127 appearances of the current(!) run-value are not encoded
    128: no more encoding for this run-value.


    Unlike regular PB, where 0-127 covers all following characters no matter what run-value they are, this PB variant applies PB only to the current run-value, leaving all the others untouched.


    Example:


    ABABABBBBABBBB = 14 chars = 112 Bits
    regular PB: 4ABABA-4B0A-4B = 12 chars = 96 Bits
    Mespotine-RLE + PB: AB1ABAB-4AB-4 = 11 chars + 2 Bits CompBitList = 90 Bits
    (Note: unlike normal PB, run-value and run are, again, reversed in MeRLE+PB!)

    I haven't tested this method much though, so it may or may not be a good idea. It would also change the calculation of the CompBitList.
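
    For what it's worth, here is a tiny decoder for one possible reading of this hybrid. This is purely my interpretation (chosen so that it reproduces the example above), not the paper's specification: a control byte c >= 0 after a marked value means "this occurrence is a single, and so are the next c occurrences of it"; a control -r means "this occurrence is a run of r".

    Code:
    /* Hypothetical MeRLE+PB decoder -- one possible reading, chosen so
     * that the stream A B 1 A B A B -4 A B -4 (with only B marked in
     * the CompBitList) decodes back to ABABABBBBABBBB.  The 128 "stop
     * encoding this value" case is omitted for brevity. */
    #include <stdio.h>

    void decode(const signed char *in, size_t n,
                const unsigned char comp[256], FILE *out)
    {
        int bare[256] = {0};        /* pending bare singles per value */
        size_t i = 0;
        while (i < n) {
            unsigned char v = (unsigned char)in[i++];
            if (!comp[v]) { fputc(v, out); continue; }
            if (bare[v] > 0) {      /* announced single, no control byte */
                bare[v]--;
                fputc(v, out);
                continue;
            }
            signed char c = in[i++];          /* control byte */
            if (c >= 0) {                     /* single; next c are bare */
                fputc(v, out);
                bare[v] = c;
            } else {                          /* a run of -c copies */
                for (int k = 0; k < -c; k++)
                    fputc(v, out);
            }
        }
    }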


    BTW, thanks for your compliments on the writing style, though it led to some confusion for you. Could you help me by reading my third post in this thread and telling me if it is clearer to understand? I'd love to improve the 1.0 version of the paper, and your feedback would probably help me a lot with that.


    @Bulat Ziganshin
    I haven't seen the data, but it seems like some characters were indeed RLE-compressible, while all others were not.


    @jibz
    Thanks for the test results and the software modifications for the comparisons. The results seem to support my claims, which I'm happy with.
    Interestingly, MeRLE-Basic seems to perform better than PB more often (at least on the files you chose) than I expected. I thought PB would be equal or better in efficiency.


    @Gonzalo
    Does this mean I have developed the first "true-recursive compression-method in the world (TM)"? XD
    As for the question of where to start: I included flowchart implementations of the algorithm in the paper that should be implementable, if I didn't bug it up somewhere...


    @Matt Mahoney
    Thanks for the compressor program. I haven't read it in detail yet, so I wonder, did you also implement the long-run management?




    @Sven Bent
    Yes, it needs two passes. Maybe some dynamic calculation of the CompBitList could make this faster, but I'm not familiar with that subject.
    Another way would be splitting the file up into, let's say, 1 MB packets and doing the calculation for each packet again, so you could use it more for on-the-fly encoding purposes (see the sketch below).
    Another one would be to encode into RLE in a cache while calculating the CompBitList, and when writing the cache to the file, you "decompress" all characters that are not compressible at all.
    But whether this would be efficient, I don't know.
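
    A quick sketch of that packet idea (the 1 MB chunk size is an assumption on my part, and find_compressible()/encode() are the earlier sketches from this thread, not the paper's code). The price is one 32-byte CompBitList per packet, roughly 32 KB extra per GB:

    Code:
    /* Blockwise two-pass encoding: run both passes per chunk so the
     * encoder can work closer to on-the-fly. */
    #include <stdio.h>

    void find_compressible(const unsigned char *in, size_t n,
                           unsigned char comp[256]);
    void encode(const unsigned char *in, size_t n,
                const unsigned char comp[256], FILE *out);

    #define CHUNK (1 << 20)                    /* 1 MB packets */

    void encode_stream(FILE *in, FILE *out)
    {
        static unsigned char buf[CHUNK];
        unsigned char comp[256], hdr[32];
        size_t n;
        while ((n = fread(buf, 1, CHUNK, in)) > 0) {
            find_compressible(buf, n, comp);   /* pass 1 per chunk */
            for (int b = 0; b < 32; b++)
                hdr[b] = 0;
            for (int b = 0; b < 256; b++)      /* pack the CompBitList */
                if (comp[b])
                    hdr[b >> 3] |= (unsigned char)(1 << (b & 7));
            fwrite(hdr, 1, sizeof hdr, out);   /* 32-byte chunk header */
            encode(buf, n, comp, out);         /* pass 2 per chunk */
        }
    }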


    @m^2
    Hmm, do you have a link where I can read about this? I mean, if only implementations are copyrightable, then I guess patenting algorithms would be quite difficult. For example, if the Fraunhofer Institute can't protect the algorithms used for MP3, only implementations, then anyone could legally write MP3 encoders without paying them anything...


    But maybe I got something wrong....


    @all
    Thanks again for your comments and feedback. In fact, MeRLE-Basic is the first algorithm I have worked on that focuses on overhead reduction only. I'm currently working on successors that build heavily on MeRLE-Basic (hence the "Basic" in the name), where I focus more on compression efficiency.
    Some are quite fancy.

#37 m^2:
    Quote Originally Posted by mespotine:
    @m^2
    Hmm, do you have a link where I can read about this? I mean, if only implementations are copyrightable, then I guess patenting algorithms would be quite difficult. For example, if the Fraunhofer Institute can't protect the algorithms used for MP3, only implementations, then anyone could legally write MP3 encoders without paying them anything...
    https://en.wikipedia.org/wiki/Idea%E...ression_divide
    Patents and copyright are entirely different beasts. In some countries you can patent algorithms.

#38 Matt Mahoney:
    Quote Originally Posted by mespotine:
    @Matt Mahoney
    Thanks for the compressor program. I haven't read it in detail yet, so I wonder, did you also implement the long-run management?
    Yes. Length bytes 0..254 mean a run of 1..255. Byte 255 means a run of 255 followed by another length byte. I also accounted for this coding in calculating whether a byte is compressible or not.
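
    That coding might look like this (a sketch of my reading of the description, not the actual source). A run of 510, for example, becomes the two length bytes 255, 254:

    Code:
    /* Long-run length coding as described above: a length byte v in
     * 0..254 encodes a run of v+1; 255 encodes a run of 255 with
     * another length byte following.  Assumes well-formed input. */
    #include <stdio.h>

    /* Write the length byte(s) for a run of n (n >= 1). */
    void put_run_length(size_t n, FILE *out)
    {
        while (n > 255) {          /* 255 = "run of 255, more follows" */
            fputc(255, out);
            n -= 255;
        }
        fputc((int)(n - 1), out);  /* 0..254 = final run of 1..255 */
    }

    /* Read the length byte(s) back and return the run length. */
    size_t get_run_length(FILE *in)
    {
        size_t n = 0;
        int v;
        while ((v = fgetc(in)) == 255)
            n += 255;
        return n + (size_t)v + 1;
    }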

#39:

    Can you provide MATLAB or Java code for this topic as a main project?

    Quote Originally Posted by mespotine:
    Hi everyone. This is my first post in this forum, as I need some help from you. After one long year, I have finally finished my paper on a rework of Run Length Encoding: Mespotine-RLE-basic, the first RLE algorithm that no longer produces useless overhead in the encoding itself. The only overhead produced is a header of only 32 bytes(!). That means: no more RLE-encoded data twice the size of the original, as Mespotine-RLE-basic only compresses the run-values that are compressible in the first place. So either MeRLE compresses, or the output is just, as I said, 32 bytes bigger.

    You can find the paper at http://mespotine.de/zugehoert/wp-con...1-2015.pdf.pdf or at http://arxiv.org/abs/1501.05542 or at https://archive.org/details/MeoMespo...ic20012015.pdf

    This is a version 0.9 prerelease and I'm open to your questions, suggestions, comments and bug reports, which I can correct in version 1.0, so don't hesitate if you have something to comment.

    The algorithm is licensed under a Creative Commons 3.0 cc-by-sa/de license, meaning you can alter and use it, even for commercial purposes, as long as you include my name and release your alterations under a similar license. Consider it a gift to the data-compression community, which has taught me a lot about data compression (what it is and what it isn't).

    Cheers and have fun with it
    Meotrss.mespotine.de
