1. The header identifies which bytes are compressible. A compressible byte is always followed by a run length (even if 1) so there is no need for an escape byte.

Anyway, not sure if recursive byte pair encoding would compress 1GB of zero bytes smaller than 23 bytes. It seems like in general you could encode a run of n bytes using 8 + log(n) bits, so 5 bytes should be possible.

fpaq0 compresses 1 GB of zero bytes to 20 bytes. A second pass compresses it to 10 bytes. More passes makes it bigger.
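The 8 + log(n) estimate is easy to check with a couple of lines of Python (the function name `run_cost_bits` is just illustrative, and the question of how the length bits themselves are delimited is ignored):

```python
import math

def run_cost_bits(n: int) -> int:
    """Idealized bits to encode one run: 8 bits for the byte value
    plus roughly log2(n) bits for the run length n."""
    return 8 + math.ceil(math.log2(n))

gib = 2**30  # 1 GiB of identical bytes = one run of 2^30
bits = run_cost_bits(gib)
print(bits, "bits ~=", math.ceil(bits / 8), "bytes")  # 38 bits ~= 5 bytes
```

For 1 GiB of zeros that gives 8 + 30 = 38 bits, which rounds up to the 5 bytes mentioned above.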

2. 10 bytes, huh? Cool... let's see if we can reach those 5... Without a custom-made program I think it's not possible. We need to take headers into account.

Anyway, not sure if recursive byte pair encoding would compress 1GB of zero bytes smaller than 23 bytes
Actually, it reaches 50 bytes on the 4th pass. Then it starts increasing. The 23 bytes are after ccm.

3. Originally Posted by Matt Mahoney
The header identifies which bytes are compressible. A compressible byte is always followed by a run length (even if 1) so there is no need for an escape byte.

Anyway, not sure if recursive byte pair encoding would compress 1GB of zero bytes smaller than 23 bytes. It seems like in general you could encode a run of n bytes using 8 + log(n) bits, so 5 bytes should be possible.

fpaq0 compresses 1 GB of zero bytes to 20 bytes. A second pass compresses it to 10 bytes. More passes makes it bigger.

Ah ok, thanks Matt for clarifying this. Then there can at least be some special cases where this algorithm performs better than normal RLE.
(But I would really guess normal RLE does a far better job; perhaps the author should add some benchmarks to his paper.)

"A compressible byte is always followed by a run length (even if 1)"

This adds a lot of unnecessary extra bytes. More than about 50% of all occurrences of the byte/char would have to have a run length > 1 just to break even.
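Steve's break-even point can be illustrated with a tiny sketch (the helper names are hypothetical, and long-run handling is ignored): with a mandatory length byte, every run of a value costs 2 bytes, so a value only compresses once its average run length exceeds 2.

```python
def merle_cost(runs):
    """Bytes needed for one byte value when every run, even of
    length 1, must be coded as a (value, length) pair = 2 bytes.
    (Long-run handling is ignored for simplicity.)"""
    return 2 * len(runs)

def plain_cost(runs):
    """Bytes the same runs occupy when left uncoded."""
    return sum(runs)

# A value that mostly occurs singly loses badly: 5 single
# occurrences cost 10 coded bytes instead of 5 plain ones.
assert merle_cost([1] * 5) > plain_cost([1] * 5)

# Coding only pays off once the average run length exceeds 2.
assert merle_cost([3, 3]) < plain_cost([3, 3])
```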

Normal RLE is just more flexible. With an escape char/byte you only code the actual runs, which really saves you bytes (adding some overhead just there) without adding overhead everywhere.
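A minimal sketch of the escape-byte approach described here (the escape value, the length-0 literal marker, and the run threshold of 4 are all assumptions, not taken from any of the programs discussed in this thread):

```python
ESC = 0x1B  # hypothetical escape byte; a real codec should pick one
            # that is rare in the data

def rle_escape_encode(data: bytes) -> bytes:
    """Classic escape-based RLE: only runs worth compressing are
    coded as (ESC, value, length); everything else passes through
    verbatim. A literal ESC is coded as (ESC, ESC, 0)."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        run = j - i
        if run >= 4:  # shorter runs cost more coded than plain
            out += bytes([ESC, data[i], run])
        else:
            for b in data[i:j]:
                if b == ESC:
                    out += bytes([ESC, ESC, 0])  # length 0 = literal ESC
                else:
                    out.append(b)
        i = j
    return bytes(out)
```

Incompressible bytes cost nothing extra (unless they happen to equal the escape byte), which is exactly the flexibility argued for above.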

4. ## The Following User Says Thank You to Steve For This Useful Post:

Bulat Ziganshin (7th February 2015)

5. Originally Posted by Gonzalo
I saw something strange while running the test against ~6.7 GB of data.
The test set is a shar archive containing a LiberKey installation.

Code:
```  Char   (0) saves 231572627 bytes
Char Ì (204) saves 23288754 bytes
7180648320 -> 6925786971```
I wonder if it's true that only two characters produced gains in such a big file...
The proposed RLE compressor is useless for chars that mostly occur as singles, which is the most typical case. So you should rather wonder why there are still chars that benefit.

6. Some test results from the example programs posted earlier:

Code:
```Filename        Original Mespotine  PackBits Tsukiyama
------------------------------------------------------
calgary\bib      111.261+  111.293   113.628   113.528
calgary\book1    768.771+  768.803   785.883   785.072
calgary\book2    610.856+  610.888   623.610   622.783
calgary\geo      102.400   102.432   102.226+  102.256
calgary\news     377.109   375.115+  376.084   376.305
calgary\obj1      21.504    18.631+   19.390    19.369
calgary\obj2     246.814+  246.846   248.356   248.211
calgary\paper1    53.161+   53.193    54.029    53.934
calgary\paper2    82.199+   82.231    83.826    83.661
calgary\paper3    46.526+   46.558    47.404    47.303
calgary\paper4    13.286+   13.318    13.573    13.554
calgary\paper5    11.954+   11.986    12.235    12.227
calgary\paper6    38.105+   38.137    38.921    38.893
calgary\pic      513.216   100.975+  109.524   108.274
calgary\progc     39.611+   39.643    39.975    40.099
calgary\progl     71.646    69.233    68.622+   69.231
calgary\progp     49.379    47.245    46.033+   46.109
calgary\trans     93.695    92.179+   92.492    92.522

Filename             Original    Mespotine     PackBits    Tsukiyama
--------------------------------------------------------------------
silensia\dickens   10.192.446+  10.192.478   10.422.651   10.410.427
silensia\mozilla   51.220.480   47.839.740+  48.785.288   48.771.984
silensia\mr         9.970.564    9.967.870    7.304.657    7.244.984+
silensia\nci       33.553.445   32.191.250+  38.218.365   38.504.498
silensia\ooffice    6.152.192    6.033.040+   6.060.136    6.077.203
silensia\osdb      10.085.684+  10.085.716   10.335.930   10.337.870
silensia\reymont    6.627.202+   6.627.234    6.730.810    6.708.956
silensia\samba     21.606.400   20.107.208+  20.138.170   20.150.529
silensia\sao        7.251.944+   7.251.976    7.332.554    7.300.187
silensia\webster   41.458.703+  41.458.735   42.094.958   41.962.355
silensia\x-ray      8.474.240+   8.474.272    8.543.060    8.488.706
silensia\xml        5.345.280    5.324.800+   5.414.858    5.401.844```

7. ## The Following 2 Users Say Thank You to jibz For This Useful Post:

Bulat Ziganshin (7th February 2015), PSHUFB (26th February 2015)

@steve
That's right, in some cases normal RLE, PB and Tsukiyama are better, but they always have the overhead problem.
But you can combine, for example, PB with MeRLE-Basic (I wrote about this possibility in the conclusion of the paper) like this:

-1 to -127: the next 1 to 127 appearances of the current(!) run-value are to be encoded using RLE
0 to 127: the next 1 to 127 appearances of the current(!) run-value are not encoded
-128: no more encoding for this run-value at all.

Unlike regular PB, where 0-127 covers all following characters no matter what run-value they have, this PB variant applies PB only to the current run-value, leaving all the others untouched.

Example:

ABABABBBBABBBB = 14 chars = 112 Bits
regular PB: 4ABABA-4B0A-4B = 12 chars = 96 Bits
Mespotine-RLE + PB: AB1ABAB-4AB-4 = 11 chars +2 Bits CompBitList = 90 Bit
(Note: unlike normal PB, run-value and run length are, again, reversed in MeRLE+PB!)

I haven't tested this method much though, so it may or may not be a good idea. It would also change the calculation of the CompBitList.
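The bit counts in the worked example above can be sanity-checked in a few lines (8 bits per char, plus the 2 CompBitList bits for the MeRLE variant):

```python
# Bit costs for the example "ABABABBBBABBBB" under the three codings.
plain      = 14 * 8      # raw: 14 chars
regular_pb = 12 * 8      # regular PB: "4ABABA-4B0A-4B", 12 chars
merle_pb   = 11 * 8 + 2  # MeRLE+PB: "AB1ABAB-4AB-4", 11 chars + 2 CompBitList bits
print(plain, regular_pb, merle_pb)  # 112 96 90
```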

BTW, thanks for your compliments on the writing style, though it led to some confusion for you. Could you help me by reading my third post in this thread and telling me whether it is clearer to understand? I'd love to improve the 1.0 version of the paper and your feedback would probably help me a lot with that.

@Bulat Ziganshin
I haven't seen the data, but it seems like some characters were indeed RLE-compressible, while all the others were not.

@jibz
Thanks for the test results and the software modifications for the comparisons. The results seem to support my claims, which I'm happy with.
Interestingly, MeRLE-Basic seems to beat PB more often (at least on the files you chose) than I expected. I thought PB would be equal or better in efficiency.

@Gonzalo
Does this mean I have developed the first "true-recursive compression-method in the world (TM)"? XD
As for the question of where to start: I included flowchart implementations of the algorithm in the paper that should be implementable, if I didn't bug it up somewhere...

@Matt Mahoney
Thanks for the compressor program. I haven't read it in detail yet, so I wonder, did you also implement the long_run management?

@Sven Bent
Yes, it needs two passes. Maybe some dynamic calculation of the CompBitList could make this faster, but I'm not familiar with that subject.
Another way would be to split the file up into, let's say, 1 MB packets and redo the calculation for each packet, so you could use it more for on-the-fly encoding.
Yet another would be to encode the data with RLE into a cache while calculating the CompBitList, and when writing the cache to the file, "decompress" all characters that are not compressible at all.
But whether this would be efficient, I don't know.

@m^2
For example, if the Fraunhofer-Institute can't protect the algorithms used for MP3, only implementations, then anyone could legally write MP3 encoders without paying them anything...

But maybe I got something wrong....

@all
Thanks again for your comments and feedback. In fact, MeRLE-Basic is the first algorithm I worked on that focuses on overhead reduction only. I'm currently working on successors that build heavily on MeRLE-Basic (hence the "Basic" in the name), where I'm focusing more on compression efficiency.
Some are quite fancy

9. Originally Posted by mespotine
@m^2
For example, if the Fraunhofer-Institute can't protect the algorithms used for MP3, only implementations, then anyone could legally write MP3 encoders without paying them anything...
https://en.wikipedia.org/wiki/Idea%E...ression_divide
Patents and copyright are entirely different beasts. In some countries you can patent algorithms.

10. Originally Posted by mespotine
@Matt Mahoney
Thanks for the compressor-program. I haven't read it in detail yet, so I wonder, did you also implement the long_run-management?
Yes. Length bytes 0..254 mean a run of 1..255. Byte 255 means a run of 255 followed by another length byte. I also accounted for this coding when calculating whether a byte is compressible or not.
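Matt's length coding can be sketched in a few lines of Python (assumed reading of his description: a byte b in 0..254 encodes a run of b+1, and byte 255 contributes a run of 255 and says another length byte follows):

```python
def encode_run_length(n: int) -> bytes:
    """Encode a run length n >= 1 as a chain of length bytes:
    255 means 'add 255 and keep reading', b < 255 means 'add b+1, stop'."""
    assert n >= 1
    out = bytearray()
    while n > 255:
        out.append(255)
        n -= 255
    out.append(n - 1)
    return bytes(out)

def decode_run_length(buf: bytes) -> int:
    """Inverse of encode_run_length: sum the contributions."""
    n = 0
    for b in buf:
        n += 255 if b == 255 else b + 1
    return n
```

Note that a run of exactly 255 is coded as the single byte 254, since a trailing 255 would promise a continuation byte that isn't there.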

11. ## Can you provide Matlab code or Java code for this topic as a main project?

Originally Posted by mespotine