19th August 2008, 21:42
Guide to the new compression
I've been waiting for new compressors to enter the top rankings of current
benchmarks, due to the Kuhnian paradigm shift that was announced by the
experts here at encode.ru. The fruits of this new knowledge have not been
put to use by amateurs so far. To fix this, here is my guide to how you can
write a compressor that is top ranked on various benchmarks. You do
not need any experience or expertise in compression (as explained by
Shelwien); furthermore, you need only about an hour of your time to do
this. Because I don't think my theories about how to do this are relevant,
I will keep only to the fresh theories presented here at encode.ru by
The guide is based on applying the new theories to the knowledge of
nz. As we know it was very aggressively nullified by experts and
geniuses here by announcements that it is a set of magic filters.
The expert Shelwien speculated that some "general programming" is
required to make it, defining his term as "the stuff like described
in Knuth's books". This guide will present a way to do a session of
general programming to summon magic filters for some good and proven
quality piece of work like bzip2 (or gzip) and make it to the top
rankings in benchmarks.
How difficult is it to write a magic filter? Michael Maniscalco
provided the answer to this: "anybody can write a filter". Very rarely
does it happen that one can agree with Michael, but here we can.
So how much time do you need to reach the top rankings? For nz,
about 5% of the time was spent on the magic filters. After being
unable to generate data where blizzard would outperform nz, even
though he was able to generate data where nz filters cause compression
and speed loss, Christian Martelock declared: "Blizzard was
written on one day". So we use this as our timeframe. Suppose the
genius woke up early, began working on his compressor at 9:00am
and arrived at version 0.24 at 9:00pm. That gives about 10 hours
of effective working time on this busy day, and 5% of this time is
30 minutes. Since we are not all geniuses, suppose we take slightly
longer to summon the filters and do some general programming.
Using these estimates, I presume it takes you one hour in total.
So what results can we expect? Let's derive an approximation by
looking at the enwik9 results:
nanozipltcb         166,251,135   348s   185s
bzip2 1.0.2 -9      253,977,839   379s   129s
Suppose you have some skill, then we can expect much better
results, something like this:
<your_compressor>   100,000,000   ???s   ???s
M99 v2.1            178,910,174   713s   535s
So you are also eligible for the Hutter Prize. That is good because
it means expert Shelwien will consider that you have something that he
describes as "compression skill".
The guide is divided into two parts: the first part is for "general
programming" and the second part is for summoning the magic filters.
General Programming:
1. go to amazon.com and buy TAOCP
2. download bzip2 source code
3. skim TAOCP for shellsort optimizations
4. apply shellsort optimizations to blocksort.c
5. remove the blocksize limitations from the blocksort algorithm;
if you get assert failures or segfaults, just replace the whole sort
with the best algorithm given by Knuth
6. scan the first kb of a file for the word "the" and if it exists,
apply the magic text filter to it, or if the file begins with MZ,
apply the magic exe filter to it.
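Step 6 can be sketched in a few lines of Python. This is only an illustration of the detection rule as described above: the function name choose_filter and the return values are made up for the example, and checking MZ before "the" is my assumption, since the step does not say which test wins when both match.

```python
import re

PROBE_SIZE = 1024  # "the first kb" from step 6


def choose_filter(head: bytes) -> str:
    """Pick a filter name from the first bytes of a file (hypothetical helper)."""
    probe = head[:PROBE_SIZE]
    # DOS/Windows executables begin with the two bytes "MZ".
    if probe.startswith(b"MZ"):
        return "exe"
    # Otherwise call it text if "the" occurs as a whole word in the probe.
    if re.search(rb"\bthe\b", probe):
        return "text"
    return "none"


print(choose_filter(b"MZ\x90\x00"))        # exe
print(choose_filter(b"In the beginning"))  # text
print(choose_filter(b"other data"))        # none
```

In practice you would read the probe with open(path, "rb").read(1024) and pass it in.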
Summon Magic Filters:
1. before bzip2, filter the data to remove capital letters by
inserting flags, so that "Genius" becomes "#genius".
2. surround words and flags by spaces, so that "expert Shelwien"
becomes " expert # shelwien ".
3. replace common words with dictionary indexes, such that "Magic Filter"
becomes " # 5 # 9 ".
4. after encountering e8 or e9 bytes, add the pointer address
to the following doubleword (if you are unsure what a doubleword
or a pointer is, just look it up on wikipedia)
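The four steps above can be sketched roughly as follows. This is a toy under stated assumptions, not anybody's actual filter: the dictionary contents are hypothetical (5 and 9 only mirror the example in step 3), the text filter is lossy as written (it drops punctuation and capitals after the first letter, so a real filter would need escapes to stay reversible), and for step 4 I assume the "pointer address" is the offset of the next instruction within the buffer and that the doubleword is little-endian.

```python
import re
import struct

# Hypothetical toy dictionary; 5 and 9 only mirror the example in step 3.
DICTIONARY = {"magic": 5, "filter": 9}


def text_filter(text: str) -> str:
    """Steps 1-3: flag capitals with '#', space out tokens, index common words."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        if word[0].isupper():
            tokens.append("#")  # step 1: a flag stands in for the capital
        lower = word.lower()
        tokens.append(str(DICTIONARY.get(lower, lower)))  # step 3
    return " " + " ".join(tokens) + " "  # step 2


def _e8e9(data: bytes, sign: int) -> bytes:
    """Step 4: add (sign=+1) or subtract (sign=-1) the position after E8/E9."""
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] in (0xE8, 0xE9):
            (value,) = struct.unpack_from("<I", buf, i + 1)
            value = (value + sign * (i + 5)) & 0xFFFFFFFF
            struct.pack_into("<I", buf, i + 1, value)
            i += 5  # skip the operand so both directions scan identically
        else:
            i += 1
    return bytes(buf)


def exe_filter(data: bytes) -> bytes:
    return _e8e9(data, +1)


def exe_unfilter(data: bytes) -> bytes:
    return _e8e9(data, -1)


print(repr(text_filter("expert Shelwien")))  # ' expert # shelwien '
print(repr(text_filter("Magic Filter")))     # ' # 5 # 9 '
```

The exe part is reversible because the opcode bytes themselves are never changed and the four operand bytes are skipped in both directions, so exe_unfilter(exe_filter(data)) gives back data.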
Now you are done. To study your work more closely, permute sfc.tar
or generate some other data until you get the results you want. Compare
the results for various permuted and randomized files and see if
everything is working as supposed in the orthodoxy.
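A minimal way to run such a comparison, using Python's standard bz2 module in place of your own compressor. I am assuming here that "permute" means shuffling the bytes of the input, which may not be the permutation that was meant; the sample data is made up, since sfc.tar is not bundled with this post.

```python
import bz2
import random


def compressed_size(data: bytes) -> int:
    """Size in bytes after bzip2 compression at level 9."""
    return len(bz2.compress(data, 9))


def shuffled(data: bytes, seed: int = 0) -> bytes:
    """Return the same bytes in a random order (assumed meaning of 'permute')."""
    rng = random.Random(seed)
    items = list(data)
    rng.shuffle(items)
    return bytes(items)


# Made-up sample; substitute open("sfc.tar", "rb").read() if you have the file.
sample = b"the quick brown fox jumps over the lazy dog. " * 2000

print("original:", compressed_size(sample))
print("permuted:", compressed_size(shuffled(sample)))
```

The shuffled copy keeps the same byte counts but loses all context, so it should come out much larger than the original.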
I hope this guide is useful for aspiring amateurs and dilettantes
like toffer and osmanturan if they have time to do some general
programming instead of accusing other work of being clones etc. Good luck.
Just ask Shelwien, Christian or me if something isn't clear.
19th August 2008, 22:32
Sorry, I've only read the first couple of lines - but whatever it is, you should really let it go, Sami. Everyone appreciates your work and your compressors, but well...
Maybe Ilia should close this thread before it becomes another flame-war filled with senseless accusations and half-truths.
19th August 2008, 22:35
I'm _not_ a programmer, but at least to my knowledge some members of the forum asked here for some theories. They got _nothing_ from you (<- Sami)!
Originally Posted by Sami
And I really ask myself whether writing a surprisingly _good_ compression algo has to come along with an invisible (and unguessable) ability in social communication.
In other forums, people behaving like that (similar to a troll) would soon have been thrown out.
Please be more polite!
Just my two cents...
19th August 2008, 22:46
I think someone thinks that it is impossible to be banned at this forum... I'm not so sure about that...
19th August 2008, 23:32
I must admit, being a heavy user of nanozip, that it seems to me Sami takes things way too personally, and every test where NZ doesn't come out on top he takes as an assault on his coder.
Here I'm thinking of his response to my simple test of MMA vs NZ on a cue/bin file.
Sami... Your attitude seems borderline paranoia.
Just focus on nanozip. I'm looking forward to every new version.
--- edit ---
Mod, please feel free to delete this if you find it too offensive.
Last edited by SvenBent; 19th August 2008 at 23:37.
19th August 2008, 23:59
I find it interesting, but expected, that every one of my points about there being a number of trolls in this forum is now answered by the reverse and by personal slander. This is due to the very authoritarian and even totalitarian (now thanks to encode) atmosphere in this forum.
SvenBent, what is unclear about my suggestion that, for comparing audio compression, you should use the wav format? My campaign for the last 10 years has been a consistent one: use meaningful test data to produce meaningful results.
20th August 2008, 00:42
Guys, please cool it. This is getting a little over the top now.