It's not really what I meant... But in this version I improved the LZP layer; as a result, higher compression, especially on text files.
Download TC 5.0dev8 (30 KB)
TC 5.0dev8 on Large Text Compression Benchmark:
ENWIK8: 27,801,253 bytes
ENWIK9: 246,923,158 bytes (c 376 sec, d 415 sec)
Memory usage: 24 MB
P4 3.0 GHz, 1 GB RAM, Windows XP SP2
Well, the speed is affected. I think it's due to cache misses - do you remember my low-memory LZP index table implementation? By now, the number of references to it has doubled.
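The cache-miss point can be illustrated with a minimal LZP-style predictor (a hypothetical sketch, not TC's actual code): the last few bytes of context are hashed into a table of previous positions, so every lookup touches an effectively random slot of a multi-megabyte table - exactly the access pattern that misses the cache.

```python
# Hypothetical minimal LZP index table (illustrative; TC's real layout is
# not published in this thread). An order-3 context is hashed to a slot
# holding the previous position where that context occurred; matching
# bytes after that position are the LZP prediction.

TABLE_BITS = 20                      # 2^20 slots: far larger than L2 cache

def lzp_hash(c1: int, c2: int, c3: int) -> int:
    """Hash the last three bytes into a table slot."""
    return ((c1 << 11) ^ (c2 << 5) ^ c3) & ((1 << TABLE_BITS) - 1)

def lzp_match_lengths(data: bytes):
    """Yield (position, match_length) for every LZP prediction hit."""
    # Slot value 0 means "empty"; safe here because scanning starts at i=3,
    # so position 0 is never stored.
    table = [0] * (1 << TABLE_BITS)
    i = 3
    while i < len(data):
        h = lzp_hash(data[i - 3], data[i - 2], data[i - 1])
        pos = table[h]               # random slot -> likely cache miss
        table[h] = i
        n = 0
        while pos and i + n < len(data) and data[pos + n] == data[i + n]:
            n += 1
        if n:
            yield i, n
        i += max(n, 1)

matches = list(lzp_match_lengths(b"abcabcabcabc"))
```

On the repetitive input above the predictor finds one long match once the "abc" context repeats; on real data each hashed lookup lands somewhere new in the table, which is why doubling the number of references to it costs speed.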
TC 5.0dev8 on Calgary Corpus:
bib: 28,647 bytes
book1: 247,931 bytes
book2: 167,099 bytes
geo: 61,858 bytes
news: 117,870 bytes
obj1: 10,510 bytes
obj2: 76,037 bytes
paper1: 17,068 bytes
paper2: 26,336 bytes
pic: 52,800 bytes
progc: 12,786 bytes
progl: 15,117 bytes
progp: 10,626 bytes
trans: 16,623 bytes
total: 861,308 bytes (2.1932 bpb)
TC 5.0dev8 on SFC:
A10.jpg: 853,308 bytes
acrord32.exe: 1,703,280 bytes
english.dic: 928,452 bytes
FlashMX.pdf: 3,761,712 bytes
fp.log: 628,256 bytes
mso97.dll: 2,068,751 bytes
ohs.doc: 854,993 bytes
rafale.bmp: 1,033,156 bytes
vcfiu.hlp: 709,955 bytes
world95.txt: 586,745 bytes [!]
total: 13,128,608 bytes
Okay, a few notes about the testing results. Generally, compression is improved, but on small files it is about the same, and on some files like 'english.dic' it is ruined. But TC now compresses 'world95.txt' to 572 KB, which is good. Also note that compression improved on ALL of my test files, sometimes by a little, sometimes by a lot. The speed is affected, but not too much - it depends entirely on the files and the PC, since we need fast access to memory.
By the way, TC stands for Turbo Compressor.
Just downloaded it. Thanks!
Performance on 'bible.txt', 4,047,392 bytes
TC 5.0dev8: 892,651 bytes
LZPXJ 1.2a, -m3: 959,008 bytes
UHARC 0.6b, -mz: 1,002,070 bytes
LZPX 1.5b: 1,117,790 bytes
PKZIP 2.50, -exx: 1,172,728 bytes
13th July 2006, 05:23
It would be interesting to see how this version performs on the MFC test.
13th July 2006, 05:28
TC 5.0dev8 fails to correctly decompress the MFC test set (output size is three times the original input size).
13th July 2006, 09:56
I think I know what the problem is. For the first time, I compiled this version with different compiler options, enabling extra optimizations such as loop unrolling and many more. Due to some hardware/software incompatibility, this can cause such an issue. Why I think so:
+ I test each version I release on gigabytes of data. No issues were found on my PC.
+ Version 5.0dev8 differs from 5.0dev7 insignificantly - only the LZP index table was changed, and that table cannot cause such a bug.
+ And finally, the output size CANNOT be larger than the original, uncompressed data. This is guaranteed by the algorithm - even if the algorithm has serious bugs or the compressed data is corrupted, the output size must match the original. But I think loop unrolling or something like that can, in some cases and on different software/hardware, cause this issue.
So, this evening I will recompile this version with the standard options, as always. Anyway, you'll be informed.
13th July 2006, 17:12
Okay, I recompiled TC 5.0dev8 using the standard options. Just re-download the file! The CRC32 of this new EXE must be: A04461D6
Download TC 5.0dev8 (Recompiled) (30 KB)
13th July 2006, 21:08
By the way, you can read an article on TC (in Russian):
In this article I explain the TC algorithm, its basic principles, and implementation details.
13th July 2006, 21:32
13th July 2006, 22:51
That photo looks a bit like me!
13th July 2006, 23:27
Looks like the recompiled TC is clean! Well, do you see how important the compiler options are... The source code was completely untouched, just different compiler settings - that's why the version number was unchanged.
14th July 2006, 04:35
It shows no problems on my PC with a 911 MB TAR file.
14th July 2006, 11:10
+ Added an EXE filter. This filter gives a serious compression improvement on EXE/DLL files. Firstly, it can noticeably improve compression on the SFC test kit, since the improvement can be about 200 KB on each of the two executables in that benchmark. But that is not the goal. I think one small filter doesn't hurt, but I will not add lots of filters and transform stages, as with PIMPLE, since after that the compressor becomes a bundle-monster.
+ Added CRC32 checking. This is a must-have feature. In addition, it will help detect any engine issues and ensures correct decompression.
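The thread doesn't show TC's actual EXE filter, but a common technique behind such filters is the E8 transform: rewriting the 32-bit relative operand after each 0xE8 (CALL) opcode into an absolute target, so repeated calls to the same function become identical byte sequences that the match layer can exploit. A hypothetical minimal sketch:

```python
import struct

def e8_transform(data: bytes, forward: bool) -> bytes:
    """E8 filter sketch (illustrative, not TC's implementation): convert
    the relative CALL offset to an absolute target (forward=True) or back
    (forward=False). Real filters add range checks; omitted here."""
    out = bytearray(data)
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:                       # CALL rel32 opcode
            v = struct.unpack_from("<I", out, i + 1)[0]
            v = (v + i) & 0xFFFFFFFF if forward else (v - i) & 0xFFFFFFFF
            struct.pack_into("<I", out, i + 1, v)
            i += 5                               # skip the rewritten operand
        else:
            i += 1
    return bytes(out)
```

Because opcode bytes are never modified and both passes skip operands the same way, the inverse pass restores the original stream exactly - which is what lets the filter run transparently before compression and after decompression.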
Well, it looks like the work on the base engine is done. I will keep experimenting and will listen to other data compression programmers for improvement suggestions. But at the moment, the encoder configuration is the best in all respects: compression, speed, and memory usage.
My kung-fu is the best!
14th July 2006, 11:23
I have a few ideas for base-engine improvement, though - like coding quantized match lengths.
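Coding quantized match lengths usually means the deflate-style trick: short lengths get their own symbol, longer lengths share a bucket symbol plus a few raw extra bits. A hypothetical quantization scheme (the post doesn't specify TC's):

```python
def quantize_length(n: int):
    """Map a match length n >= 1 to (bucket, extra_bits, extra_value).
    Lengths 1..8 get exact symbols 0..7; longer lengths fall into
    buckets whose width doubles (2, 4, 8, ...), so the symbol alphabet
    stays small while extra bits carry the exact value."""
    assert n >= 1
    if n <= 8:
        return n - 1, 0, 0
    bucket, base, size = 8, 9, 2
    while n >= base + size:       # find the doubling bucket containing n
        base += size
        size *= 2
        bucket += 1
    extra_bits = size.bit_length() - 1
    return bucket, extra_bits, n - base
```

The entropy coder then models only the small bucket alphabet, while the extra bits are sent raw; this is the usual trade-off between model size and precision.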
14th July 2006, 18:27
What algorithm uses QazaR 0.0 pre5?
world95.txt: 586,745 bytes (TC)
world95.txt: 567,108 bytes (QAZAR)
14th July 2006, 19:46
What algorithm uses QazaR 0.0 pre5?
QAZAR uses a modified LZRW4 algorithm. The author calls it simply LZP - but this is not accurate.
Also note that with these options (-x7 -l7), memory usage grows linearly and can reach really large numbers. At the same time, TC always uses 24 MB.
And finally, QAZAR is in fact slower than TC!
15th July 2006, 01:25
I'm well aware that TC is superior to QAZAR and many others.
15th July 2006, 01:28
Werner has now confirmed that TC dev8 decompresses without problems.
15th July 2006, 10:22
It's interesting, but most newbie authors learned from LZPX/LZPXJ, since they are open source. And QAZAR is no exception to the rule - do you remember its first versions? Werner asked back then whether QAZAR was an LZPX clone. So, do not forget about the roots!
15th July 2006, 10:48
Performance on 'calgary.tar', 3,152,896 bytes
TC 5.0dev8: 868,969 bytes
LZPXJ 1.2a, -m3: 890,503 bytes
UHARC 0.6b, -mz: 903,649 bytes
QAZAR 0.0pre5: 911,599 bytes
LZPX 1.5b: 982,999 bytes
PKZIP 2.50, -exx: 1,017,863 bytes
15th July 2006, 11:02
Performance on 'fp.log', 20,617,071 bytes
TC 5.0dev8: 628,256 bytes
QAZAR 0.0pre5: 701,008 bytes
UHARC 0.6b, -mz: 767,290 bytes
LZPXJ 1.2a, -m3: 794,270 bytes
LZPX 1.5b: 896,371 bytes
PKZIP 2.50, -exx: 1,331,724 bytes
15th July 2006, 11:12
By the way, QAZAR 0.0pre5, without switches tuned for a specific file, compresses 'world95.txt' to 643,721 bytes. Meanwhile, without any options, TC 5.0dev8 compresses it to 586,745 bytes.
Do not trust the SFC results too much! A typical user will not try to find the best switch combination - it's insanity!
15th July 2006, 11:55
I also have an idea to create a GUI program based on the TC compression engine. Maybe it will be TC 5.1, or get a brand new name. Features:
+ Three compression modes: Fast, Normal, Max
+ CRC32 checking
+ Still faster than RAR and MUCH faster than 7-Zip, while providing higher compression than ZIP. (I think PIMPLE's lack of success is due to its speed.)
+ Simple PIMPLE-like GUI
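The CRC32 checking in the feature list can be sketched with Python's standard zlib module (illustrative only; TC's archive format is not shown in the thread):

```python
import zlib

def crc32_of(data: bytes) -> int:
    """CRC-32 as it would be stored in an archive header
    (zlib/PKZIP polynomial, unsigned 32-bit result)."""
    return zlib.crc32(data) & 0xFFFFFFFF

def verify(data: bytes, stored_crc: int) -> bool:
    """Recompute the CRC after decompression and compare it with the
    value stored at compression time; a mismatch flags corruption or
    an engine bug."""
    return crc32_of(data) == stored_crc

payload = b"Turbo Compressor"
stored = crc32_of(payload)       # written to the archive at compress time
ok = verify(payload, stored)     # checked after decompression
```

This is exactly why such a check "helps to detect any engine issues": a silent mis-decompression like the MFC incident above would fail verification immediately.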