Thanks Simon!
Compiled...
EDIT: Attachments "paq8q_v12.zip" and "paq8q_v12_sse2.zip" removed.
Hi,
I think something is not right with some kinds of files in paq8qv11.
My wav test files, and files that contain wav files, are compressed worse than in v10. Examples:
file: 0.wav, 8 bit mono wave, original 2762044
paq8qv10 - 1332422
paq8qv11 - 1332436 - small difference but...
file: l.pak, container with different kinds of data, contains about 90 wav files - mostly 8 bit mono, original 7764079
paq8qv10 - 2696259
paq8qv11 - 2818825 - 4.5% of difference
Darek
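The percentages quoted above can be checked with a one-line calculation. This is only a minimal sketch; `percent_change` is a hypothetical helper name, not something from paq8q.

```cpp
// Relative size change in percent, measured against the "before" size.
// A positive result means the output got bigger.
double percent_change(unsigned long long before, unsigned long long after) {
    return (static_cast<double>(after) - static_cast<double>(before)) * 100.0
           / static_cast<double>(before);
}
```

Called with the l.pak sizes (2696259 and 2818825) it gives roughly the 4.5% reported above, and with the 0.wav sizes (1332422 and 1332436) a change far below 0.01%.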
Do both generate a bit-identical output file?
A wrong block size was written for image/audio blocks.
paq8q_v13
- fixed wrong block size
- some changes in comparing
- percentage for decompression/comparing
Last edited by Jan Ondrus; 8th June 2009 at 21:22.
Thanks Jan!
Compiled...
Hi!
Thanks for fixing this!
Now it's OK.
I have a question about the exe filter used in this version (I tried asking on the paq8px forum, but nobody answered): should the exe algorithm (filter) recognise any kind of exe file, or only selected types of exe/dll headers, or only some special cases?
In my testbed, 3 of the 4 exe files aren't recognised by the exe filter, not even in part, and are compressed entirely as default data, which is less effective.
Darek
Headers are not used for exe detection. Actually it detects x86 code: it looks for CALL (0xe8), JMP (0xe9) and 0x0f80..0x0f8f (conditional jump) instructions and tries to guess whether the relative address of the jump (the next 4 bytes) will occur in the file more often if converted to absolute.
I don't know why your 3 test files aren't detected. Aren't they compressed with some executable packer (UPX)?
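The relative-to-absolute conversion described above can be sketched as follows. This is a simplified illustration, not the paq8q filter: it converts every 0xE8/0xE9 operand unconditionally and omits both the detection heuristic and the 0x0f80..0x0f8f conditional jumps. The name `e8e9_transform` and its helpers are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 32-bit value from buf[i..i+3].
static uint32_t load32(const std::vector<uint8_t>& buf, std::size_t i) {
    return uint32_t(buf[i]) | uint32_t(buf[i + 1]) << 8 |
           uint32_t(buf[i + 2]) << 16 | uint32_t(buf[i + 3]) << 24;
}

// Write a little-endian 32-bit value to buf[i..i+3].
static void store32(std::vector<uint8_t>& buf, std::size_t i, uint32_t v) {
    for (int k = 0; k < 4; ++k) buf[i + k] = uint8_t(v >> (8 * k));
}

// encode == true : rel32 operand -> absolute target (operand + address of next instruction)
// encode == false: absolute target -> rel32 operand (exact inverse)
void e8e9_transform(std::vector<uint8_t>& buf, bool encode) {
    std::size_t i = 0;
    while (i + 5 <= buf.size()) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {
            uint32_t next = uint32_t(i + 5);      // offset of the following instruction
            uint32_t v = load32(buf, i + 1);
            store32(buf, i + 1, encode ? v + next : v - next);
            i += 5;  // skip the operand so both directions visit the same opcode positions
        } else {
            ++i;
        }
    }
}
```

Because only operand bytes are modified and they are always skipped, running the function with `encode = false` on the encoded buffer restores the original bytes. Calls to the same target from different places then share the same 4 operand bytes, which is what makes the converted form easier to model.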
Jan ondrus wrote:
Aren't they compressed with some executable packer (UPX)?
Thanks for the answer. I don't know whether my files are internally compressed. It's possible. These files are original executables of old applications.
I'm posting one example.
Darek
Thanks.
Is this the general approach to exe compression (converting only 32-bit addresses), or is it specific to the exe filter used in paq?
Darek
It's a pity that paq8q won't be used or tested (at the moment). There is a general bug in all modes except -m6. It is fixed for the next version.
Last edited by Simon Berger; 10th June 2009 at 15:58.
Here is an improvement to XML/HTML compression. While the gain in compression ratio is only small, the time improvement should be noticeable.
It's an element precompression and works for all elements, with and without keys/parameters.
These work:
Code:
<id>9</id>
<id key="0">9</id>
<id key="1" />
Left untouched (no problems there, just no improvement):
Code:
<id >9<id>
<id key=0>8</id>
<id key="1"/>
So as you can see, it precompresses only well-formatted elements.
Benchmark on enwik8:
Code:
original: 100,000,000 bytes
precompressed: 98,125,260 bytes
paq8q_v13: 17,733,056 bytes // without the precompression, of course
paq8q_v14_beta: 17,718,092 bytes // including precompression
No timings here. They should be done later with fast builds compiled the same way.
I have to clean up the code before releasing it and make it more robust against malformed elements.
If someone outside paq is interested in this, I could create a general class. It shouldn't break any context and should work well for LZ-based compressors too. It would be interesting to compare it to other XML pre-processors too, but I haven't stumbled upon anything similar yet.
EDIT:
Btw. thanks for your additional information/corrections on the supported formats, Jan. I added them for the next release. I really wasn't sure about the P(X)M part until now. Somehow I missed your post previously.
Last edited by Simon Berger; 20th June 2009 at 20:32.
I found a very nice project that automatically generates valid XML files http://www.xml-benchmark.org/.
With this I have now tested a 30mb "real XML file". Real means XML used as a database, with mostly short values and only a few longer ones. The results are much better.
This generator created tags like
Code:
<id key="1"/>
Which I didn't understand. But I decided to support this besides the one with a space.
Benchmark:
Code:
original: 35,702,033 bytes
precompressed: 28,145,890 bytes
paq8px_v60: 4,736,835 bytes
paq8q_v14_test: 4,205,636 bytes (Decompressed output was verified to be bit identical)
I used this command to generate the file:
Code:
win32.exe /f 0.3 /o outfile.xml
Code:
paq8px_v60: 4,736,835 bytes
paq8q_v14_test: 4,205,636 bytes
Wow.
That's a lot.
For those looking forward to XML/text improvements: I only work on this for a few minutes per day. One problem was parsing/validating XML tags properly without using many lines of code.
I have tested some XML compressors and preprocessors, but none of them showed great results.
Somehow I missed the well-known XML-WRT, which beat my precompressor soundly. The biggest reason is a dynamic dictionary, which also works on normal text files.
Here are the results of WRT + paq8px_v60 on both files I tested in the previous posts.
Code:
Enwik8:
wrt+paq8px_v60: 17,372,979 bytes
outfile.xml:
paq8px_v60: 4,736,835 bytes
paq8q_v14_test: 4,205,636 bytes
wrt+paq8px_v60: 3,640,092 bytes // !!!!!!!!!!!!!!!!
I have added some ideas to my implementation, but it was still too far behind WRT. Because this dynamic dictionary reduces file size HEAVILY, I decided to include WRT or a self-made version in paq. Compression of enwik8 should be reduced by ~80%. I don't have the exact preprocessed file size, but I seem to remember it was under 50mb (~44mb).
@Simon:
Do you plan to release some builds of paq8q_v14_test for testing purposes?
Darek
Yes, I will once I have included WRT. WRT is a really massive addition, with half as many lines of code as the whole paq project. But I am going to delete all code not needed for paq and change the coding style.
In the end paq8q_v14 will have two source files. In my opinion it is no longer feasible to let the paq source keep growing and growing. On the other hand, anyone who wants a single file can still get one simply by leaving the text preprocessing out.
I can't justify writing my own XML preprocessor when I know something this much better already exists, even if it is a much bigger piece of code. It will make PAQ compression much faster, and I have another idea that is similar to LZP preprocessing (paq9) but shouldn't have the bad side effect of worse compression while still being much faster.
Thanks Simon.
Then I'll wait for the work to be finished and released, and of course I'll follow the progress.
Regards, Darek
I'm currently not working on paq8 or on the things I want to do for paq8q, but some time ago I fixed some serious problems. I had wanted to ship those fixes with the XML precompression update, which will come at a later time.
I hope that this is a stable release and that it will be tested in some benchmarks (maximumcompression...).
It is also on par with paq8px_v60 feature-wise.
Changelog
Code:
Fixed a bug that didn't let mode 2-5 work at all
Fixed a bug in comparing
Changed and Fixed small appearance things
Thanks Simon!
Compiled...
bug: Comparing shows "Files are equal. No difference found." message when compressed file is shorter (file for comparing is longer).
paq8q:
Code:
if (mode==FCOMPARE && !diffFound) printf("Files are equal. No difference found.\n");
else if (mode==FCOMPARE) printf("First difference found at file offset: %u\n", diffFound-1);
else printf("done \n");
paq8px_v60:
Code:
if (mode==FCOMPARE && !r && getc(f)!=EOF) printf("file is longer\n");
else if (mode==FCOMPARE && r) printf("differ at %d\n",r-1);
else if (mode==FCOMPARE) printf("identical\n");
else printf("done \n");
Last edited by Jan Ondrus; 11th July 2009 at 14:44.
Simon, LP, thank you! Here is the test for paq8q_14, which brings unpleasant news. Maybe the bug pointed out by Jan causes it.
Tested on PAQ_TestBed.tar, -6 -m6 level.
Code:
compile                  time       size        CRC
---------------------------------------------------------------
paq8q                    1134.122   5 350 117   E2962AC8
paq8q_speed_optimised    1096.594   5 350 124   0698B712
paq8q_sse2_amd           1138.415   5 350 117   02E0A874
paq8q_sse2_intel         1122.463   5 350 117   6A6437FC
All output files are different!
Last edited by Skymmer; 12th July 2009 at 15:39.
The bug pointed out by Jan is about properly reporting the differences, not about the differences themselves.
Thank you Jan.
Thank you Skymmer too. Maybe the wavModel isn't up to date. I will look into it and release an update.
I've tracked down the problematic file in PAQ_TestBed.tar. It's Sine_generator_65Hz.wav
All the other files, compressed separately, give identical output for all compiles. It's strange that only this file is problematic. The other 3 WAVs and 4 AIFFs are packed normally.
Didn't you find out that it is exactly the same for paq8px?
No, I didn't. Here are the checksums for Sine_generator_65Hz.wav at -6\-6 -m6 level.
Code:
paq8q_no_opt             78645217
paq8q_sse2_no_opt        78645217
paq8q_speed_optimised    78645217
paq8q                    BFA453E3
paq8q_sse2_amd           BFA453E3
paq8q_sse2_intel         BFA453E3
Code:
paq8px_no_opt            0C5B87B8
paq8px_sse2_no_opt       0C5B87B8
paq8px                   31A36660
paq8px_speed_optimised   31A36660
paq8px_sse2_amd          31A36660
paq8px_sse2_intel        31A36660
paq8px_fast_wav          31A36660
paq8px_spopt_fast_wav    31A36660
paq8px_fastpaq2          31A36660
paq8px_fastpaq2_so       31A36660
paq8px_turbo             31A36660
paq8px_v60_AMD           42C47800
paq8px_v60_Intel_SSE2    42C47800
paq8px_v60_MMX           F91880B7
I think you should not use DOUBLE in a file compressor!
Any floating point arithmetic is dangerous, because even (a+b)+c is not always the same as a+(b+c), so it's very likely that you lose binary compatibility when compiling with different optimization options.
Better use fixed-point, like train and dot_product!
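A small sketch of why the order of operations matters for doubles but not for integer (fixed-point) sums. The functions below are hypothetical illustrations and are not the train or dot_product routines from paq8.

```cpp
#include <cstddef>
#include <cstdint>

// Sum left to right: ((x0 + x1) + x2) + ...
double sum_forward(const double* x, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s = s + x[i];
    return s;
}

// Sum right to left: x0 + (x1 + (x2 + ...))
double sum_backward(const double* x, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = n; i-- > 0;) s = x[i] + s;
    return s;
}

// Integer dot product of 16-bit values accumulated in 64 bits.
// Integer addition is associative, so the visiting order cannot change the result.
int64_t dot_fixed(const int16_t* a, const int16_t* b, std::size_t n, bool reverse) {
    int64_t sum = 0;
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t i = reverse ? n - 1 - k : k;
        sum += int32_t(a[i]) * int32_t(b[i]);
    }
    return sum;
}
```

With the inputs {1e30, -1e30, 1.0}, the forward sum keeps the 1.0 while the backward sum loses it, because 1.0 is far below the precision of a value near 1e30. A compiler that reorders or vectorises a floating-point loop can therefore change the compressed output, whereas the integer version gives the same result in either order.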