I'm always keeping tabs on the progress of GCC, and I have done some compiles from the latest svn trunk. Since paq is an easy compile (and is Linux compatible), I decided to do a few tests with gcc 4.4.0 20090208 (experimental) vs gcc 4.3.2. I know paq is much faster using the hand-written assembly, but this test is about the speed of the compiler-generated code.
Below are some results. For each run I've included the compile line I used, followed by the test command itself.
g++ paq9a.cpp -DUNIX -O2 -s -fomit-frame-pointer -ftree-switch-conversion -floop-interchange -floop-strip-mine -floop-block -o paq9p_gcc44_O2
./paq9p_gcc44_O2 a archive enwik8
2nd try, without loop optimization switches
g++ paq9a.cpp -DUNIX -O3 -s -fomit-frame-pointer -ftree-switch-conversion -floop-interchange -floop-strip-mine -floop-block -o paq9p_gcc44_O3
./paq9p_gcc44_O3 a archive enwik8
/storage/gcc/gcc_staging_loop/usr/local/bin/g++ paq9a.cpp -DUNIX -O3 -s -fomit-frame-pointer -o paq9p_gcc44_O3
./paq9p_gcc44_O3 a archive /storage/compression-test-files/enwik8
g++ paq9a.cpp -DUNIX -O3 -s -fomit-frame-pointer -o paq9p_gcc432_O3
./paq9p_gcc432_O3 a archive enwik8
g++ paq9a.cpp -DUNIX -O2 -s -fomit-frame-pointer -o paq9p_gcc432_O2
./paq9p_gcc432_O2 a archive enwik8
As you can see, in the first tests I tried out the new optimization switches (-ftree-switch-conversion -floop-interchange -floop-strip-mine -floop-block), but they didn't seem to make much of a difference with paq9a. Perhaps other algorithms might get some mileage out of them. What I did find interesting, however, was the 5.8% performance improvement when using the O3 optimization level. I'd love to be able to try gcc 4.4 with some of my favorite compressors (Nanozip, CCM, BCM, etc.). So if any of you would like to try it, I'm sure it'll produce some interesting results.
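For anyone who wants to repeat the comparison, here is a rough sketch of how the runs could be timed in one go. It is not the exact procedure I used. It assumes the binaries sit in the current directory under the -o names from the compile lines above and that enwik8 is there too (override with INPUT=...). The helper names elapsed_seconds and percent_faster are made up for this example, and each build writes to its own archive file so the runs don't interfere with each other.

```shell
#!/bin/sh
# Sketch: time each paq9a build on the same input (wall-clock, 1 s resolution).
# Assumption: binaries are in the current directory, named as in the -o flags above.

INPUT="${INPUT:-enwik8}"

# Print the whole seconds taken by the given command; its own output is discarded.
elapsed_seconds() {
    es_start=$(date +%s)
    "$@" > /dev/null 2>&1
    es_end=$(date +%s)
    echo $(( es_end - es_start ))
}

# Print how much faster new_time is than old_time, as a percentage.
# Usage: percent_faster old_time new_time
percent_faster() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

for bin in paq9p_gcc432_O2 paq9p_gcc432_O3 paq9p_gcc44_O2 paq9p_gcc44_O3; do
    if [ ! -x "./$bin" ] || [ ! -f "$INPUT" ]; then
        echo "skipping $bin (binary or input not found)"
        continue
    fi
    rm -f "archive_$bin"    # start every run from a fresh archive
    echo "$bin: $(elapsed_seconds "./$bin" a "archive_$bin" "$INPUT") s"
done
```

To compare two of the reported times, pass them to percent_faster with the slower (old) time first. For example, percent_faster 200 150 prints 25.0; those numbers are only an illustration, not measurements. Note that two of the compile lines above write to the same paq9p_gcc44_O3 name, so give one of them a different -o name if you want to keep both builds around.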
If you want to use the snazzy new loop optimizations, note that they require two external libraries when you build gcc itself. http://gcc.gnu.org/wiki/Graphite_Build was very helpful to me in getting it working.