Some compression results for pi.txt from the Canterbury miscellaneous corpus.
pi.txt contains the first million digits of pi: 3141592653...45815
470,439 gzip -9
416,101 paq8px_v69 -8
    159 zpaq ocpi
The preprocessor discard.bat takes an input and output file as arguments and creates an empty output file.
The ZPAQ configuration file pi.cfg is as follows:
comp 0 0 0 19 1
  0 icm 5
hcomp
  halt
pcomp discard ;
  a> 255 ifnot halt endif (no output unless EOF)

  (Compute pi to 1000000 digits using the formula:
   pi=4; for (d=r1*20/3;d>0;--d) pi=pi*d/(2*d+1)+2;
   where r1 is the number of base 100 digits in M.
   The precision is 1 bit per iteration so 20/3
   is slightly more than the log2(100) we need. We compute
   10 extra trailing digits and discard them to avoid
   roundoff error.)

  a= 100 a*= 100 a*= 50 a+= 10 r=a 1 (r1 = digits base 100)
  a*= 20 a/= 3 d=a (d iterations)
  *b= 40 (pi = 4)
  do

    (multiply M *= d, carry in c)
    b=r 1 c=0
    do
      b--
      a=*b a*=d a+=c c=a a%= 100 *b=a
      a=c a/= 100 c=a
    a=b a> 0 while

    (divide M /= (2d+1), remainder in c)
    a=d a+=d a++ d=a
    do
      a=c a*= 100 a+=*b c=a a/=d *b=a
      a=c a%=d c=a
    a=r 1 b++ a>b while
    a=d a>>= 1 d=a

    (add 2)
    b= 0 a= 20 a+=*b *b=a
  d-- a=d a> 0 while

  (output the digits in ASCII, 2 per byte)
  a=r 1 a-= 10 r=a 1 (discard last 10 bytes)
  do
    a=*b a/= 10 a+= 48 out
    a=*b a%= 10 a+= 48 out
  b++ a=r 1 a>b while
  halt
end
The postprocessor ignores the decoded output and, after decoding EOF, computes the first 1,000,000 digits of pi. Thus, pi.cfg works only on this one particular file. The code itself is compressed with an order 0 indirect model. The compressed size could be reduced further, from 159 bytes to 126, by not storing the filename, comment, and checksum, using the command "zpaq nisocpi pi.zpaq pi.txt".
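For readers who do not read ZPAQL, the same computation can be sketched in Python. This is only an illustration of the formula in the comment above, not part of zpaq. The function name pi_digits and its parameters are my own, and it is meant for small digit counts, since the running time grows with the square of the number of digits.

```python
def pi_digits(n_digits, guard=10):
    """Return the first n_digits digits of pi as a string (n_digits must be even).

    Hypothetical helper mirroring the ZPAQL postprocessor: the number M is
    held as base-100 digits, most significant first, scaled by 10 so that
    the first base-100 digit of the result is 31.
    """
    r1 = n_digits // 2 + guard      # base-100 digits, including guard digits
    m = [0] * r1
    m[0] = 40                       # pi = 4 (stored as 40 because of the scaling)
    # About 1 bit of precision per iteration; 20/3 > log2(100) = 6.64...
    for d in range(r1 * 20 // 3, 0, -1):
        # multiply M *= d, least significant digit first, carry in c
        c = 0
        for b in range(r1 - 1, -1, -1):
            c += m[b] * d
            m[b] = c % 100
            c //= 100
        # divide M /= (2d+1), most significant digit first; the carry left
        # over from the multiply is the initial remainder
        q = 2 * d + 1
        for b in range(r1):
            c = c * 100 + m[b]
            m[b] = c // q
            c %= q
        # add 2 (20 in the scaled representation)
        m[0] += 20
    # two decimal digits per base-100 digit, guard digits discarded
    return "".join("%02d" % x for x in m[:r1 - guard])


if __name__ == "__main__":
    print(pi_digits(20))            # prints 31415926535897932384
```

With the settings in pi.cfg, r1 is 500,010 and the outer loop runs 3,333,400 times, each pass touching every base-100 digit twice, which is why the full computation takes hours.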
The algorithm is not the most efficient. It took 31868 seconds (about 9 hours) to compress and 27943 seconds (about 8 hours) to decompress on a 2.67 GHz i7 M620 under 64-bit Linux (using an equivalent shell script for discard: "cp /dev/null $2"). Compression is slow because zpaq tests the preprocessor/postprocessor sequence before encoding, which requires generating pi, computing its SHA1 hash, and comparing it with the SHA1 hash of the input. These times rely on optimization with the "o" modifier (which translates the above ZPAQL to C++, compiles it, and reruns zpaq). Without it, or when decompressing with a non-optimizing program like pzpaq, it would have been 5 times slower.
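The verification step can be pictured with a short Python sketch. It is not zpaq's code. The names roundtrip_ok and regenerate are hypothetical, and regenerate stands in for running the postprocessor on the (empty) preprocessed data.

```python
import hashlib


def roundtrip_ok(original, regenerate):
    """Return True if regenerate() reproduces original, judged by SHA-1.

    original   -- bytes of the input file
    regenerate -- hypothetical callable returning the postprocessor's output
    """
    want = hashlib.sha1(original).hexdigest()
    got = hashlib.sha1(regenerate()).hexdigest()
    return want == got


if __name__ == "__main__":
    data = b"3141592653"
    print(roundtrip_ok(data, lambda: b"3141592653"))   # prints True
    print(roundtrip_ok(data, lambda: b"3141592654"))   # prints False
```

The cost is in regenerate, not in the hashing: the check forces the whole pi computation to run once during compression, just as it runs once during decompression.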
I realize there are faster algorithms for computing pi. For example, qpi computes a million digits in about 1 second using (I think) Chudnovsky's formula with binary splitting, FFT multiplication, and Newton-Raphson division. But then the archive would have been larger.