ENWIK13 is a 5 TB text archive, about 10^12.7 bytes
31 GB with 7z compression
http://download.wikimedia.org/enwiki...history.xml.7z
update to "Large Text Compression Benchmark" ...
And 280 GB with bzip2 (roughly 18 to 1). So now we have 2 numbers for the benchmark. I suppose the huge compression ratio for 7zip (about 160 to 1) is because the set has every version of every page, with only minor differences between versions. 7zip has a much bigger buffer than bzip2 (900 KB max), so it can still match against an earlier version of a page after bzip2 has already moved on to a new block.
But I think I'll stick with 1 GB. Testing takes long enough as it is.
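
For anyone who wants to see the buffer-size effect on a small scale, here is a rough sketch in Python. It is only an illustration under assumptions, not a measurement of the real dump: the "page" is random bytes, each "revision" changes a handful of bytes, and Python's `lzma` module (xz container, preset 6, 8 MiB dictionary) stands in for 7zip. The names `make_history` and `compressed_sizes` are made up for this example. Each revision is larger than bzip2's 900 KB block, so bzip2 never sees two copies of the page in the same block.

```python
import bz2
import lzma
import random


def make_history(page_size=1_200_000, versions=4, edits=20, seed=0):
    """Build a fake revision history: the same page repeated with tiny edits.

    The page is random (incompressible) bytes, so any savings can only come
    from matching one revision against an earlier one.
    """
    rng = random.Random(seed)
    page = bytearray(rng.getrandbits(8 * page_size).to_bytes(page_size, "little"))
    out = bytearray()
    for _ in range(versions):
        out += page
        # Apply a few single-byte edits before the next revision is stored.
        for _ in range(edits):
            page[rng.randrange(page_size)] = rng.getrandbits(8)
    return bytes(out)


def compressed_sizes(data):
    """Return (bzip2_size, lzma_size) in bytes for the same input."""
    bz = len(bz2.compress(data, compresslevel=9))  # 900 KB blocks
    xz = len(lzma.compress(data, preset=6))        # 8 MiB dictionary
    return bz, xz


if __name__ == "__main__":
    data = make_history()
    bz, xz = compressed_sizes(data)
    print(f"input: {len(data)} bytes")
    print(f"bzip2: {bz} bytes ({len(data) / bz:.2f} to 1)")
    print(f"lzma:  {xz} bytes ({len(data) / xz:.2f} to 1)")
```

With these settings bzip2 should come out at roughly the input size, while lzma should land near the size of a single revision, since the later revisions are almost entirely matches against the first one. Real wiki text is compressible on its own, so the actual numbers on the dump will differ, but the mechanism should be the same.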