Lossless compression program that includes several algorithms. Intended for large XML files, but it can compress any file. Best results are obtained with block compression (BWT+MTF+RLE) on large blocks over an alphabet of words.
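The BWT+MTF+RLE chain over an alphabet of words can be illustrated with a minimal sketch. This is not the program's actual implementation: the function names (tokenize, to_symbols, compress, decompress) are hypothetical, the tokenizer is a simple regular expression, and the Burrows-Wheeler transform is done naively by sorting all rotations, which is far too slow for large blocks. It only shows how the stages fit together and that the chain is reversible.

```python
import re

def tokenize(text):
    # Assumed tokenizer: alternating word and non-word runs, so joining
    # the tokens reproduces the input exactly.
    return re.findall(r"\w+|\W+", text)

def to_symbols(tokens):
    # Dictionary of distinct tokens; each token becomes an integer symbol.
    dictionary = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(dictionary)}
    return dictionary, [index[t] for t in tokens]

def bwt(symbols):
    # Naive Burrows-Wheeler transform: sort all rotations, keep the last
    # column and the row at which the original sequence ends up.
    n = len(symbols)
    if n == 0:
        return [], 0
    rotations = sorted(range(n), key=lambda i: symbols[i:] + symbols[:i])
    last = [symbols[(i - 1) % n] for i in rotations]
    return last, rotations.index(0)

def inverse_bwt(last, primary):
    # A stable sort of positions by symbol maps each row's first symbol
    # to its occurrence in the last column.
    order = sorted(range(len(last)), key=lambda i: last[i])
    out, row = [], primary
    for _ in range(len(last)):
        row = order[row]
        out.append(last[row])
    return out

def mtf_encode(symbols, alphabet_size):
    # Move-to-front: emit the current rank of each symbol, then move it
    # to the front of the table.
    table = list(range(alphabet_size))
    out = []
    for s in symbols:
        rank = table.index(s)
        out.append(rank)
        table.insert(0, table.pop(rank))
    return out

def mtf_decode(ranks, alphabet_size):
    table = list(range(alphabet_size))
    out = []
    for r in ranks:
        s = table.pop(r)
        out.append(s)
        table.insert(0, s)
    return out

def rle_encode(values):
    # Run-length encoding as (value, run length) pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, c) for v, c in runs]

def rle_decode(runs):
    return [v for v, c in runs for _ in range(c)]

def compress(text):
    dictionary, symbols = to_symbols(tokenize(text))
    last, primary = bwt(symbols)
    runs = rle_encode(mtf_encode(last, len(dictionary)))
    return dictionary, primary, runs

def decompress(dictionary, primary, runs):
    last = mtf_decode(rle_decode(runs), len(dictionary))
    return "".join(dictionary[s] for s in inverse_bwt(last, primary))
```

Calling decompress(*compress(text)) returns the original text. A real compressor would additionally encode the dictionary, the primary index and the runs with an entropy coder; that step is omitted here.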
The aim of the project is to create software that serves as a platform for testing various compression methods and their applications. From a practical point of view, we focused on the compression of XML files. The most important factor is the compression ratio, and therefore the best known approaches and methods were used. We combined methods for XML compression and text compression.
For compression of large texts, words are often taken as the symbols of the alphabet and a dictionary of the words used is created. As proposed by Lánský, we split words into syllables and work with them. We get a significantly smaller alphabet compared with the word approach, resulting in a smaller dictionary, while maintaining the compression ratio. On the output of the parser we tried several standard methods: block compression, dictionary methods and statistical methods. The variant with syllable compression together with block compression (Burrows-Wheeler transform + move-to-front + run-length encoding) was previously tested by Lánský, but the implementation suffered from very low performance. We also tried combinations of methods not presented before, such as block compression followed by statistical methods. In this project we deal with lossless
Library and documentation of major lossless compression algorithms. First project: Arithmetic, Huffman, LZ77, LZ78, LZW, RLE. The second project reimplements Deflate. The documentation explains the major entropy compression methods.