Thread: Highest LZ77 compression of enwik8

  1. #1
    Member lz77
    Join Date
    Jan 2016
    Location
    Russia
    Posts
    46
    Thanks
    14
    Thanked 11 Times in 7 Posts

    Highest LZ77 compression of enwik8

    Which compressor achieves the highest LZ77-like compression of enwik8? What is its compression ratio? The code should not pre-analyze the source data. Thanks.

  2. #2
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote (originally posted by lz77):
    Which compressor achieves the highest LZ77-like compression of enwik8? What is its compression ratio? The code should not pre-analyze the source data. Thanks.
    LZMA?

    What is pre-analysis here? For 5x faster decompression and more pre-analysis you would get good results with large-window brotli.

  3. #3
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    68
    Thanks
    31
    Thanked 22 Times in 15 Posts
    Quark is even better than LZMA: it compresses enwik8 to 22,988,924 bytes, but unfortunately it is closed source.

  4. #4
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    CSC -m5 compresses enwik8 a bit better than LZMA (from xz-utils) at level 9. Significantly faster compression, too, though decompression is a bit slower. IIRC it's lz77 but I'm not certain.

  5. #5
    Member lz77
    Join Date
    Jan 2016
    Location
    Russia
    Posts
    46
    Thanks
    14
    Thanked 11 Times in 7 Posts
    > What is pre-analysis here?

    It's a good question, but the answer seems simple: the unpacker should ONLY copy literals from the compressed data and copy bytes from already-decompressed data. If an optimized unpacker takes more than 0.5 s on one core to decompress enwik8, then it's not pure LZ77 compression. In the packed enwik8, '<mediawiki xmlns="http://www.' and other literals should appear at the beginning of the data.
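
    To make that definition concrete, here is a minimal sketch of such an unpacker in Python. The token format (a list of ("lit", bytes) and ("match", offset, length) tuples) and the name lz77_decode are assumptions made up for illustration; they are not the format of any compressor named in this thread, and a real decoder would parse a packed byte stream instead.

```python
from typing import Iterable, Tuple, Union

# Hypothetical token format, for illustration only:
#   ("lit", raw_bytes)        copy raw_bytes straight to the output
#   ("match", offset, length) copy `length` bytes starting `offset` bytes back
Token = Union[Tuple[str, bytes], Tuple[str, int, int]]


def lz77_decode(tokens: Iterable[Token]) -> bytes:
    """Decode a token stream using only literal copies and back-references."""
    out = bytearray()
    for token in tokens:
        kind = token[0]
        if kind == "lit":
            out += token[1]
        elif kind == "match":
            _, offset, length = token
            if offset <= 0 or offset > len(out):
                raise ValueError("match offset points outside decoded data")
            start = len(out) - offset
            # Byte-by-byte copy so that overlapping matches
            # (length > offset) repeat the recent output correctly.
            for i in range(length):
                out.append(out[start + i])
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    return bytes(out)


if __name__ == "__main__":
    example = [("lit", b"abc"), ("match", 3, 6), ("lit", b"!")]
    print(lz77_decode(example))  # b'abcabcabc!'
```

    There is no bit-level model or entropy decoder in the loop, which is the property being asked for: every output byte is either a stored literal or a copy of an earlier output byte.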

    > CSC -m5 compresses enwik8 a bit better than LZMA

    Neither compressor is pure LZ77:

    "Introduction for libcsc:

    The whole compressor was mostly inspired by LZMA, with some ideas from others or my own mixed.
    Based on LZ77 Sliding Window + Range Coder."

    > Quark is even better than lzma compresses enwik8 to 22,988,924...

    I can't find Quark... I doubt that an LZ77 compressor can achieve a ratio below 32% (32 MB) on enwik8.
    Last edited by lz77; 12th August 2017 at 10:58.

  6. #6
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    323
    Thanks
    174
    Thanked 51 Times in 37 Posts
    Quote (originally posted by lz77):
    I can't find Quark... I doubt that an LZ77 compressor can achieve a ratio below 32% (32 MB) on enwik8.
    Attached...

    Also, try this:
    https://encode.ru/threads/2280-LZOMA...ll=1#post46015

  7. #7
    Member lz77
    Join Date
    Jan 2016
    Location
    Russia
    Posts
    46
    Thanks
    14
    Thanked 11 Times in 7 Posts
    Thanks, I tried lzoma.exe, quark.exe, crush.exe (an LZ77-based file compressor by I. Muraviev: sourceforge.net/projects/crush/), and lizard32.exe (by Y. Collet & P. Skibinski).

    quark.exe drilled my brain with its interface...

    As far as I can see, only lizard is a pure LZ77 compressor: only in the .liz archive did I find literals like '<mediawiki xmlns="http://www.'.
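
    A check like this can be scripted. The sketch below is only an assumption about how one might do it (the helper names find_literal and archive_has_plain_literals are made up): it searches the compressed file for the raw byte string. A hit only shows that some literals are stored byte-aligned and unencoded; a miss suggests entropy-coded or bit-packed literals, but proves nothing by itself.

```python
from pathlib import Path

# The literal quoted in the post: the first bytes of enwik8.
NEEDLE = b'<mediawiki xmlns="http://www.'


def find_literal(data: bytes, needle: bytes = NEEDLE) -> int:
    """Return the offset of `needle` in `data`, or -1 if it is absent."""
    return data.find(needle)


def archive_has_plain_literals(path: str, needle: bytes = NEEDLE) -> bool:
    """Read a compressed file and report whether `needle` appears verbatim."""
    return find_literal(Path(path).read_bytes(), needle) != -1
```

    Usage would be along the lines of archive_has_plain_literals("enwik8.liz"), run against each compressor's output.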

    lizard32.exe -29 --no-frame-crc -B6 enwik8 enwik8.liz
    produces an enwik8.liz of 37,203,082 bytes. But decompression is slow, ~3 s...

    I hope that in the near future I'll compress the hapless enwik8 to ~33,000,000 bytes, with very fast decompression.

  8. #8
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Compressors like quark and plzma achieve high compression because of their strong entropy coders.
    These compressors prefer coding literals over matches.
    This is why they are so slow at decompression and lose some of their speed advantage over BWT.
    In general, this is not what you expect from an LZ77 compressor.

    You can look at the Compression Benchmark or make your own benchmark with the compressor benchmark tool TurboBench.

    See also: LTCB (Large Text Compression Benchmark)

    Quote (originally posted by lz77):
    I hope that in the near future I'll compress the hapless enwik8 to ~33,000,000 bytes, with very fast decompression.
    "lzturbo -29" compresses enwik8 to 28,788,842 bytes without any entropy coding.
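
    For comparing the figures quoted in this thread against the 32% mark, a small sketch of the ratio arithmetic follows. It assumes the sizes are in bytes and that enwik8 is exactly 100,000,000 bytes; the dictionary only repeats numbers posted above, and the name ratio_percent is made up for illustration.

```python
ENWIK8_SIZE = 100_000_000  # enwik8 is the first 10^8 bytes of an English Wikipedia dump

# Compressed sizes in bytes, as reported by posters in this thread.
REPORTED_SIZES = {
    "quark": 22_988_924,
    "lzturbo -29": 28_788_842,
    "lizard -29": 37_203_082,
}


def ratio_percent(compressed_size: int, original_size: int = ENWIK8_SIZE) -> float:
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed_size / original_size


if __name__ == "__main__":
    # Print the reported results from smallest to largest output.
    for name, size in sorted(REPORTED_SIZES.items(), key=lambda kv: kv[1]):
        print(f"{name:12s} {size:>11,d} bytes  {ratio_percent(size):6.2f}%")
```

    Because enwik8 is exactly 10^8 bytes, the ratio in percent is just the size in millions of bytes, so the ~33,000,000-byte goal corresponds to 33%.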

  9. #9
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    323
    Thanks
    174
    Thanked 51 Times in 37 Posts
    Quote (originally posted by lz77):
    Thanks, I tried lzoma.exe, quark.exe, crush.exe (an LZ77-based file compressor by I. Muraviev: sourceforge.net/projects/crush/), and lizard32.exe (by Y. Collet & P. Skibinski).

    quark.exe drilled my brain with its interface...

    As far as I can see, only lizard is a pure LZ77 compressor: only in the .liz archive did I find literals like '<mediawiki xmlns="http://www.'.

    lizard32.exe -29 --no-frame-crc -B6 enwik8 enwik8.liz
    produces an enwik8.liz of 37,203,082 bytes. But decompression is slow, ~3 s...

    I hope that in the near future I'll compress the hapless enwik8 to ~33,000,000 bytes, with very fast decompression.
    Perhaps I'm misunderstanding what you want. I'm sure you already tried this:
    https://encode.ru/threads/550-Ultra-...ll=1#post53288
