
Thread: New LTCB record

  1. #1 Matt Mahoney (Expert)

    Dmitry Shkarin broke his own record on LTCB, improving compression from .1280 to .1277 with a new dictionary for durilca'kingsize. It is also faster than the next 13 compressors. I did not test it myself because it requires 64-bit Windows and 13 GB of memory.

    http://mattmahoney.net/dc/text.html#1277
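
If the ratios are total size over enwik9's 10^9 bytes (as reported on the LTCB page), the quoted improvement from .1280 to .1277 works out to roughly 300 KB. A quick back-of-the-envelope check (the ratios are rounded to 4 digits, so this is only accurate to about 100 KB either way):

```python
ENWIK9 = 10**9  # enwik9 is exactly 1,000,000,000 bytes

old_ratio = 0.1280
new_ratio = 0.1277

# Bytes saved implied by the ratio change.
saved = (old_ratio - new_ratio) * ENWIK9
print(f"~{round(saved):,} bytes saved")  # ~300,000 bytes saved
```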

  2. #2 Member (China)
    Congratulations, Dmitry! Nice work!

  3. #3 Rugxulo (Member, USA)
    It is designed to work only on this benchmark and not in general.
    No kidding??!?!?!?!?!?!


  4. #4 Matt Mahoney (Expert)
    The top 3 on LTCB are all tuned to the benchmark. That's going to happen with any public benchmark and I expected it. There are 3 ways (that I know of) to benchmark, but none of them really answers the question "what is the best compressor?" (otherwise zip would be top ranked, because that's what everyone uses).

    1. Use private test data.
    2. Use public data and add the decompresser size.
    3. Use a cryptographic random data generator (like the generic compression benchmark).

    I don't like 1 because nobody else can submit or check results. It is more work for the evaluator, but maybe you can automate it like Sportman's metacompressor site. Even so, when the evaluator gets tired of it, the benchmark dies.

    2 has problems like tuning, but it does eliminate tricks like BARF, which compresses the Calgary corpus to 1 byte, or less blatant ones like a dictionary built from world95.txt, a special filter for rafale.bmp, etc. You can't compress smaller than the (unknown) Kolmogorov complexity, and it is well tested (the Calgary challenge).

    3 solves the problems of (1) without counting the decompresser size, but only gives an approximate measurement. Creating truly generic data is a hard theoretical problem, though: the distribution is very sensitive to the choice of programming language for the random programs that generate the test data.
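
Option 2 above is easy to score mechanically. A minimal sketch (the file paths are hypothetical; LTCB actually counts a zipped decompresser program, which this approximates with the raw file size):

```python
import os

def ltcb_score(archive_path: str, decompresser_path: str) -> int:
    """Total size per option 2: compressed data plus the decompresser
    program, so a huge embedded dictionary is paid for honestly."""
    return os.path.getsize(archive_path) + os.path.getsize(decompresser_path)
```

Ranking by this sum is what makes a dictionary tuned to the test file a losing move: whatever the archive saves, the decompresser gains.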

  5. #5 Rugxulo (Member, USA)
    Quote (Matt Mahoney): "The top 3 on LTCB are all tuned to the benchmark. That's going to happen with any public benchmark and I expected it."
    Sorry, I think you missed the joke. I thought it was extremely obvious and redundant that a 64-bit 13 GB compressor wasn't for general use.

  6. #6 Matt Mahoney (Expert)
    Yeah I saw the smilies. I'm waiting for someone to run a 10,000 model version of PAQ on a Jaguar XT5 with 224,000 cores and 299 TB of memory. http://www.nccs.gov/jaguar/

  7. #7 Shelwien (Administrator)
    Cores?.. Is there a parallel paq version?
    Also, what kind of models? Something like paq8k?

  8. #8 Rugxulo (Member, USA)
    Quote (Rugxulo): "Sorry, I think you missed the joke. I thought it was extremely obvious and redundant that a 64-bit 13 GB compressor wasn't general use."
    Actually, I was joking, but I've now seen Black Friday ads with several computers sporting anywhere from 1 GB of RAM (netbooks) to 3 GB (laptops) to one desktop with 8 GB, and most of them run 64-bit Win7, even with only 3 GB of RAM. So I guess it's not that far-fetched (if still extremely silly to my mind).

  9. #9 Matt Mahoney (Expert)
    Yes, in a few years it will be common for computers to have 16 GB. LTCB has no hardware limits for that reason. Anyone can submit results run on any computer.

    An approach like paq8k could be made fast on a parallel computer. All of the models can be executed in parallel. They have to be mixed but that can be done using a mixer tree in O(log n) time with all mixers at the same level executed in parallel. Then the SSE and arithmetic coding are serial but the SSE can be moved down the tree like in PAQAR to be done in parallel.

    For text compression there are probably better algorithms that are even more parallel. Remember that the model is doing the same thing your brain is doing when you guess what word will be next in some text. It does this with 10^11 processors each with 1 KB memory running at 300 Hz.
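
The mixer-tree idea above can be sketched as a pairwise reduction. This is an illustrative Python sketch with fixed equal weights (real PAQ mixers adapt their weights per context); the point is that every pair within a level is independent, so a parallel machine could evaluate each level in one step, giving O(log n) depth for n models:

```python
import math

def stretch(p):
    """Map a probability in (0, 1) to the logistic (logit) domain."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Inverse of stretch: map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def mix_tree(probs, w=0.5):
    """Mix n model predictions pairwise in O(log n) levels.
    All pairs within a level are independent of each other,
    so each level could run fully in parallel."""
    level = [stretch(p) for p in probs]
    while len(level) > 1:
        nxt = [w * level[i] + w * level[i + 1]
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element passes through unchanged
            nxt.append(level[-1])
        level = nxt
    return squash(level[0])
```

With equal weights this is just an average in the stretched domain; an adaptive mixer would learn a weight per model, and the SSE stage would then refine the tree's output.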

  10. #10 m^2 (Member, Ślůnsk, PL)
    Quote (Matt Mahoney): "10^11 processors each with 1 KB memory running at 300 Hz."
    Thanks Matt, I never looked at it this way.
    I think I'll go through complexity theory again and see how it works.

  11. #11 Member (China)
    I can compress the compressed file from 127,377,411 bytes to 127,377,288.

  12. #12 Matt Mahoney (Expert)
    It only counts if your decompresser is smaller than 123 bytes.
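
The 123 comes straight from the two sizes in post #11: the extra pass only pays off if the second-stage decompresser fits inside the bytes it saved.

```python
old_size = 127_377_411   # current compressed file size in bytes
new_size = 127_377_288   # after the extra compression pass

# Budget for the extra decompresser: the total (new archive plus its
# decompresser) must still come in under the old archive alone.
budget = old_size - new_size
print(budget)  # 123
```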

  13. #13 Member (China)
    Quote (Matt Mahoney): "It only counts if your decompresser is smaller than 123 bytes."
    60K,
