Thread: Compression techniques that don't require repetition

  1. #1
    Member
    Join Date
    Jun 2019
    Location
    United States
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Compression techniques that don't require repetition

    So, the compression algorithm I've been working on doesn't require ANY repetition to be effective. Currently, it's crunching runs of 1,024 bytes down to a 20-byte representation, and that's really squeezing everything out of it. I could stand to raise the representation up to even 512 bytes and still be getting great performance out of it. On a 1GB "random data" file, it compressed at 549 Mbps in 18.7 seconds at 98.24% compression. The compressed file is ~18.5kb now. The decompressor is already coded, but I am working out bugs. I lose a few bytes on decompression, so it's close, but not 100%.

    I will say in advance, I understand any doubt. I think it's healthy to be skeptical, especially considering this would break some rules of compression.

    All this to ask, are there any current compression algorithms that don't rely on repetition or patterns?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > so it's close, but not 100%.

    It's not a problem; you can always just use a binary diff tool: https://github.com/sisong/HDiffPatch/releases
    (diff the correct file against the partially decoded file, then count the diff size as part of the compressed size),
    or alternatively ECC: https://en.wikipedia.org/wiki/Parchive
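
    Before reaching for a full diff tool, even a tiny byte-by-byte comparison will show how far off the decoder is. A minimal sketch (plain Python, not a substitute for HDiffPatch, which also produces an actual patch you would count toward the compressed size; the two file paths are placeholders taken from the command line):

    Code:
# Minimal sketch: count byte mismatches between the original file and the
# partially decoded output. A real binary diff (e.g. HDiffPatch) would also
# produce a patch whose size gets added to the compressed size.
import sys
from itertools import zip_longest

def count_mismatches(original_path, decoded_path):
    with open(original_path, "rb") as f:
        original = f.read()
    with open(decoded_path, "rb") as f:
        decoded = f.read()
    # zip_longest pads the shorter file with None, so length differences
    # also show up as mismatches.
    bad = [i for i, (a, b) in enumerate(zip_longest(original, decoded)) if a != b]
    return len(original), len(decoded), bad

if __name__ == "__main__":
    orig_len, dec_len, bad = count_mismatches(sys.argv[1], sys.argv[2])
    print(f"original: {orig_len} bytes, decoded: {dec_len} bytes")
    print(f"{len(bad)} differing positions, first few: {bad[:10]}")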

    > are there any current compression algorithms that don't rely on repetition or patterns?

    Depends on what you mean by "repetitions".
    Entropy coding algorithms don't rely on repeated long strings, but they do rely on statistics, i.e. patterns in symbol frequencies.
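
    To illustrate what "statistics without repeated strings" means, here is a rough order-0 sketch (not any particular coder): it looks only at how often each byte value occurs and reports the size an ideal entropy coder could reach from those frequencies alone.

    Code:
# Rough sketch: order-0 (per-byte-frequency) entropy of a file and the size an
# ideal entropy coder could reach using only those frequencies -- no repeated
# strings involved, just symbol statistics.
import math
import sys
from collections import Counter

def order0_bound(path):
    with open(path, "rb") as f:
        data = f.read()
    counts = Counter(data)
    n = len(data)
    # Shannon entropy in bits per byte over the empirical distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return n, entropy, math.ceil(n * entropy / 8)

if __name__ == "__main__":
    n, h, bound = order0_bound(sys.argv[1])
    print(f"{n} bytes, {h:.3f} bits/byte, order-0 bound ~{bound} bytes")

    On genuinely uniform random data this comes out at about 8 bits per byte, i.e. no gain at all, which is what the counting argument below makes precise.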

    > break some rules of compression.

    It's not about "rules" or anything.
    It's more about the ability to count the number of possible input files and compressed files.
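
    To make the counting concrete with the numbers from the first post (a sketch, nothing more): there are 256^1024 possible 1,024-byte blocks but only 256^20 possible 20-byte representations, so only a vanishing fraction of blocks can ever round-trip.

    Code:
# Counting argument for the claimed 1,024-byte -> 20-byte mapping.
# A lossless scheme can restore at most as many distinct inputs as it has
# distinct outputs, so at most 256**20 of the 256**1024 blocks survive.
inputs = 256 ** 1024    # distinct 1,024-byte blocks (2^8192 of them)
outputs = 256 ** 20     # distinct 20-byte representations (2^160 of them)

print(f"input space:  2^{inputs.bit_length() - 1} blocks")
print(f"output space: 2^{outputs.bit_length() - 1} representations")
print(f"recoverable fraction: at most 2^-{(inputs // outputs).bit_length() - 1}")

    Every block beyond those 2^160 must collide with another one, and the decompressor has no way to tell them apart.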

  3. #3
    Member
    Join Date
    Jun 2019
    Location
    United States
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks! I'll check out the binary diff tool. It will save me loads of time trying to find the error.

    As far as repetition, I meant patterns or frequencies. My algorithm uses arithmetic but no probabilities or anything.

    Did you mean count how many files can be compressed to the same compressed sequence?

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > Did you mean count how many files can be compressed to the same compressed sequence?

    It's obvious that it's impossible to decompress different data from the same "compressed sequence".

    > As far as repetition, I meant patterns or frequencies.

    It's possible to replace the entropy coding with hashing, but in the end, "patterns or frequencies" are the only sources of compressible redundancy.
    https://en.wikipedia.org/wiki/Kolmogorov_complexity
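
    The counting bound behind that page fits in one line (the standard incompressibility argument, not tied to any particular compressor). Since there are at most 2^(n-k+1) - 1 binary programs shorter than n-k+1 bits:

    \[
      \bigl|\{\, x \in \{0,1\}^{n} : K(x) \le n - k \,\}\bigr| \;\le\; 2^{\,n-k+1} - 1
      \quad\Longrightarrow\quad
      \Pr_{x \in \{0,1\}^{n}}\bigl[\,K(x) \le n - k\,\bigr] \;<\; 2^{\,1-k}.
    \]

    In other words, fewer than one in 2^(k-1) strings of length n can be described k or more bits more compactly, no matter what kind of structure the compressor looks for.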

  5. #5
    Member
    Join Date
    Jun 2019
    Location
    United States
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Ah, I see. It says there is a lower limit on the size of the compressor able to compress a file, if I'm reading that right. The compressor and decompressor are in the same file, and it's about 15kb.

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Yes, it's relevant for contests like this: http://prize.hutter1.net/

  7. #7
    Member
    Join Date
    Jun 2019
    Location
    United States
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    Yes, it's relevant for contests like this: http://prize.hutter1.net/
    And I wouldn't mind doing the challenge, but I don't want to give away the source code.

  8. #8
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    46
    Thanks
    202
    Thanked 10 Times in 9 Posts
    Quote Originally Posted by Bundle View Post
    And I wouldn't mind doing the challenge, but I don't want to give away the source code.
    You are not required to give the compressor source code, only the decompressor.
    Or, alternatively, there is this one:
    https://marknelson.us/posts/2012/10/...turns-ten.html
    where you don't need to give the source code.

  9. #9
    Member
    Join Date
    Jun 2019
    Location
    United States
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by xinix View Post
    You are not required to give the compressor source code, only the decompressor.
    Or, alternatively, there is this one:
    https://marknelson.us/posts/2012/10/...turns-ten.html
    where you don't need to give the source code.
    But giving the decompressor source code pretty much gives away the method. I haven't decided what I want to do with the algorithm yet.

  10. #10
    Member
    Join Date
    Jun 2019
    Location
    United States
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    Yes, it's relevant for contests like this: http://prize.hutter1.net/
    I looked at the Hutter Challenge, and I was able to compress the 95.3MB enwik8 file down to 2.03MB.

  11. #11
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by Bundle View Post
    2.03MB.
    2,128,609 bytes?

    That could be a $43,854 prize if your output file and your decompressor file reproduce enwik8 exactly (and a lot of jobless people in this forum).

  13. #12
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    122
    Thanks
    36
    Thanked 33 Times in 24 Posts
    Quote Originally Posted by Bundle View Post
    I looked at the Hutter Challenge, and I was able to compress the 95.3MB enwik8 file down to 2.03MB.
    Nope.

  14. #13
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    You can create a version that only works on 1MB files, with a compression gain of just 3 bytes or more, and submit it in this thread to claim my reward:
    https://encode.ru/threads/1176-losel...ll=1#post59199

  15. #14
    Member
    Join Date
    Jun 2019
    Location
    United States
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by hexagone View Post
    Nope.
    That's not a very detailed response.

  16. #15
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    122
    Thanks
    36
    Thanked 33 Times in 24 Posts
    Quote Originally Posted by Bundle View Post
    That's not a very detailed response.
    It was not a response, since you made a statement. But OK: prove me wrong.

  17. #16
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Compressing any file to any size is not a problem.
    Correctly decompressing it is, though.
    Based on the first post, this looks like another case of misunderstanding how hashing works.
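
    To make the hashing point concrete (a sketch only -- SHA-1 is used here just because its digest happens to be exactly 20 bytes, not as a claim about what the program in the first post does): producing a 20-byte "representation" of a 1,024-byte block is trivial, but nothing in those 20 bytes lets you pick out which of the astronomically many colliding blocks you started from.

    Code:
# Sketch: a 20-byte "representation" of a 1,024-byte block via SHA-1 (whose
# digest is exactly 20 bytes). Computing it is easy; inverting it is not,
# because 256**1024 possible blocks share only 256**20 digests, i.e. about
# 256**1004 different blocks map to each digest on average.
import hashlib
import os

block = os.urandom(1024)                 # a random 1,024-byte input block
digest = hashlib.sha1(block).digest()    # 20-byte "compressed" form

print(f"input:  {len(block)} bytes")
print(f"output: {len(digest)} bytes (SHA-1 digest)")
# There is no way back: the digest alone cannot distinguish 'block' from the
# ~256**1004 other blocks that produce the same digest.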

    Also, as to the Hutter Prize - I'm pretty sure they would accept a 2M entry even without sources.
    It's a new rule anyway; it didn't exist before, and nobody has submitted sources yet.
