Thread: Superduper - Precompressor

  1. #1
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts

    Superduper - Precompressor

    Hi guys, here's my first actual compressor. My goal was to make a faster and more memory-efficient utility like srep; "Superduper" is just short for "super deduplication" (it sounded snazzy). Its performance is pretty good, with encoding generally around 200 MB/s and decoding near 2 GB/s on an overclocked Pentium G3258.

    https://github.com/loxxous/Superduper

    Let me know if you encounter any bugs.
    Last edited by Lucas; 22nd September 2016 at 19:03.

  2. The Following 2 Users Say Thank You to Lucas For This Useful Post:

    Minimum (20th September 2016),SolidComp (20th September 2016)

  3. #2
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    I admire anyone who writes their own compressor. Could you elaborate on what it does or what workloads/content it's geared toward? I've struggled to understand what precompressors actually do, and how they fit into a typical compression workflow. I assume there's no such thing as a universal precompressor, so Superduper is pre.. what? What happens next after using Superduper to precompress something – what sort of compression would follow?

  4. #3
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    Quote Originally Posted by SolidComp View Post
    I admire anyone who writes their own compressor. Could you elaborate on what it does or what workloads/content it's geared toward? I've struggled to understand what precompressors actually do, and how they fit into a typical compression workflow. I assume there's no such thing as a universal precompressor, so Superduper is pre.. what? What happens next after using Superduper to precompress something – what sort of compression would follow?
    These kinds of preprocessors are generally geared towards statistical context models. The point of SD is to remove long-range redundancy as fast as possible; my intent is for it to be used as a precompression stage ahead of a context model. The idea is simple: the less data there is to model, the faster the compression. On enwik9 it can reduce the 1 GB file by 17%, which means 17% less data a context model needs to read to achieve pretty much the same compression ratio. It's about finding ways to save time on the most complex routine.
    Also worth noting that LZ77 is heavily asymmetric: decoding has very little overhead and will almost always speed up decompression routines.
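    To make that concrete, here's a minimal C++ sketch of hash-based block deduplication in the same spirit: fixed-size blocks are hashed, and any repeated block is replaced by a reference to its first occurrence. The block size, hash function, and token layout here are my own illustrative assumptions, not Superduper's actual code.
    Code:
    // Minimal sketch of fixed-size-block deduplication (illustrative only;
    // not Superduper's implementation). Repeated blocks become references
    // to their first occurrence; everything else stays literal.
    #include <cstdint>
    #include <cstring>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Token {
        bool match;     // true: reuse data from an earlier offset
        uint64_t off;   // match: earlier input offset; literal: this block's offset
        uint32_t len;   // bytes covered by this token
    };

    // FNV-1a over one block; a real tool would use a faster rolling hash.
    static uint64_t block_hash(const uint8_t* p, size_t n) {
        uint64_t h = 1469598103934665603ull;
        for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 1099511628211ull; }
        return h;
    }

    std::vector<Token> dedup(const std::vector<uint8_t>& in, size_t block) {
        std::unordered_map<uint64_t, uint64_t> seen;   // hash -> first offset
        std::vector<Token> out;
        size_t pos = 0;
        for (; pos + block <= in.size(); pos += block) {
            uint64_t h = block_hash(&in[pos], block);
            auto it = seen.find(h);
            if (it != seen.end() &&
                std::memcmp(&in[it->second], &in[pos], block) == 0) {
                // Verify bytes before emitting a match, so hash collisions
                // can never corrupt the output.
                out.push_back({true, it->second, (uint32_t)block});
            } else {
                seen.emplace(h, pos);
                out.push_back({false, (uint64_t)pos, (uint32_t)block});
            }
        }
        if (pos < in.size())   // a tail shorter than one block stays literal
            out.push_back({false, (uint64_t)pos, (uint32_t)(in.size() - pos)});
        return out;
    }

    int main() {
        std::string s(4096, 'a');   // highly redundant input
        std::vector<uint8_t> data(s.begin(), s.end());
        size_t matches = 0;
        for (const Token& t : dedup(data, 64)) matches += t.match;
        std::cout << matches << " duplicate blocks found\n";
    }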

  5. #4
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    46
    Thanks
    202
    Thanked 10 Times in 9 Posts
    Compiled by Krinkels.org
    Attached Files

  6. #5
    Member Zonder's Avatar
    Join Date
    May 2008
    Location
    Home
    Posts
    55
    Thanks
    20
    Thanked 6 Times in 5 Posts
    Atm it looks like some jokeware. v0.02 tested:

    duped1.tar, 5757 MB:
    1440 MB srep+lzma2:d16mb
    2054 MB lzma2:d16mb
    2149 MB superduper+lzma2:d16mb

    3666 MB srep
    4511 MB superduper

    duped2.tar (two identical incompressible files tarred), 4686 MB:
    2341 MB srep
    4657 MB superduper

  7. The Following User Says Thank You to Zonder For This Useful Post:

    schnaader (22nd September 2016)

  8. #6
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    Quote Originally Posted by Zonder View Post
    Atm it looks like some jokeware. v0.02 tested:

    duped1.tar, 5757 MB:
    1440 MB srep+lzma2:d16mb
    2054 MB lzma2:d16mb
    2149 MB superduper+lzma2:d16mb

    3666 MB srep
    4511 MB superduper

    duped2.tar (two identical incompressible files tarred), 4686 MB:
    2341 MB srep
    4657 MB superduper
    I'm surprised it even compressed that second test file of yours.
    Right now it uses a 2 MB window and a small hash (hence the low memory usage). I'm still working on it. The most it can theoretically support is 256 MB distances; I'll try to get that working tonight.

    Update: a new release is up on GitHub, now with user-selectable block size and memory for the hash.
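    For anyone curious, here's a rough C++ sketch of how a user-supplied memory budget could map onto hash-table geometry, and why a fixed-width distance field would cap back-references at 256 MB. The names and layout are guesses on my part, not Superduper's actual internals.
    Code:
    // Rough sketch (my assumptions, not Superduper's internals): size a
    // dedup hash table from a user-supplied memory budget. A power-of-two
    // slot count keeps indexing to a cheap mask instead of a modulo.
    #include <cstdint>
    #include <cstdio>

    struct HashConfig {
        uint64_t entries;   // power-of-two number of slots
        uint64_t mask;      // entries - 1, applied to a hash as an index mask
    };

    HashConfig make_config(uint64_t mem_bytes) {
        uint64_t slots = mem_bytes / sizeof(uint64_t);   // assume 8-byte slots
        uint64_t pow2 = 1;
        while (pow2 * 2 <= slots) pow2 *= 2;             // round down to a power of two
        return { pow2, pow2 - 1 };
    }

    int main() {
        HashConfig cfg = make_config(64ull << 20);       // e.g. a 64 MB hash budget
        std::printf("slots=%llu mask=%llx\n",
                    (unsigned long long)cfg.entries,
                    (unsigned long long)cfg.mask);
        // Note: a 28-bit match-distance field would cap back-references at
        // 2^28 bytes = 256 MB, which would explain the stated limit (a guess).
        return 0;
    }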
    Last edited by Lucas; 22nd September 2016 at 22:00. Reason: Update

  9. #7
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Excuse my naive understanding, but if it only compresses "identical parts" within a 256 MB window, is it really deduplication?
    I thought deduplication was meant for working across huge data sizes. This is even less than 7-Zip.

  10. #8
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    Deduplication is a pretty broad term; it's just the removal of identical blocks, which is just LZ77. That was the original name for this project, but it wasn't unique, so I changed it.

    This is actually a sub-model of a project I'm working on. The main compression model I use can't see identical strings, because it can only make predictions rather than directly encode strings, so on some files it benefits from LZ77 preprocessing. I release my work in parts for people to play around with, because that's how I learned how different algorithms perform in different orders/stages: things like raw BWT transformations followed by fiddling around with second-stage algorithms to try to find ways to improve performance or compression ratio.
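    As a toy example of that kind of staged experiment, here's a naive BWT followed by a move-to-front pass in C++; MTF turns the symbol runs the BWT produces into small ranks that a second-stage coder handles well. Purely illustrative, not code from this project.
    Code:
    // Toy two-stage pipeline: naive BWT, then move-to-front (MTF).
    // Illustrative only; the O(n^2 log n) rotation sort is fine for a
    // demo but not for real inputs.
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <numeric>
    #include <string>
    #include <vector>

    // Naive BWT: sort all rotations, take the last column.
    std::string bwt(const std::string& s) {
        size_t n = s.size();
        std::vector<size_t> rot(n);
        std::iota(rot.begin(), rot.end(), 0);
        std::sort(rot.begin(), rot.end(), [&](size_t a, size_t b) {
            for (size_t i = 0; i < n; i++) {
                char ca = s[(a + i) % n], cb = s[(b + i) % n];
                if (ca != cb) return ca < cb;
            }
            return false;
        });
        std::string out(n, 0);
        for (size_t i = 0; i < n; i++) out[i] = s[(rot[i] + n - 1) % n];
        return out;
    }

    // MTF: runs of repeated symbols (common after BWT) become small ranks.
    std::vector<uint8_t> mtf(const std::string& s) {
        std::vector<uint8_t> table(256);
        std::iota(table.begin(), table.end(), 0);
        std::vector<uint8_t> out;
        for (unsigned char c : s) {
            auto it = std::find(table.begin(), table.end(), c);
            out.push_back((uint8_t)(it - table.begin()));
            table.erase(it);
            table.insert(table.begin(), c);
        }
        return out;
    }

    int main() {
        std::string t = bwt("bananabananabanana");
        for (uint8_t r : mtf(t)) std::cout << (int)r << ' ';
        std::cout << '\n';
    }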
