
Thread: Open Source Streaming API for Compression

  1. #1
    Member
    Join Date
    Sep 2014
    Location
    United States
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Open Source Streaming API for Compression

    Hi all,

    This is my first post, so be gentle if I'm talking nonsense. Note: when I say
    nonsense I don't mean I'm about to tell you I can compress random data; that's
    something else entirely.

    I've been messing around with compression for a while and started with a PPM0
    model and a 64-bit arithmetic encoder. This has sort of snowballed into a
    threaded streaming API that could be used by most compressors. I expect that
    someone has already written something in this area; if so, I'd love some
    pointers to previous work. I'm aware that zlib has a streaming API; the library
    I'm working on (libdach) would generalize this for anyone to use.
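
    To make that concrete, here's the rough shape of the interface I have in mind.
    All the names below are hypothetical placeholders for illustration, not the
    actual libdach API.

    Code:
    // Hypothetical sketch of a generalized streaming interface: zlib-style,
    // but codec-agnostic. Names are placeholders, not real libdach code.
    #include <cstddef>
    #include <cstdint>

    enum { DACH_OK = 0, DACH_STREAM_END = 1, DACH_ERROR = -1 };

    struct dach_stream {
        const uint8_t *next_in;   // next input byte to consume
        size_t         avail_in;  // bytes remaining at next_in
        uint8_t       *next_out;  // where to write output
        size_t         avail_out; // space remaining at next_out
        void          *state;     // codec-private state
    };

    // A codec plugs in three callbacks; the library supplies the block
    // framing, buffering and worker threads around them.
    struct dach_codec {
        int  (*init)(dach_stream *s, int level);
        int  (*process)(dach_stream *s, int flush); // DACH_OK / DACH_STREAM_END / DACH_ERROR
        void (*end)(dach_stream *s);
    };

    // The library's job: feed blocks through 'codec' on 'threads' worker
    // threads and emit them in order, so the codec author never writes
    // the plumbing.
    int dach_compress(const dach_codec *codec, dach_stream *s,
                      int level, int threads);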

    I've started a web page on it here that lists some of the ideas for the
    library.

    http://libdach.com/

    I'd like to know if this is an exercise in folly, i.e. it's not needed, it's
    been done already, or it's unlikely to be used by anyone due to [insert reason here].

  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    There's ZPAQ, which has lots of features and is well tested and reliable. What about extending ZPAQ/libZPAQ instead of creating (some sort of) alternative from scratch?

    I remember Guillaume Voirin put a lot of effort into a streaming API, and it seems he has abandoned his project ( https://github.com/centaurean ).

    Before you start writing your own API, you should probably look at existing ones, try to extend them, and ascertain whether they can be restructured for your needs in a reasonable amount of time. Armed with knowledge of the shortcomings of existing APIs and the issues their authors had to overcome, you can then start your own API and it will make more sense.
    Last edited by Piotr Tarsa; 22nd September 2014 at 14:38.

  3. #3
    Member
    Join Date
    Sep 2014
    Location
    United States
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    The intention is not to create an alternative to ZPAQ or any compressor; besides, ZPAQ has most of what I'm talking about anyway, i.e. it has a streaming API using blocks and decompresses each block in parallel. I'm suggesting a library/container that a compressor writer can use to get a streaming API with threads etc. without having to write all the plumbing. I think xz is the closest one I've seen out there to being general enough for anyone to use; it's also public domain, and that makes things a bit simpler. I suppose the elevator pitch is along the lines of the Apache Portable Runtime, i.e. a Compression Portable Runtime (CPR is an unfortunate TLA and might be suggestive of what any such project would need to get started; Encoding Portable Runtime might be more fitting).
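
    To illustrate the "plumbing" part: on the compression side the library would chop the input into blocks, hand each block to the codec on a worker thread, and write the results out in order (which is also what makes per-block parallel decompression possible later). A rough sketch of that idea, not code from any real library:

    Code:
    // Sketch of the block-parallel plumbing a compressor author would get
    // for free. Hypothetical; assumes a callback that compresses one block.
    #include <cstdint>
    #include <future>
    #include <vector>

    using Block = std::vector<uint8_t>;
    using CompressBlockFn = Block (*)(const Block &input, int level);

    std::vector<Block> compress_blocks(const std::vector<Block> &blocks,
                                       CompressBlockFn compress_block, int level)
    {
        // Launch one task per block; the runtime spreads them across cores.
        std::vector<std::future<Block>> tasks;
        tasks.reserve(blocks.size());
        for (const Block &b : blocks)
            tasks.push_back(std::async(std::launch::async, compress_block,
                                       std::cref(b), level));

        // Collect results in input order so the output stream stays sequential.
        std::vector<Block> out;
        out.reserve(blocks.size());
        for (auto &t : tasks)
            out.push_back(t.get());
        return out;
    }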

    I had a look at Density. It looks like a well-written library; it took about a quarter of a second to get enwik8 down to ~53 MB, and I'm not sure it used threads to do it.
    Last edited by harry; 23rd September 2014 at 08:42.

  4. #4
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I think implementing basic multithreading is not a problem nowadays. The newest iterations of C/C++ have lots of concurrency facilities built in, and Java has had easy threading for a long time. So I don't think yet another multithreading framework would be interesting to experimenters.
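
    For example, since C++11 the standard library alone gives you task-based parallelism in a few lines. A toy sketch (the "compression" is just a checksum stand-in):

    Code:
    // Toy example: process several buffers in parallel using nothing but
    // the C++11 standard library (std::async + std::future).
    #include <cstdint>
    #include <cstdio>
    #include <future>
    #include <numeric>
    #include <vector>

    // Stand-in for real per-block work (e.g. compressing one block).
    uint64_t fake_compress(const std::vector<uint8_t> &buf)
    {
        return std::accumulate(buf.begin(), buf.end(), uint64_t{0});
    }

    int main()
    {
        std::vector<std::vector<uint8_t>> blocks(8, std::vector<uint8_t>(1 << 20, 42));

        std::vector<std::future<uint64_t>> results;
        for (const auto &b : blocks)
            results.push_back(std::async(std::launch::async, fake_compress, std::cref(b)));

        uint64_t total = 0;
        for (auto &r : results)
            total += r.get();        // each get() waits for its task to finish
        std::printf("%llu\n", static_cast<unsigned long long>(total));
        return 0;
    }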

    A streaming API is needed by someone who has already finished building, tuning and testing his compression algorithm. But the algorithm would have to be quite good for implementing a streaming API to make sense.

    What is missing in the compression scene is a good parameter optimizer. Christopher Mattern (http://encode.ru/members/15-toffer) made a genetic-algorithm-based optimizer, but it was a rather simple thing without many features. There are lots of opportunities here. I think that, for example, optimizing parameters in PAQ could be done much faster than by simply spawning one PAQ process per (logical) CPU core. Even a GPGPU could be used here, as optimizing parameters often means running the same computations but with different variable contents.

    When optimizing parameters you have to run the actual computations, but the computation can be split into two parts: the part that does not depend on the tuned constants, and the rest. Computations that don't depend on the tuned constants can be cached, and the rest can be parallelized.

    E.g., when optimizing PAQ parameters (a rough sketch follows the list):
    - optimize only the mixer coefficients,
    - outputs from the models can easily be cached, as they don't depend on the mixer coefficients,
    - mixing is computationally expensive but has rather low memory pressure outside a small working area, so, when run in many instances at once, it is ideal for a GPGPU implementation.
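
    Roughly like this; a toy sketch with a made-up data layout, not PAQ's actual code (here with std::async on the CPU, but the same structure maps to a GPGPU kernel):

    Code:
    // Cache the model outputs once, then re-run only the mixing step for
    // every candidate weight vector, in parallel. Hypothetical layout.
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <future>
    #include <vector>

    struct BitRecord {
        std::vector<float> stretched;  // stretch(p_i) per model -- cached once
        int bit;                       // the bit that actually occurred
    };

    static float squash(float x) { return 1.0f / (1.0f + std::exp(-x)); }

    // Total code length (in bits) for one candidate set of mixer weights.
    double cost(const std::vector<BitRecord> &cache, const std::vector<float> &w)
    {
        double bits = 0.0;
        for (const BitRecord &r : cache) {
            float dot = 0.0f;
            for (size_t i = 0; i < w.size(); ++i)
                dot += w[i] * r.stretched[i];
            float p = squash(dot);                  // mixed probability of a 1
            float q = r.bit ? p : 1.0f - p;
            bits -= std::log2(std::max(q, 1e-6f));  // penalize, avoid log(0)
        }
        return bits;
    }

    // Evaluate many candidates at once; each reuses the same cached model
    // outputs, so only the mixing is repeated -- and it runs on all cores.
    std::vector<double> evaluate_candidates(const std::vector<BitRecord> &cache,
                                            const std::vector<std::vector<float>> &candidates)
    {
        std::vector<std::future<double>> tasks;
        for (const auto &w : candidates)
            tasks.push_back(std::async(std::launch::async, cost,
                                       std::cref(cache), std::cref(w)));
        std::vector<double> scores;
        for (auto &t : tasks)
            scores.push_back(t.get());
        return scores;
    }
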
    Last edited by Piotr Tarsa; 23rd September 2014 at 19:11.

  5. #5
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by harry View Post
    I'd like to know if this is an exercise in folly, i.e. it's not needed, it's
    been done already, or it's unlikely to be used by anyone due to [insert reason here].
    If you want to work in general-purpose compression, you have to get over worrying that no one will use your stuff.


    Quote Originally Posted by Piotr Tarsa View Post
    Before you start writing your own API, you should probably look at existing ones, try to extend them, and ascertain whether they can be restructured for your needs in a reasonable amount of time. Armed with knowledge of the shortcomings of existing APIs and the issues their authors had to overcome, you can then start your own API and it will make more sense.
    zlib's API feels just plain clunky and inelegant. It does seem to work, and that's an underrated feature. But it can't be the final word.
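
    For reference, the canonical deflate loop looks roughly like this when compressing a single in-memory buffer (error handling trimmed):

    Code:
    // The classic zlib streaming loop: feed input once, keep calling
    // deflate() until all output has been drained.
    #include <cstdio>
    #include <cstring>
    #include <zlib.h>

    int main()
    {
        unsigned char input[1024];
        unsigned char output[256];
        std::memset(input, 'a', sizeof input);

        z_stream strm;
        std::memset(&strm, 0, sizeof strm);      // default zalloc/zfree/opaque
        if (deflateInit(&strm, Z_DEFAULT_COMPRESSION) != Z_OK)
            return 1;

        strm.next_in  = input;
        strm.avail_in = sizeof input;

        int ret;
        do {                                     // refill the output window each pass
            strm.next_out  = output;
            strm.avail_out = sizeof output;
            ret = deflate(&strm, Z_FINISH);      // all input is already available
            std::fwrite(output, 1, sizeof output - strm.avail_out, stdout);
        } while (ret != Z_STREAM_END);

        deflateEnd(&strm);
        return 0;
    }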


    Quote Originally Posted by Piotr Tarsa View Post
    I think implementing basic multithreading is not a problem nowadays. The newest iterations of C/C++ have lots of concurrency facilities built in, and Java has had easy threading for a long time. So I don't think yet another multithreading framework would be interesting to experimenters.

    A streaming API is needed by someone who has already finished building, tuning and testing his compression algorithm. But the algorithm would have to be quite good for implementing a streaming API to make sense.
    I think it could be worth it to create some kind of plugin-API type of thing for compression. It would introduce consistency and make compression code easier to run and benchmark.
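
    Something this small would already let a harness load, run and time codecs uniformly. A hypothetical interface, just to illustrate; it's not an existing API:

    Code:
    // Hypothetical plugin interface a benchmark harness could dlopen() and
    // drive the same way for every codec.
    #include <cstddef>
    #include <cstdint>

    struct CodecPlugin {
        const char *name;                      // e.g. "mycoder"
        // Returns the compressed size, or 0 on failure; 'level' is codec-defined.
        size_t (*compress)(const uint8_t *in, size_t in_len,
                           uint8_t *out, size_t out_cap, int level);
        size_t (*decompress)(const uint8_t *in, size_t in_len,
                             uint8_t *out, size_t out_cap);
        size_t (*max_compressed_size)(size_t in_len);  // for sizing 'out'
    };

    // Each codec exports exactly one symbol; the harness looks it up and can
    // then benchmark and verify any codec through the same calls.
    extern "C" const CodecPlugin *get_codec_plugin();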

  6. #6
    Member
    Join Date
    Sep 2014
    Location
    United States
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by nburns View Post
    If you want to work in general-purpose compression, you have to get over worrying that no one will use your stuff.
    Other than the educational value to me, no one using it would make it a mostly pointless exercise.

  7. #7
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    If you want it to be used, it needs to be somehow exceptional among the alternatives.
    For example, offering compression + encryption at great speed, which can now be easily achieved with tANS.

  8. #8
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by harry View Post
    Other than the educational value to me, no one using it would make it a mostly pointless exercise.
    The problem is that there's already a huge amount of software for compression out there.

