
Thread: How parallelizable are PAQ algorithms ?

  1. #1
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts

    How parallelizable are PAQ algorithms ?

    As a way of learning how PAQ works, I was thinking of implementing it on a massively parallel
    GPU, also to see what sort of speedup is possible. But if the algorithm is very serial, this may not
    be worthwhile. I have found that even fairly serial coders, such as the MQ coder, can be significantly
    sped up on a GPU with the right amount of cunning.

  2. #2
    Member
    Join Date
    Nov 2014
    Location
    Moscow
    Posts
    3
    Thanks
    24
    Thanked 0 Times in 0 Posts
    4x4:b256m:paq :D

  3. #3
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    Since paq uses many models, you can easily parallelize the update cycles: once you know the current bit, each model can update on its own thread, though it would still need some amount of thread scheduling. Mixing can sometimes be done out of order, depending on how many weights you use and how you mix them. Something like p * (W0 + W1 + ... + Wn-1) / n can add the weights in any order, since the sum comes out the same regardless of ordering; it would be a bad mixer, but it could still exploit out-of-order execution. I haven't done any parallel CM at all yet, but I have been thinking about it. I don't know of any that currently exist, so it would be nice to see what can be made.
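
    A minimal sketch of that idea, assuming toy adaptive counters and the simple averaging mix above; the Model type, its fields, and the one-thread-per-model dispatch are illustrative stand-ins rather than paq's actual code:

    Code:
        #include <vector>
        #include <thread>

        // Toy stand-in for a paq model: it owns all of its state, so its update is
        // independent of every other model once the current bit is known.
        struct Model {
            double p1 = 0.5;     // running estimate of P(bit == 1)
            double w  = 1.0;     // mixing weight
            double rate;         // learning rate; different rates stand in for different models
            explicit Model(double r) : rate(r) {}
            double p() const { return p1; }
            void update(int bit) { p1 += rate * (bit - p1); }   // adapt toward the observed bit
        };

        // Order-independent mix in the p * (W0 + W1 + ... + Wn-1) / n spirit:
        // the weights enter as a plain sum, so the additions can happen in any order.
        double mix(const std::vector<Model>& models) {
            double num = 0, den = 0;
            for (const Model& m : models) { num += m.w * m.p(); den += m.w; }
            return den > 0 ? num / den : 0.5;
        }

        // Once the bit is known, every model can update on its own thread.
        // One thread per model is purely illustrative; real code would reuse a pool.
        void update_all(std::vector<Model>& models, int bit) {
            std::vector<std::thread> pool;
            for (Model& m : models)
                pool.emplace_back([&m, bit] { m.update(bit); });
            for (std::thread& t : pool) t.join();
        }
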
    Last edited by Lucas; 16th May 2017 at 01:15.

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Actually, the best way to parallelize CM encoding is BWT-like - ie sort the counters by context, compute all the probability values, then do the same with the mixing/SSE layers.

    The main problem is that this method is not applicable to decoding - in fact, the decoder for a parallel encoder could become even slower than it is now.

    Some tricks could be applied to optimize decoding speed too, I guess - like replacing some of the mixing with switching + parsing optimization, speculative decoding, etc. But overall it's much harder to parallelize decoding for CM, because normally each decoded bit changes the next predictions, so we have to know all previous bits to decode the next one, while during encoding all the bits are simply already available.
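
    A rough sketch of the encode-side "sort by context" pass, under toy assumptions: the byte-sized context id, the adaptive counter, and the one-thread-per-context dispatch are all illustrative. Positions that share a context form an independent serial chain, so the chains can be replayed in parallel; the resulting per-bit probabilities then feed a later mixing/SSE pass and finally the (serial) range coder.

    Code:
        #include <cstdint>
        #include <vector>
        #include <unordered_map>
        #include <thread>

        struct Event { size_t pos; int bit; };   // one bit of the input, with its position

        // ctx[i] is a hypothetical context id for bit i; bits[i] is the known bit (encoding side).
        std::vector<double> context_pass(const std::vector<uint8_t>& ctx,
                                         const std::vector<int>& bits) {
            std::vector<double> p(bits.size(), 0.5);

            // "Sort counters by context": group bit positions by their context value.
            std::unordered_map<uint8_t, std::vector<Event>> chains;
            for (size_t i = 0; i < bits.size(); ++i)
                chains[ctx[i]].push_back({i, bits[i]});

            // Each chain is serial internally but independent of the other chains.
            std::vector<std::thread> pool;
            for (auto& kv : chains) {
                std::vector<Event>& chain = kv.second;
                pool.emplace_back([&p, &chain] {
                    double est = 0.5;                    // toy adaptive counter for this context
                    for (const Event& e : chain) {
                        p[e.pos] = est;                  // prediction made before seeing the bit
                        est += 0.05 * (e.bit - est);     // counter update after the bit
                    }
                });
            }
            for (auto& t : pool) t.join();
            return p;   // input to the mixing/SSE pass, then to the range coder
        }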

  5. The Following 2 Users Say Thank You to Shelwien For This Useful Post:

    boxerab (19th May 2017),Bulat Ziganshin (16th May 2017)

  6. #5
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    Quote Originally Posted by Lucas View Post
    Since paq uses many models, you can easily parallelize the update cycles: once you know the current bit, each model can update on its own thread, though it would still need some amount of thread scheduling.
    I think threading is challenging, as it would need very fine-grained scheduling; however, I like the idea of updating multiple models simultaneously. Perhaps for related models some of this can be done in a SIMD fashion, which is still a form of (in-core) parallelism. I don't know enough about the different models, though, to know how much shared maths there is.
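
    As a rough illustration of the in-core idea (not how paq's actual counters work, which are non-linear state machines): a batch of related 12-bit fixed-point probabilities that all see the same bit can be nudged toward it with one short SSE2 sequence. The update rule and shift amount are assumptions.

    Code:
        #include <cstdint>
        #include <emmintrin.h>   // SSE2

        // Update 8 related 16-bit probabilities (12-bit fixed point, 0..4095) against
        // the same bit in one go: p += (target - p) >> 5, done lane-wise.
        void update8(int16_t* p, int bit) {
            __m128i target = _mm_set1_epi16(bit ? 4095 : 0);
            __m128i cur    = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
            __m128i delta  = _mm_srai_epi16(_mm_sub_epi16(target, cur), 5);   // signed shift keeps direction
            _mm_storeu_si128(reinterpret_cast<__m128i*>(p), _mm_add_epi16(cur, delta));
        }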

  7. #6
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    Now that I think about it, you can properly multi-thread sparse context models if you just load the n-th thread's probability into the (i % n)-th position of a probability buffer. Because each sparse model only looks at contexts with gaps of a certain size, if you used a context with a gap of 1 there would be room to fit a second sparse model running alongside it, slotting its probabilities into the buffer at a fixed offset (one model would only look at the even positions and the other would model the odd positions).

    To take this idea even further, you could perform a fixed reorder of the input data and feed it through parallel models, each thread writing its output into the buffer at its own offset, topped off with periodic range coding of the buffer (see the sketch below). It would sacrifice some compression and increase memory usage (1N block size + 32 KB buffer + models * threads), but it would allow proper multi-threading without scheduling on every bit; it's pretty much a fancy way to do blockwise CM with a contiguous output.

    To increase compression you could (in theory) periodically mix the contents of each thread's models. This would let the models inherit what has been seen in the other threads and should help keep compression high.
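
    A minimal sketch of the interleaving, assuming two toy models striding over even and odd positions and a shared probability buffer; the model, the stride-by-position split, and the buffer layout are stand-ins for the sparse models described above:

    Code:
        #include <functional>
        #include <thread>
        #include <vector>

        constexpr int NTHREADS = 2;   // e.g. one model for even positions, one for odd

        // Thread tid models only positions with pos % NTHREADS == tid and slots its
        // prediction into the shared buffer at that same position, so the buffer
        // comes out in original order for a single coder to range-code periodically.
        void model_stride(const std::vector<int>& bits, std::vector<double>& prob, int tid) {
            double est = 0.5;                              // toy adaptive model, private to this thread
            for (size_t i = tid; i < bits.size(); i += NTHREADS) {
                prob[i] = est;                             // prediction for this position
                est += 0.05 * (bits[i] - est);             // update on this thread's sub-stream only
            }
        }

        std::vector<double> fill_probability_buffer(const std::vector<int>& bits) {
            std::vector<double> prob(bits.size(), 0.5);
            std::vector<std::thread> pool;
            for (int t = 0; t < NTHREADS; ++t)
                pool.emplace_back(model_stride, std::cref(bits), std::ref(prob), t);
            for (std::thread& th : pool) th.join();
            return prob;   // range-code this buffer in order, in blocks
        }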

  8. #7
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    Thanks, everyone. So it doesn't sound that promising for running on a GPU. GPU memory has very high bandwidth
    but also high latency - the way to hide latency is to run many threads on the same task.
    Still worth looking into, though.
