Results 1 to 16 of 16

Thread: PAQ8o8 threading observations

  1. #1
    Member
    Join Date
    Jan 2008
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    when i tried applying paq8o8 on a single file split into n parts, where n was the number of concurrent instances of paq8o8 permitted on my computer(limited by combination of ram and level of compression).

    Ofcourse, as expected the compression was less when the file was split and "paq8o8'ed". but speed was "very very very noticeably improved". This is just a small test without any preprocessing/any other common optimization. In my opinion, surely the speed can be very much increased....all this..while maintaining respectable compression.

    This is my test result.. Lots of inefficiency is there... i hand "signaled" the start of each instance..so concurrency couldnt be perfectly synchronised...so the correct time will be a few seconds less than the got time.

    If anybody thinks there is anything to this... i will investigate more.... you all possibly know better...



    source tar = 1443840bytes;C++,png,glibc headers in a tar ;


    # of concurrent instances : program param : details


    1 paq8o8_64 -7; took 162.53 sec; compressed to 107704;

    2 paq8o8_64 -7; took 92.43 sec for same source; compressed to 114947;

    3 paq8o8_64 -7; 2 on each cpu and 1 common; took 61 seconds(time of last finished thread); same size as above for source; compression = 116461

    1 paq8o8_64 -6; same as "1"; took 159.25 sec; same as above for source; compression=107792

    2 paq8o8_64 -6; same as "2"; took 87.12 sec(time of last thread finished); compression = 114970

    3 paq8o8_64 -6; same as above; took 59 seconds for the last thread; compression = 116500

    4 paq8o8_64 -6; 2 on one core and the rest 2 on the other core; took 45.76 seconds (time of last finished thread); same source; compression = 122661

    5 paq8o8_64 -6; just like "3" above; took 36.16 sec (time of last finished thread); same source; compression =123875

    6 paq8o8_64 -6; just like "4"; took 30.10 sec(time of last finished thread); same source; compression =128504

    8 paq8o8_64 -5; just like "6"; took 22.33 seconds(time of last finished thread); same source;compression=138630

    1 paq8o8_64 -1; just like "1"; took 22.09 sec(time of last finished threas);same source; compression=126357

    5 paq8o8_64 -1; """; time took 5.18 sec(time of last finished thread);""; compression=141193






    cheers,
    joel.
    P.S : If paq8o8 could show this...most others will get drastically "boosted" ... but already threaded models would benifit very less(if at all).

  2. #2
    Member
    Join Date
    Jan 2008
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    i forgot i thing.... to mention :

    my spec :

    core2duo 1.5ghz
    2gb ram
    ubuntu 7.10 64 bit

  3. #3
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    539
    Thanks
    192
    Thanked 174 Times in 81 Posts
    I also thought about this variant of multi-threading, but after some
    tests similar to yours I discarded this. Why?

    Quote Originally Posted by CodeMutant
    1 paq8o8_64 -6; same as "1"; took 159.25 sec; same as above for source; compression=107792
    8 paq8o8_64 -5; just like "6"; took 22.33 seconds(time of last finished thread); same source;compression=138630
    This looks nice at a first glance because you need 1/8th of
    the time and the file gets only 28,6% bigger. But this is much
    too large! If you have a look at MFC at maximumcompression.com,
    for example, paq8o8 size is 64 MB, 28,6% more would be around
    82 MB. 1/8th of the time would be 5457 seconds, but if you have
    a look at the list, there are many compressors that compress MFC
    to under 82 MB in less than 5457 seconds (for example
    Durilca - 72 MB, 1997 s; or CCMx 1.30a - 77 MB, 369 s).

    Nevertheless, I agree that PAQ should get threaded to improve
    speed. Perhaps the size could be pushed closer to the single-
    threaded size with some communication between threads, but
    this is completely speculative...

    Another idea is about the file splitting. Most of the loss in size
    should be the fact that similar data types (for example, c++
    code or text) are splitted along different files. Larger filesets
    can be splitted in a way so that different data types are
    grouped together, so you have one part with texts, one part
    with bitmap data and so on.
    Perhaps PAQ could do some 2-pass algorithm: One pass to
    quickly determine which models will be used in different file
    regions, then splitting the file to group similar model usage
    and finally compressing it.
    http://schnaader.info
    Damn kids. They're all alike.

  4. #4
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    compression ratio penalty will shrink as file size grows. so for mfc i expect 2- 3 % penalty.

  5. #5
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    539
    Thanks
    192
    Thanked 174 Times in 81 Posts
    Quote Originally Posted by donkey7
    compression ratio penalty will shrink as file size grows. so for mfc i expect 2- 3 % penalty.
    I hope youre right, would be really nice to see PAQ
    getting 8 times faster

    By the way, is this some kind of hyper-threading that
    allows 8 threads to scale almost linearly across only
    2 CPUs? Quite impressive!
    http://schnaader.info
    Damn kids. They're all alike.

  6. #6
    Member
    Join Date
    Jan 2008
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    for most part, i want to stick to "commonly available" linux and microsoft compatible tools. i dont want to play around with my important backups. Not that i trust these "common" tools any better, its just the simple case of "availablility"....From a security perspective Availability is as important as Confidentiality and Integrity..(CIA-triad)... anyway..not to bore you with these stuffs...

    Even i was amazed by the results i got.. I did expect a positive note...but the magnitude i had an idea(or thought i did) was nothing like this.


    May be this will put things in perspective :

    Agreed Those charts have better program.....but..i took the lowest deminator...check the P.S. note i left in my first post.

    PAQ8o8 is a well deserving compressor...it needs these speed pickups soon... hopefully the "developer" will "think" about multithreading/multitasking ..now that he wants speed in paq9


    I used the following linux command to set processor affinity :

    bash# taskset -c 0 paq8o8_64 -8 ......
    bash# taskset -c 1 paq8o8_64 -8 .....

    Having a look at the current state of cpu(system monitor) when the compressor runs as usual...will give you a rough indicate whether it has been already threaded.

    imho, all "sequential" ie not multithreaded implementations.. can surely benifit from this.....since OS provides this function externally..there will be no "overhead" for existing code....after you have "tuned" and come to the right balance...it(multithreading) can be implemented in compressor's source code.

  7. #7
    Member
    Join Date
    Jan 2007
    Location
    Moscow
    Posts
    239
    Thanks
    0
    Thanked 3 Times in 1 Post
    AFAIK Linux has superior memory and thread management compared with Windows, and Windows results should be far less impressive.

  8. #8
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Quote Originally Posted by CodeMutant
    1 paq8o8_64 -6; same as "1"; took 159.25 sec; same as above for source; compression=107792

    2 paq8o8_64 -6; same as "2"; took 87.12 sec(time of last thread finished); compression = 114970

    3 paq8o8_64 -6; same as above; took 59 seconds for the last thread; compression = 116500

    4 paq8o8_64 -6; 2 on one core and the rest 2 on the other core; took 45.76 seconds (time of last finished thread); same source; compression = 122661

    5 paq8o8_64 -6; just like "3" above; took 36.16 sec (time of last finished thread); same source; compression =123875

    6 paq8o8_64 -6; just like "4"; took 30.10 sec(time of last finished thread); same source; compression =128504

    8 paq8o8_64 -5; just like "6"; took 22.33 seconds(time of last finished thread); same source;compression=138630
    Are these CPU times or wall times? How do you get any speedup with more than 2 threads with 2 cores?

  9. #9
    Member
    Join Date
    Jan 2008
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    these were time "printed" on console by paq808_64 while exiting each task. it printed this and total memory taken for the job. I have absolutely no reason to believe that they were not directly related to "performance speed"... as i also independently "clocked" it to similar time(i think).. the point is.... I am not dreaming ... and nor are you..... surely it can be replicated...see it for yourself then

    cheers matt,
    What a wonderful tool man... thanks a lot. Sure, many people have benefited already

  10. #10
    Member
    Join Date
    Jan 2008
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    oh no.... as that those doubts came from matt, i checked it out again myself.... in stop watch 1min:30sec was common to all the modes of opreation 2thread,3thread,4thread; running paq8o8_64 with -6 as option.

    The time that paq8o8_64 is dumping at the end, i think, is cpu time...and as size of each file decreases that surely will decrease...but since we have more thread...the overall time would be same ;(

    someone disprove this please.... i think... i am getting sick ;(...
    sorry for the giving the wrong news guys.... i will make amendments next time

  11. #11
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    That seems right, 2 threads on 2 processors is twice as fast. Werner Bergmans (maximumcompression.com) did a similar test running 2 copies of paq8 and found 18% slowdown. Your result looks like 14% slowdown, pretty close. I think the extra overhead is mostly due to sharing memory so there are more cache misses.

    Anyway, paq8o8 measures wall time, but there could be a delay in starting and ending the thread.

  12. #12
    Member
    Join Date
    Jan 2008
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    ah.... i know there are a lot of people who would like to see paq8 series put to some practical uses.... just curious: do they use it in supercomputers or clusters?? i mean, is paq based "anything" used for "any" real purpose?

    I suppose no other algo parallels paq in compression... wish i could compress the knowledge of whole word into my disk.. .. i'll put it for this years xmas list

    The more widespread a "powerful" compression algo in the world, the more chances i see of us surviving a scare ;(

    if anything survives a major Civilization tragedy, it wont be "big" enough to do that ; it will most propably be "small" enough

  13. #13
    Member
    Join Date
    Feb 2008
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I'm wondering, in theory is there a way to change the compressing/uncompressing to work with several threads accessing one piece of memory?
    Running just one thread of paq can eat up lots of memory.
    I notice almost all compression programs out there only run on single threads, most new computers nowadays are dual cores, this would be a good advantage to have.

  14. #14
    Member
    Join Date
    May 2008
    Location
    Earth
    Posts
    115
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by niknah
    I notice almost all compression programs out there only run on single threads
    7-Zip?
    Quote Originally Posted by niknah
    Im wondering, in theory is there a way to change the compressing/uncompressing to work with several threads accessing one piece of memory?
    Running just one thread of paq can eat up lots of memory.
    It was discussed somewhere on this form, but no one havedone it

  15. #15
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    It might be possible to run the different models in different threads, but it means synchronizing after every bit is compressed and running the mixer, SSE stages and arithmetic coder serially in one thread before working on the next bit in parallel. It should not hurt compression or increase memory use. I haven't tried it because I don't have a multicore machine to test it.

  16. #16
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > It might be possible to run the different models in different threads, but it
    > means synchronizing after every bit is compressed

    It doesn't actually, at least encoder could be made fully threaded,
    though there'd be a bit of a mess if you try to use the same counter
    table for multiple asynchronous threads (but anyway imho its still
    possible).

    Alas, its not that simple with decoding, while its mainly decoding
    speed which we'd want to increase first...
    But still, things like speculative execution might allow to avoid
    synchronization after _each_ bit.

    > I haven't tried it because I don't have a multicore machine to test it.

    Well, you don't actually need a multicore machine to write a threaded
    program, and there're lots of people who'd like to test your
    experimental implementations

Similar Threads

  1. LLVM 2.6 released, quick try with paq8o8
    By Hahobas in forum The Off-Topic Lounge
    Replies: 1
    Last Post: 29th November 2009, 22:31
  2. Multi-threading motivation
    By Trixter in forum Data Compression
    Replies: 1
    Last Post: 10th September 2008, 05:18
  3. Strange gcc4.3 results with paq8o8
    By Hahobas in forum Forum Archive
    Replies: 8
    Last Post: 22nd March 2008, 19:44
  4. Paq8o8 - endless loop when BMPs are invalid
    By schnaader in forum Forum Archive
    Replies: 1
    Last Post: 20th December 2007, 13:57
  5. Replies: 1
    Last Post: 18th April 2007, 19:36

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •