
Thread: packJPG

  1. #1
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts

    packJPG

    Hi, I am new to this forum and am looking at packJPG with the aim of re-implementing it in a parallel fashion, so that encoding/decoding can be faster on multi-core machines. However, apart from the author's 2007 paper, I couldn't find a more detailed explanation of the algorithms used in packJPG, especially in the pack_pjg() function. While the source code makes the data flow relatively easy to follow, it is really hard to get the exact details of the pack_pjg() algorithm. Can anyone here give me a hand, either with some useful explanations or with pointers to more detailed writings/discussions?

    Thanks in advance for any help.

  2. #2
    Member Dimitri
    Join Date
    Nov 2015
    Location
    Greece
    Posts
    48
    Thanks
    21
    Thanked 30 Times in 14 Posts
    http://encode.ru/threads/1809-practi...hlight=packmp3


    I tried once, but this is as far as I could go.

    Good luck with this.

  3. #3
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts
    Thanks for the pointer. But I am looking for parallelism at the per-file level, not parallel processing of a batch of files. That is why I need to understand the details of packJPG's algorithms. I think a lot of places in them can be parallelized.

  4. #4
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    @Skymmer: Back in 2013 you once said this:

    As for "lack of interest" mentioned here...
    Most people have poor knowledge about compression itself, so just try telling them about recompression.
    IMHO recompression is interesting for the developers who create it, and for compression enthusiasts of course.
    For example, I asked one man to compile the PackJPG 2.5f DLL with the Intel Compiler. He did this and also used profiling, so the resulting speedup was about 13.4% compared to the native EXE.
    I gave this DLL to another man, who created a multi-threaded version of PackJPG.
    http://skymmer.narod.ru/images/shots...-01-52_000.png
    On a test set of 4477 files (315 308 107 bytes), the original PackJPG takes 226 sec, while the optimized multi-threaded version takes 27 sec. So there actually is interest, but as I said, it exists in a rather underground manner.
    Could you please release that version to the public? Or, better yet, do the same for packMP3?

    Thank you very much in advance. It could really make a difference in the way people store backups. At the very least, it would be great for me.

  5. #5
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Gonzalo

    I know it's not the same, but have you tried using ppx2? ppx2 runs a command multiple times, in parallel, over a list you can create with a dir command:

    dir /b *.jpg | ppx2 -P %NUMBER_OF_PROCESSORS% -L 1 Packjpg -np "{}"


    It should work like a charm.
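    For anyone who wants the same per-file parallelism from a script (or outside Windows), here is a rough Python equivalent of that ppx2 line. It is only a sketch: run_batch is a hypothetical helper, it assumes a packJPG executable on the PATH and reuses the -np switch exactly as written above, and since every file still gets its own independent process, it shares the limits of the ppx2 approach.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_batch(files, command, workers=None):
    """Run `command + [file]` once per file, at most `workers` at a time.

    Hypothetical helper mirroring `ppx2 -P <n> -L 1 <command> "{}"`.
    Returns a dict mapping each file to its process exit code.
    """
    files = list(files)  # iterated twice: once by map(), once by zip()
    workers = workers or os.cpu_count() or 1

    def run_one(path):
        # One OS process per file; the thread only waits for it to finish.
        return subprocess.run(command + [str(path)]).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(files, pool.map(run_one, files)))


# Example, assuming packJPG is on the PATH:
#   import glob
#   results = run_batch(sorted(glob.glob("*.jpg")), ["packJPG", "-np"])
#   failed = [f for f, rc in results.items() if rc != 0]
```

    Threads are enough for the dispatcher here because the real work happens in the child processes, not in Python.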

  6. #6
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    @SvenBent: Thank you. I'll try it. Anyway, IMHO this shouldn't be more than a temporary workaround until we have true MT (multi-threading).

  7. #7
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Well, it depends on how you see it. It is a pretty low-tech solution, but it might be optimal from a performance point of view.

    People here know much more about this than I do, so I might be wrong, but I believe some code is just not easy to multithread, especially when the threads work on the same data. If two threads on two cores work on the same data, that data has to go back and forth between the cores, and suddenly you have a lot of reads/writes through the different caches, which slows the process down: the CPU ends up waiting on the caches, which are slow compared to the CPU itself.

    The ppx2 way is much easier and gives almost 100% scaling with CPU count. Of course, it comes at the cost of memory usage, and it doesn't help if you are only processing one file.

  8. #8
    Programmer schnaader
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    539
    Thanks
    192
    Thanked 174 Times in 81 Posts
    A good starting point is the "JPG developers package", which is now also available on GitHub. Regarding multi-threading, it's better to use uncmpJPG as a reference. It is packJPG (although a slightly older version, it seems) without the arithmetic coding. Without that part, uncmpJPG seems to be around 5 times faster, so the first step could be to split packJPG into JPG processing and arithmetic coding. Just "pipelining" these two parts could bring packJPG up to the speed of the arithmetic coding alone (about 20% faster).
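    To make the "pipelining" idea concrete, here is a minimal Python sketch of a two-stage pipeline. stage1 and stage2 are placeholders (assumptions, not packJPG functions) for the JPG processing part and the arithmetic coding part; the bounded queue lets stage 1 work on the next segment while stage 2 codes the current one. In CPython, pure-Python stages would not actually run in parallel because of the GIL, so this only shows the structure; a real version would do the same with native threads in packJPG's C++.

```python
import queue
import threading

_DONE = object()  # sentinel: the producer has finished


def pipeline(segments, stage1, stage2, depth=4):
    """Run stage1 and stage2 concurrently; return stage2 results in input order."""
    q = queue.Queue(maxsize=depth)  # bounded, so stage1 cannot run far ahead
    errors = []

    def producer():
        try:
            for seg in segments:
                q.put(stage1(seg))
        except BaseException as exc:  # hand the error over to the consumer
            errors.append(exc)
        finally:
            q.put(_DONE)  # always unblock the consumer

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    results = []
    while (item := q.get()) is not _DONE:
        results.append(stage2(item))  # stage2 sees items strictly in order
    worker.join()
    if errors:
        raise errors[0]
    return results
```

    The consumer is deliberately a single thread that handles segments in order: an adaptive arithmetic coder carries its model state from one symbol to the next, so only the stage in front of it can be overlapped. That is why the best case is the speed of the coder alone.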

    Identifying arithmetic coding as the main part of the packJPG process helps a lot: you now know that even if you reduced the processing part to zero time, arithmetic coding would still take the same time, so improving and parallelizing the arithmetic coder is where most of the work lies.
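    The back-of-the-envelope numbers behind the "20% faster" estimate, assuming uncmpJPG's run time is a fair stand-in for the non-coding part of packJPG and that the two stages overlap perfectly (T is the total packJPG time):

```latex
\begin{aligned}
T_{\text{proc}} &\approx T/5 = 0.2\,T && \text{(uncmpJPG is about 5 times faster)}\\
T_{\text{ac}}   &\approx T - T_{\text{proc}} = 0.8\,T\\
T_{\text{pipe}} &\approx \max(T_{\text{proc}},\, T_{\text{ac}}) = 0.8\,T
\end{aligned}
```

    So the pipelined run takes about 20% less time, and that saving can never exceed T_proc.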

    Also, as good as the results from the arithmetic coder are, it still has all the usual disadvantages: it is slow, symmetric (compression time = decompression time) and not easily parallelizable. So looking for alternatives that trade size for speed would be an option, too.
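    One common way to trade size for speed is to cut the data into independent chunks and code each chunk on its own. The sketch below uses Python's zlib purely as a stand-in coder (it is not packJPG's arithmetic coder), and the function names and the 1 MiB default chunk size are illustrative assumptions. Since every chunk starts with an empty model, the output is usually somewhat larger than coding the whole buffer at once; in return, both compression and decompression can use several cores (CPython's zlib releases the GIL while it works, so threads are enough here).

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> list[bytes]:
    """Split data into fixed-size chunks and compress each one independently.

    No chunk depends on another, so they can be coded in parallel, at the
    cost of each chunk starting from an empty model (a larger total size).
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))


def decompress_chunks(parts: list[bytes], workers: int = 4) -> bytes:
    """Decompression parallelizes the same way, then chunks are rejoined in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, parts))
```

    A real container format would also have to store the compressed length of each chunk, so that a decoder can find the chunk boundaries.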

    Most of each uncompressed file consists of the "CMP" tables: reordered DCT coefficients with some interesting properties that can help in compressing them. In this case, they all lie in a very narrow range around zero, and this range even shrinks with increasing file position. So, e.g., in the first 10% you might have values from -512 to 512, while the last 90% is full of zeroes. Most compressors don't perform that well on signed 16-bit values, so adding 32768, splitting the values into 2 byte planes and further preprocessing could help.
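    A small sketch of the first two preprocessing steps, assuming the "2 planes" are the high and low bytes of each biased value (to_planes and from_planes are hypothetical names, and this does not reproduce packJPG's actual CMP layout). Because the coefficients cluster around zero, the high-byte plane ends up almost entirely 0x7F/0x80, which should be much easier for a general-purpose compressor than interleaved signed 16-bit words.

```python
def to_planes(coeffs):
    """Bias signed 16-bit values by 32768 and split them into two byte planes."""
    biased = [c + 32768 for c in coeffs]  # -32768..32767 -> 0..65535
    if any(not 0 <= v <= 0xFFFF for v in biased):
        raise ValueError("coefficient outside the signed 16-bit range")
    high = bytes(v >> 8 for v in biased)   # only 0x7F or 0x80 for -256..255
    low = bytes(v & 0xFF for v in biased)  # carries most of the information
    return high, low


def from_planes(high, low):
    """Inverse of to_planes: recombine the bytes and remove the bias."""
    return [((h << 8) | l) - 32768 for h, l in zip(high, low)]
```

    The long runs of zero coefficients at the end become constant runs (0x80 in the high plane, 0x00 in the low plane) in both planes.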

    For more observations and results for an example file, have a look at the thread I just opened.
    Last edited by schnaader; 18th February 2016 at 17:36.
    http://schnaader.info
    Damn kids. They're all alike.

  9. The Following User Says Thank You to schnaader For This Useful Post:

    Bulat Ziganshin (18th February 2016)

  10. #9
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts
    That really helps. I am following your thread.

  11. #10
    Member
    Join Date
    May 2016
    Location
    USA
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts
    How far did you get with this? I was able to get much faster performance by compiling with icc (the Intel C++ compiler), but couldn't see an easy way to multithread the code.

    There is potential for pipelining, where one phase (e.g. compressing) reads the output of the previous phase (e.g. decoding). That alone would probably save about 15%, but I think there are other options, too.

    icc:

    --> packJPG v2.5k (01/22/2016) by Matthias Stirner / Se <--
    Copyright 2006-2016 HTW Aalen University & Matthias Stirner
    All rights reserved

    Processed 1 of 1 files [XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]

    -> 1 file(s) processed, 0 error(s), 0 warning(s)
    ---------------------------------
    total time : 0.83 sec
    avrg. kbyte per s : 2440 byte
    avrg. comp. ratio : 76.21 %
    ---------------------------------


    gcc:

    --> packJPG v2.5k (01/22/2016) by Matthias Stirner / Se <--
    Copyright 2006-2016 HTW Aalen University & Matthias Stirner
    All rights reserved

    Processed 1 of 1 files [XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]

    -> 1 file(s) processed, 0 error(s), 0 warning(s)
    ---------------------------------
    total time : 1.08 sec
    avrg. kbyte per s : 1875 byte
    avrg. comp. ratio : 76.21 %
    ---------------------------------

  12. #11
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Could you upload the icc binary? I'd like to compare its performance with the Visual Studio build. I saw a similar speed-up with precomp; see this, for example.

    Right now I'm trying to build it myself, but I get an error in bitops.cpp:

    Code:
    'setmode':identifier not found

  13. #12
    Member
    Join Date
    Apr 2009
    Location
    here
    Posts
    202
    Thanks
    165
    Thanked 109 Times in 65 Posts
    I made an x64 GCC build; at least it is about 10% faster than the official x86 one.
    Attached Files
    Last edited by load; 21st June 2016 at 20:07.

  14. The Following User Says Thank You to load For This Useful Post:

    a902cd23 (21st June 2016)

  15. #13
    Member
    Join Date
    Jun 2013
    Location
    Sweden
    Posts
    150
    Thanks
    9
    Thanked 25 Times in 23 Posts
    Quote, originally posted by load:
    I made an x64 GCC build; at least it is about 10% faster than the official x86 one.
    Your file is appreciated, but I thought you were contributing an x64 build; the one you attached is x86.

    Test object: a photo of my neighbour's trees, JPG, 5312x2988, 6 594 286 bytes, stored on a ramdisk
    i7-3770K @ 4400 MHz
    --> packJPG v2.5f 3.06 sec 77.84 %
    --> packJPG v2.5g 2.90 sec 77.84 %
    --> packJPG v2.5h 2.90 sec 77.84 %
    --> packJPG v2.5i 2.90 sec 77.84 %
    --> packJPG v2.5j 2.89 sec 77.84 %
    --> packJPG v2.5k 2.90 sec 77.84 %
    i7-3770K @ 2000 MHz
    --> packJPG v2.5f 8.39 sec 77.84 %
    --> packJPG v2.5g 7.94 sec 77.84 %
    --> packJPG v2.5h 7.96 sec 77.84 %
    --> packJPG v2.5i 7.97 sec 77.84 %
    --> packJPG v2.5j 7.94 sec 77.84 %
    --> packJPG v2.5k 7.96 sec 77.84 %

    All packed files have the same CRC32/MD5.
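    For anyone who wants to repeat this kind of check, a short Python sketch that computes CRC32 and MD5 for each output file and compares them. digests and all_identical are hypothetical helper names; the files are read in 1 MiB blocks so large archives don't have to fit in memory.

```python
import hashlib
import zlib


def digests(path):
    """Return (CRC32 as 8 hex digits, MD5 hex digest) for one file."""
    crc = 0
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            crc = zlib.crc32(block, crc)  # running CRC over all blocks
            md5.update(block)
    return f"{crc:08x}", md5.hexdigest()


def all_identical(paths):
    """True if every file has the same CRC32 and MD5 as the first one."""
    sums = [digests(p) for p in paths]
    return all(s == sums[0] for s in sums)
```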

  16. #14
    Member
    Join Date
    Apr 2009
    Location
    here
    Posts
    202
    Thanks
    165
    Thanked 109 Times in 65 Posts
    Sorry, I'm back; I had to leave. I replaced the attachment: it now contains both x86 and x64 builds, plus the dev_builds (though I have no idea what they do).

    I guess the x86 one won't be faster than the official build. And yes, as far as I know, the output for 2.5f ... 2.5k has not changed.

  17. The Following User Says Thank You to load For This Useful Post:

    a902cd23 (21st June 2016)

  18. #15
    Member
    Join Date
    Jun 2013
    Location
    Sweden
    Posts
    150
    Thanks
    9
    Thanked 25 Times in 23 Posts
    Quote, originally posted by load:
    Sorry, I'm back; I had to leave. I replaced the attachment: it now contains both x86 and x64 builds, plus the dev_builds (though I have no idea what they do).

    I guess the x86 one won't be faster than the official build. And yes, as far as I know, the output for 2.5f ... 2.5k has not changed.
    New x64:
    --> packJPG v2.5k 6.85 sec @ 2000 MHz
    --> packJPG v2.5k 6.85 sec @ 2000 MHz (dev)
    --> packJPG v2.5k 2.59 sec @ 4400 MHz
    --> packJPG v2.5k 2.59 sec @ 4400 MHz (dev)
