Thread: Quick review on pik fast mode

  1. #1
    Member
    Join Date
    Aug 2017
    Location
    Mauritius
    Posts
    59
    Thanks
    67
    Thanked 22 Times in 16 Posts

    Quick review on pik fast mode

    The latest version of pik seems to offer both better and faster compression. Note: this is nothing scientific, just a quick test.

    Old pik
    Code:
    time cpik bench.png bench_1.0_old.pik
    Compressing with maximum Butteraugli distance 1.000000
    Compressed to 115793 bytes
    
    
    real    1m0.421s
    user    0m58.672s
    sys     0m1.738s
    
    time cpik bench.png bench_1.0_old.pik --fast
    Compressing with fast mode
    Compressed to 125221 bytes
    
    
    real    0m0.158s
    user    0m0.135s
    sys     0m0.022s
    

    Latest pik (with SIMD + updated butteraugli)

    Code:
    time cpik bench.png bench_1.0_new.pik --fast
    Compressing with fast mode
    Compressed to 126405 bytes
    
    
    real    0m0.178s
    user    0m0.139s
    sys     0m0.023s
    
    time cpik bench.png bench_1.0_new.pik
    Compressing with maximum Butteraugli distance 1.000000
    Compressed to 107785 bytes
    
    
    real    0m37.641s
    user    0m36.794s
    sys     0m0.840s


    Libjpeg compression speed

    Code:
    time jpeg -q 90 -oz -h -qt 3 -qv bench.ppm bench_jpeg_90.jpg
    jpeg Copyright (C) 2012-2014 Thomas Richter, University of Stuttgart
    and Accusoft
    
    This program comes with ABSOLUTELY NO WARRANTY; for details see 
    README.license.gpl
    This is free software, and you are welcome to redistribute it
    under certain conditions, see again README.license.gpl for details.
    
    
    0 bytes memory not yet released.
    
    4038577 bytes maximal required.
    
    542 allocations performed.
    
    real    0m0.234s
    user    0m0.227s
    sys     0m0.007s
    It seems that pik fast mode is faster than libjpeg. Of course, we lose some compression here (~8% with the old build, ~17% with the latest build) relative to pik default mode (--distance 1.0). In my view, fast mode makes pik usable in a production setting: fast mode can be used first to serve the image to the user as soon as possible, and the original image can then be placed in a queue to be re-encoded by pik at distance 1.0 later, recovering that extra compression without compromising the user experience.
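The serve-fast-then-optimize idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the only thing taken from this thread is the `cpik <input> <output> [--fast]` command line, while `run_cpik`, `TwoStageEncoder` and the `.tmp` suffix are hypothetical names made up for the sketch.

```python
import queue
import subprocess
import threading
from pathlib import Path


def run_cpik(src: Path, dst: Path, fast: bool) -> None:
    """Call cpik the way the commands in this thread do (assumes cpik is on PATH)."""
    cmd = ["cpik", str(src), str(dst)]
    if fast:
        cmd.append("--fast")
    subprocess.run(cmd, check=True)


class TwoStageEncoder:
    """Hypothetical helper: fast encode now, default-mode encode later."""

    def __init__(self, encode=run_cpik):
        self._encode = encode
        self._jobs = queue.Queue()
        # One background worker drains the re-encode queue.
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, src: Path, dst: Path) -> None:
        # Stage 1: fast mode, so the image can be served right away.
        self._encode(src, dst, True)
        # Stage 2: remember the original for the slower default-mode pass.
        self._jobs.put((src, dst))

    def _drain(self) -> None:
        while True:
            src, dst = self._jobs.get()
            try:
                tmp = dst.with_name(dst.name + ".tmp")
                self._encode(src, tmp, False)
                tmp.replace(dst)  # swap in the smaller file once it is complete
            except Exception as exc:
                # On failure the fast encode simply stays in place.
                print(f"re-encode of {src} failed: {exc}")
            finally:
                self._jobs.task_done()

    def wait(self) -> None:
        """Block until every queued re-encode has finished."""
        self._jobs.join()
```

Writing the slow encode to a temporary name and renaming it afterwards means a reader never sees a half-written file; whether that matters depends on how the images are served.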
    Last edited by khavish; 9th January 2018 at 19:57. Reason: added more precision

  2. The Following User Says Thank You to khavish For This Useful Post:

    Jyrki Alakuijala (9th January 2018)

  3. #2
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    That's great news -- and consistent with my own experiments. For this image, going from 116 kB to 108 kB is roughly representative of the sequence of improvements we have been able to deploy in the latest release of PIK. I'm anticipating we can go another 5-10 % to about 100 kB in the high quality mode before freezing the format.

    At the low-quality end (if you compressed this image down to 25 kB), we have more ideas and can likely still deliver a ~30 % improvement.

  4. #3
    Member
    Join Date
    Aug 2017
    Location
    Mauritius
    Posts
    59
    Thanks
    67
    Thanked 22 Times in 16 Posts
    Latest pik shows a significant performance improvement in default mode (about 35x faster than the old version and about 22x faster than the previous build)
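As a quick check, the two ratios can be reproduced from the `real` times reported in this thread: 1m0.421s and 0m37.641s for the two default-mode runs in post #1, and 0m1.682s for the default-mode run in this post. The labels and the `to_seconds` helper are just names picked for this sketch; the 22x figure appears to correspond to the earlier "latest" build rather than to fast mode.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time` stamp such as '1m0.421s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# `real` times of the default-mode (distance 1.0) runs quoted in this thread.
REAL_TIMES = {
    "old pik": "1m0.421s",
    "previous build": "0m37.641s",
    "latest pik": "0m1.682s",
}

latest = to_seconds(REAL_TIMES["latest pik"])
speedups = {
    label: to_seconds(stamp) / latest
    for label, stamp in REAL_TIMES.items()
    if label != "latest pik"
}

for label, ratio in speedups.items():
    print(f"latest pik vs. {label}: {ratio:.1f}x faster")
```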
    Code:
    time cpik bench.png bench.pik 
    Compressing with maximum Butteraugli distance 1.000000
    Compressed 500 x 606 pixels to 121425 bytes (0.70 MB/s, 4 threads).
    enc find best2                          :          1 x      3573012643 =      3573012643
    enc YTo* correlation                    :          1 x      1134709111 =      1134709111
    EncodeToBitstream                       :          1 x       258142525 =       258142525
    QuantizeCoeffs                          :          6 x         8680026 =        52080161
    || Opsin->SRGB8                         :        400 x          129247 =        51699040
    enc ctan+quant                          :          6 x         7068471 =        42410831
    recon                                   :          5 x         5799022 =        28995112
    OpsinDynamicsImage                      :          1 x        18928062 =        18928062
    enc OpsinToPik uninstrumented           :          1 x        18311578 =        18311578
    || colorTransform                       :        400 x           17507 =         7002925
    || TFGraph RunSource slow               :        265 x           16666 =         4416575
    AlignImage                              :          1 x         3642356 =         3642356
    || Dequant                              :        400 x            8988 =         3595284
    aq DiffPrecompute                       :          1 x         3513318 =         3513318
    TileDistMap                             :          5 x          625902 =         3129512
    aq AdaptiveQuantMap                     :          1 x         2309746 =         2309746
    Convolve fast                           :        153 x           11191 =         1712373
    CenterOpsinValues                       :          1 x          782089 =          782089
    || DequantDC                            :         10 x           72090 =          720904
    || TF task strip                        :        110 x            2845 =          312950
    MakeTileFlowDC                          :          5 x           25139 =          125695
    || colorTransformDC                     :         10 x            7226 =           72261
    MakeTileFlow                            :          5 x            8986 =           44933
    || TFGraph RunSink                      :        810 x              30 =           24977
    TFGraph thread join                     :         15 x            1140 =           17106
    || TFGraph RunNode                      :        410 x               3 =            1597
    || TFGraph RunSource zerocopy           :       1745 x               0 =             353
    Total clocks during analysis: 21430591
    Total clocks measured: 5209714017
    
    
    real    0m1.682s
    user    0m2.118s
    sys    0m0.043s
    The fast mode seems a bit slower than in my earlier test (0.411 s vs. 0.178 s):

    Code:
    time cpik bench.png bench.pik --fast
    Compressing with fast mode
    Compressed 500 x 606 pixels to 134866 bytes (28.49 MB/s, 4 threads).
    EncodeToBitstream                       :          1 x        53765834 =        53765834
    enc OpsinToPik uninstrumented           :          1 x        23762603 =        23762603
    OpsinDynamicsImage                      :          1 x        18992473 =        18992473
    QuantizeCoeffs                          :          1 x         9576681 =         9576681
    enc ctan+quant                          :          1 x         7856621 =         7856621
    aq DiffPrecompute                       :          1 x         4457889 =         4457889
    AlignImage                              :          1 x         3649433 =         3649433
    aq AdaptiveQuantMap                     :          1 x         3158557 =         3158557
    CenterOpsinValues                       :          1 x          761851 =          761851
    enc fast quant                          :          1 x          633202 =          633202
    Convolve fast                           :         18 x           23957 =          431235
    Total clocks during analysis: 20771968
    Total clocks measured: 127046379
    
    
    real    0m0.411s
    user    0m0.885s
    sys    0m0.007s
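To see where the time goes, the profile rows can be turned into percentages. The sketch below parses the fast-mode rows printed above and assumes each row has the form `zone : calls x clocks-per-call = total`; the regular expression and the function names are my own, not part of pik. For this run the row totals add up exactly to the "Total clocks measured" figure (127046379).

```python
import re

# Fast-mode profile rows copied from the output above.
PROFILE = """\
EncodeToBitstream                       :          1 x        53765834 =        53765834
enc OpsinToPik uninstrumented           :          1 x        23762603 =        23762603
OpsinDynamicsImage                      :          1 x        18992473 =        18992473
QuantizeCoeffs                          :          1 x         9576681 =         9576681
enc ctan+quant                          :          1 x         7856621 =         7856621
aq DiffPrecompute                       :          1 x         4457889 =         4457889
AlignImage                              :          1 x         3649433 =         3649433
aq AdaptiveQuantMap                     :          1 x         3158557 =         3158557
CenterOpsinValues                       :          1 x          761851 =          761851
enc fast quant                          :          1 x          633202 =          633202
Convolve fast                           :         18 x           23957 =          431235
"""

ROW = re.compile(
    r"^\s*(?P<zone>.+?)\s*:\s*(?P<calls>\d+)\s*x\s*(?P<avg>\d+)\s*=\s*(?P<total>\d+)\s*$"
)


def parse_profile(text):
    """Return (zone, calls, total_clocks) for every line that looks like a profile row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((match["zone"], int(match["calls"]), int(match["total"])))
    return rows


def shares(rows):
    """Map each zone to its percentage of the summed clock counts."""
    grand_total = sum(total for _, _, total in rows)
    return {zone: 100.0 * total / grand_total for zone, _, total in rows}


rows = parse_profile(PROFILE)
for zone, pct in sorted(shares(rows).items(), key=lambda kv: -kv[1]):
    print(f"{zone:32s} {pct:5.1f}%")
```

The same parser should work on the default-mode profile, where a single zone ("enc find best2") accounts for most of the measured clocks.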
    Notes:

    1. This is not a scientific benchmark; it's just me doing a quick test.
    2. Some JPEG encoders might be faster (e.g. libjpeg-turbo).
    3. I tried using different numbers of threads, but I got slower speeds every time.
    4. I ran the test 3 times and averaged the results. OS: Fedora 28
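For anyone who wants to repeat the run-three-times-and-average step, a minimal sketch could look like this. It measures wall-clock time only (the `real` line of `time`), and `average_wall_time` is a made-up helper name, not something shipped with pik.

```python
import statistics
import subprocess
import time


def average_wall_time(cmd, runs=3):
    """Run `cmd` `runs` times and return the mean wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the encoder's own output; only the elapsed time matters here.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


# Example call, assuming cpik and bench.png are available locally:
# average_wall_time(["cpik", "bench.png", "bench.pik", "--fast"])
```

With runs this short (a few hundred milliseconds), process start-up and disk caching can be a noticeable part of the total, so more than three runs may give a steadier average.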
    Last edited by khavish; 13th June 2018 at 20:35.

  5. #4
    Member
    Join Date
    Apr 2009
    Location
    here
    Posts
    202
    Thanks
    165
    Thanked 109 Times in 65 Posts
    here's latest PIK for windows.

    --fast is a lot faster here, almost 4 times. though it still uses only 1 core here, --num_threads does nothing

    but i'm not able to do any proper tests, due to lack of knowledge

    it's the whole .exe package...

    but there are issues when compiling for windows, lots of compiler warnings, also the output is garbage:

    Compressing with fast mode
    Compressed zu x zu pixels to zu bytes (0.00 MB/s, zu threads).
    enc OpsinToPik uninstrumented : zu x zu = zu
    EncodeToBitstream : zu x zu = zu
    QuantizeCoeffs : zu x zu = zu
    enc ctan+quant : zu x zu = zu
    OpsinDynamicsImage : zu x zu = zu
    aq DiffPrecompute : zu x zu = zu
    aq AdaptiveQuantMap : zu x zu = zu
    AlignImage : zu x zu = zu
    CenterOpsinValues : zu x zu = zu
    enc fast quant : zu x zu = zu
    Convolve fast : zu x zu = zu



    edit:
    compression/decompression itself seems to work fine.
    Last edited by load; 12th June 2018 at 19:56.

  6. #5
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Quote Originally Posted by khavish View Post
    Latest pik shows significant performance improvement(35x faster compared to old version and even 22x better than fast mode )
    There is a typo in the link. It says https://ee873a5f3c3dfb9d4e1f369401ca1087731bc808/ when it should be https://github.com/google/pik/commit...ca1087731bc808

  7. The Following User Says Thank You to Gonzalo For This Useful Post:

    khavish (13th June 2018)

  8. #6
    Member
    Join Date
    Aug 2016
    Location
    Zürich
    Posts
    12
    Thanks
    3
    Thanked 5 Times in 4 Posts
    Quote Originally Posted by load View Post
    --fast is a lot faster here, almost 4 times. though it still uses only 1 core here, --num_threads does nothing
    Thanks for testing on Windows!
    FYI the num_threads argument does not do anything to the current encoder.
    That will be easy to add once we've frozen the format.

    In the decoder, we do see some benefit from num_threads=4..8 on large images, but the decoder hasn't been fully parallelized yet.

    but there are issues when compiling for windows, lots of compiler warnings, also the output is garbage:
    Yes, we don't compile often with MSVC. The "zu" output is apparently because the compiler doesn't support C99. Are you using MSVC2015 or later?

  9. #7
    Member
    Join Date
    Apr 2009
    Location
    here
    Posts
    202
    Thanks
    165
    Thanked 109 Times in 65 Posts
    FYI the num_threads argument does not do anything to the current encoder.
    That will be easy to add once we've frozen the format.
    good to know, thanks,

    Yes, we don't compile often with MSVC. The "zu" output is apparently because the compiler doesn't support C99. Are you using MSVC2015 or later?
    i did use mingw and gcc 5.4, here's a better compile with proper output. also i updated to gcc 7.2

    Code:
    Compressing with fast mode
    Compressed 5616 x 3744 pixels to 4378294 bytes (25.88 MB/s, 4 threads).
    enc OpsinToPik uninstrumented           :          1 x      1741718625 =      1741718625
    EncodeToBitstream                       :          1 x      1730014484 =      1730014484
    QuantizeCoeffs                          :          1 x       762662972 =       762662972
    enc ctan+quant                          :          1 x       626527024 =       626527024
    OpsinDynamicsImage                      :          1 x       436502686 =       436502686
    aq DiffPrecompute                       :          1 x       332189519 =       332189519
    aq AdaptiveQuantMap                     :          1 x       290071386 =       290071386
    AlignImage                              :          1 x       216264555 =       216264555
    CenterOpsinValues                       :          1 x        78132163 =        78132163
    enc fast quant                          :          1 x        50542811 =        50542811
    Convolve fast                           :         18 x         1540033 =        27720608
    most compiler warnings are like

    simd_helpers.h:33:29: warning: requested alignment 32 is larger than 16 [-Wattributes]
    if it helps, i can post the full compiler output.

  10. The Following User Says Thank You to load For This Useful Post:

    khavish (13th June 2018)

  11. #8
    Member
    Join Date
    Aug 2016
    Location
    Zürich
    Posts
    12
    Thanks
    3
    Thanked 5 Times in 4 Posts
    Glad to see the output works after compiler update.
    FYI the 1x.. rows will soon require a separate "--print_profile 1" flag.

    most compiler warnings are like.. if it helps, i can post the full compiler output.
    Thanks, but we rarely compile with MSVC, so probably warnings will creep in again. I'd suggest waiting until the code is closer to finalized.

  12. The Following User Says Thank You to Jan Wassenberg For This Useful Post:

    khavish (13th June 2018)

  13. #9
    Member
    Join Date
    Aug 2017
    Location
    Mauritius
    Posts
    59
    Thanks
    67
    Thanked 22 Times in 16 Posts
    Quote Originally Posted by Gonzalo View Post
    Typo corrected....Thanks
