
Thread: SSIM / MSSIM vs MSE

  1. #1
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts

    SSIM / MSSIM vs MSE

    Can anyone here comment on SSIM vs MSE?

    Does SSIM indeed perform better than MSE?

    Thanks so much,
    Aaron

  2. #2
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by boxerab View Post
    Does SSIM indeed perform better than MSE?
    Yes.

  3. #3
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    Yes.
    Thanks. I ask because I am thinking of ways of improving JPEG 2000 rate control, which uses MSE.
    J2K has a way of estimating MSE decrease with each coding pass. Is this possible with SSIM?

  4. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by boxerab View Post
    Thanks. I ask because I am thinking of ways of improving JPEG 2000 rate control, which uses MSE.
    J2K has a way of estimating MSE decrease with each coding pass. Is this possible with SSIM?
    Should be possible with SSIM, with one concern: JPEG 2000 is a color-image compression algorithm, but SSIM is a gray-scale metric, and there is no right way to use it for color. Some people use it separately for each channel.

  5. #5
    Member
    Join Date
    Oct 2015
    Location
    Belgium
    Posts
    29
    Thanks
    9
    Thanked 10 Times in 8 Posts
    You could use something like this:
    https://github.com/pornel/dssim
    to take care of the color problem. It works in L*a*b* color space, which may not be perfect but should at least be perceptually better than RGB, YCbCr or YCoCg.

  6. #6
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by boxerab View Post
    Can anyone here comment on SSIM vs MSE?

    Does SSIM indeed perform better than MSE?
    Not really. It's mostly different, but not really better. SSIM alone (without multiscale) is little more than PSNR with a simplistic masking term, i.e. it is not hard to see that
    1 - SSIM(x,y) ≈ MSE / (1 + x² + y²)   (IIRC).

    This denominator is called "visual masking" in the spatial domain, with a masking exponent of two. While this is probably better than nothing, the masking exponent does not quite fit the results of subjective experiments - the masking exponent in the DC domain is somewhere between 1 and 2, and the masking exponent for higher frequencies is even below 1.
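
    For reference, here is a sketch of where that approximation comes from, using the standard SSIM definition (the stabilizing constants C1, C2 play the role of the "1" in the denominator above; equal means are assumed):

        \mathrm{SSIM}(x,y)
          = \frac{(2\mu_x\mu_y + C_1)\,(2\sigma_{xy} + C_2)}
                 {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}

        % With \mu_x = \mu_y the first factor cancels, and since
        % \sigma_x^2 + \sigma_y^2 - 2\sigma_{xy} = \mathrm{E}[(x-y)^2] = \mathrm{MSE}:
        1 - \mathrm{SSIM}(x,y)
          = \frac{\sigma_x^2 + \sigma_y^2 - 2\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 + C_2}
          = \frac{\mathrm{MSE}}{\sigma_x^2 + \sigma_y^2 + C_2}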

    Even worse, SSIM does not model the CSF - contrast sensitivity function - which is a very well-known effect in the frequency domain.

    Multiscale-SSIM does have "some" way to model the CSF, but masking is still off.

    One way or another, if you talk to the folks who run such subjective experiments, SSIM or MSSIM is really not much better than MSE or PSNR - don't trust it.

    If you're looking for a *good* subjective quality index, go for VDP or hdrVDP - this is worth something - but it requires proper calibration and a couple of parameters that describe the display luminance, the viewing distance and the color space of the image (naturally!).

    That SSIM, MSE and a lot of other attempts do not require such parameters should at least make you suspicious.

  7. #7
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    Should be possible with SSIM, with one concern: JPEG 2000 is a color-image compression algorithm, but SSIM is a gray-scale metric, and there is no right way to use it for color. Some people use it separately for each channel.
    Oh, SSIM is so simple that it's not hard to "fix" it for color. It is crude enough that a very simple modification makes it applicable to color: convert the RGB image to some sort of opponent-color space (similar to what your brain does), then measure SSIM on the luminance and the two opponent color coordinates, with a large weight on luminance and small weights on the two chroma coordinates. A very simple approach that works "well enough" is simply to convert to YCbCr and use a weight of 0.8 on luma and 0.1 each on Cb and Cr.
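
    As a concrete illustration, a minimal sketch of that recipe (assuming global, non-windowed statistics per channel instead of the usual sliding window, and inputs already converted to YCbCr in the [0,255] range):

        // Minimal sketch of the weighted color-SSIM recipe above. Assumption:
        // SSIM is computed from global image statistics rather than the usual
        // sliding window, and the channels are already converted to YCbCr.
        #include <cstdio>
        #include <vector>

        // Single-window SSIM between two equally sized channels.
        double GlobalSsim(const std::vector<double>& x, const std::vector<double>& y) {
            const double C1 = 6.5025;   // (0.01 * 255)^2, standard stabilizers
            const double C2 = 58.5225;  // (0.03 * 255)^2
            const std::size_t n = x.size();
            double mx = 0, my = 0;
            for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
            mx /= n; my /= n;
            double vx = 0, vy = 0, cov = 0;
            for (std::size_t i = 0; i < n; ++i) {
                vx  += (x[i] - mx) * (x[i] - mx);
                vy  += (y[i] - my) * (y[i] - my);
                cov += (x[i] - mx) * (y[i] - my);
            }
            vx /= n; vy /= n; cov /= n;
            return ((2 * mx * my + C1) * (2 * cov + C2)) /
                   ((mx * mx + my * my + C1) * (vx + vy + C2));
        }

        // 0.8 / 0.1 / 0.1 weighting over Y, Cb, Cr as suggested above.
        double ColorSsim(const std::vector<double> ref[3],
                         const std::vector<double> dist[3]) {
            return 0.8 * GlobalSsim(ref[0], dist[0])   // luma
                 + 0.1 * GlobalSsim(ref[1], dist[1])   // Cb
                 + 0.1 * GlobalSsim(ref[2], dist[2]);  // Cr
        }

        int main() {
            // Tiny synthetic example: textured channels with a slight luma error.
            std::vector<double> ref[3], dist[3];
            for (int c = 0; c < 3; ++c)
                for (int i = 0; i < 64; ++i) {
                    ref[c].push_back(128.0 + (i % 16));
                    dist[c].push_back(128.0 + (i % 16) + (c == 0 ? 2.0 : 0.0));
                }
            std::printf("weighted color SSIM: %f\n", ColorSsim(ref, dist));
            return 0;
        }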

    Then, it is also not too hard to create a JPEG 2000 encoder that is optimal in SSIM (not PSNR) - I did this a while ago for a DCC paper - and check the resulting images. Indeed, the SSIM score goes through the roof, but the images do not really look better. They only look "different" (neither better nor worse, just other artifacts).

    The problem here is really that SSIM lacks a good physiological model. A simple CSF modification of JPEG 2000 plus a simple masking model yields better results (subjectively) than an SSIM-optimized encoder.

  8. #8
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by boxerab View Post
    Thanks. I ask because I am thinking of ways of improving JPEG 2000 rate control, which uses MSE.
    J2K has a way of estimating MSE decrease with each coding pass. Is this possible with SSIM?
    Yes. Been there, done that. You find the results here:

    Th. Richter, K.J. Kim: "A MS-SSIM Optimal JPEG 2000 Encoder", Proc. of Data Compression Conference, Snowbird, 2009. (You'll find it on IEEE Xplore.)

    If you're interested, I can send you the paper. I should even have the software - back then, Kim and I implemented this on JJ2000, not on Accusoft's proprietary coder.

  9. The Following 2 Users Say Thank You to thorfdbg For This Useful Post:

    boxerab (21st December 2015), Jon Sneyers (21st December 2015)

  10. #9
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    Thanks for your insights, Thomas. Yes, I would very much like to see a copy of your paper.

    Aaron

  11. #10
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by boxerab View Post
    Thanks for your insights, Thomas. Yes, I would very much like to see a copy of your paper.
    Here you go.

    Merry Christmas, BTW!
    Attached Files

  12. #11
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    Quote Originally Posted by thorfdbg View Post
    Here you go.

    Merry Christmas, BTW!
    Thanks! Merry Christmas!

  13. #12
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts

    butteraugli

    Quote Originally Posted by boxerab View Post
    Can anyone here comment on SSIM vs MSE?

    Does SSIM indeed perform better than MSE?

    Thanks so much,
    Aaron
    We have just open-sourced butteraugli, a new non-parametric method for estimating the noticeability of lossy compression artefacts.

    https://github.com/google/butteraugli

  14. The Following 2 Users Say Thank You to Jyrki Alakuijala For This Useful Post:

    boxerab (27th March 2016), Jon Sneyers (13th February 2016)

  15. #13
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    122
    Thanks
    36
    Thanked 33 Times in 24 Posts
    I ran butteraugli's compare_pngs on the Lena image (original and compressed/decompressed with webp).

    I got:
    original/original: 0
    original/webp default: 3.77
    original/webp q90: 2.31
    original/webp q95: 1.96
    original/empty png: 44.68

    It all looks sensible, but I think the cutoff kButteraugliBad = 2.095 is too strong, because the default WebP compression yields a good image (hard to spot the errors at first sight) yet returns 3.77.
    It is also a bit hard to make sense of the result. Is there a way to linearize the returned values? Do you have metrics against a variety of images?

  16. #14
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by hexagone View Post
    I ran butteraugli's compare_pngs on the Lena image (original and compressed/decompressed with webp).

    I got:
    original/original: 0
    original/webp default: 3.77
    original/webp q90: 2.31
    original/webp q95: 1.96
    original/empty png: 44.68

    It all looks sensible, but I think the cutoff kButteraugliBad = 2.095 is too strong, because the default WebP compression yields a good image (hard to spot the errors at first sight) yet returns 3.77.
    It is also a bit hard to make sense of the result. Is there a way to linearize the returned values? Do you have metrics against a variety of images?
    A value below 1.6 is great, a value below 2.1 okayish. Above 2.1 there is likely a noticeable artefact in an in-place flip test. You can try the flip test with your images.

    Typically we saw good results in JPEG with quality 92 to 94 in YUV444 mode. I am not an expert on the performance of WebP's lossy part, but I believe it is not performing very well at the highest possible quality, like JPEGs above quality 92 in YUV444.

  17. The Following User Says Thank You to Jyrki Alakuijala For This Useful Post:

    Jon Sneyers (3rd March 2016)

  18. #15
    Member
    Join Date
    Oct 2015
    Location
    Belgium
    Posts
    29
    Thanks
    9
    Thanked 10 Times in 8 Posts
    Butteraugli is rather slow. Do you have an idea how much room for speedup there is?

  19. #16
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by Jon Sneyers View Post
    Butteraugli is rather slow. Do you have an idea how much room for speedup there is?
    I think it can be made 20x faster at least, possibly 100x. That would simplify integrating it into an image compression algorithm.

  20. #17
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    What's the story behind the name butteraugli?

  21. #18
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by boxerab View Post
    What's the story behind the name butteraugli?
    I wanted more vowels than there are in PSNRHVS-M and MS-SSIM-YUV, an association with the human eye... and with a small bread (gipfeli, zopfli, brotli). I deliberately chose an overly complex term to avoid creating homonym noise for something as specific as this.

    Voisilmäpulla, Finnish butter eye buns, translated to German is something like Butteraugebrötchen, and I took the liberty to invent a new pseudo-Swiss-German word from it, and butteraugli was born.

    http://www.food.com/recipe/finnish-b...m-pulla-326192 -- Tasty with filter coffee.

  22. The Following 2 Users Say Thank You to Jyrki Alakuijala For This Useful Post:

    boxerab (31st March 2016), schnaader (30th March 2016)

  23. #19
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    Thanks for the explanation! Now I am getting hungry...

  24. #20
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    122
    Thanks
    36
    Thanked 33 Times in 24 Posts
    BTW, there is a division by zero in Average5x5() when (x,y) points to the lower right corner of an image (n == 0).
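
    For what it's worth, the usual guard looks like this (a hypothetical reconstruction for illustration, not butteraugli's actual code):

        // Hypothetical sketch of a clamped 5x5 box average - names and layout
        // are assumptions, not butteraugli's actual Average5x5(). With this
        // symmetric clamping n >= 1 whenever (x,y) is inside the image, but the
        // guard also covers loop variants where the window can end up empty,
        // which is the reported lower-right-corner case.
        #include <vector>

        double Average5x5(const std::vector<float>& img, int width, int height,
                          int x, int y) {
            double sum = 0.0;
            int n = 0;
            for (int dy = -2; dy <= 2; ++dy) {
                for (int dx = -2; dx <= 2; ++dx) {
                    const int xx = x + dx, yy = y + dy;
                    if (xx < 0 || xx >= width || yy < 0 || yy >= height) continue;
                    sum += img[yy * width + xx];
                    ++n;
                }
            }
            return n > 0 ? sum / n : 0.0;  // avoid the division by zero
        }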

  25. #21
    Member
    Join Date
    Sep 2010
    Location
    US
    Posts
    126
    Thanks
    4
    Thanked 69 Times in 29 Posts
    Jyrki, I don't see any information about the development of butteraugli.

    Did you test it against some ground truth of human observation tests?
    Last edited by cbloom; 3rd August 2016 at 19:12.

  26. #22
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by cbloom View Post
    Jyrki, I don't see any information about the development of butteraugli.

    Did you test it against some ground truth of human observation tests?
    We have a small internal test of ~2000 image pairs, and about 10 specially-designed calibration images for those psychovisual effects we have chosen to model in butteraugli.

    We did compare with TID2008 and TID2013, but those databases seek to answer a different question - "How awful is the degradation?" - while ours concentrates on "Can you notice the degradation?"

    Our analysis is not watertight, but only indicative, partly because we use all our data to optimize the model. I will write a better report about it within the next two months, as well as push a new improved version.

  27. #23
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    We did compare with TID2008 and TID2013, but those databases seek to answer a different question - "How awful is the degradation?" - while ours concentrates on "Can you notice the degradation?"
    It is relatively "easy" to come up with a good "sub-threshold" quality index (i.e. the question your tool wants to answer). The effects are pretty much known: visual masking, CSF, cortex filter. If so, I would really recommend looking into VDP-2, because it has a pretty elaborate model. Unfortunately, due to the filters, it can only model self-masking, and it has no idea about color and the chroma CSF.

    Super-threshold (i.e. "how awful is the compression") is a much harder question, and it is also observer-dependent (do you prefer block defects or a blurry picture? Entirely subjective). There are a couple of algorithms that claim to work in this domain (SSIM is one of them), though in reality, in the tests I made back then, VDP-2 still worked best.

    I wouldn't focus on a single dataset. TID2013 might be ok, but I would also suggest looking into LIVE2, and I would certainly also suggest running subjective tests. If you're interested, I'm in contact with a group of people/labs in Europe named "Qualinet" that does this on a professional basis (i.e. runs subjective tests), and there are also a couple of good papers on crowdsourcing subjective evaluation (with all the potential dangers of that approach). They should be in the proceedings of QoMEX - which is probably the conference I would recommend to you the most in this case.

  28. The Following User Says Thank You to thorfdbg For This Useful Post:

    Jyrki Alakuijala (31st March 2016)

  29. #24
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by thorfdbg View Post
    ... VDP-2 ...
    Do you happen to know of an opensourced C or C++ implementation of VDP-2? It would be nice to run it as a reference.

  30. #25
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    Do you happen to know of an opensourced C or C++ implementation of VDP-2? It would be nice to run it as a reference.
    Unfortunately, no. There is a C++ implementation of an older version that is open source and ready to download; there is a closed-source C++ implementation that was done under the administration of Dolby (and for which I got the promise that it would be open-sourced at some point); and there is the open-source Matlab implementation you surely know. I currently use the Matlab implementation with a bash wrapper around it so I can use it in my automated tests, but indeed, this situation is not ideal.

  31. #26
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    I wanted more vowels than there are in PSNRHVS-M and MS-SSIM-YUV, an association with the human eye... and with a small bread (gipfeli, zopfli, brotli). I deliberately chose an overly complex term to avoid creating homonym noise for something as specific as this.

    Voisilmäpulla, Finnish butter eye buns, translated to German is something like Butteraugebrötchen, and I took the liberty to invent a new pseudo-Swiss-German word from it, and butteraugli was born.

    http://www.food.com/recipe/finnish-b...m-pulla-326192 -- Tasty with filter coffee.
    Interesting. I thought it had something to do with the phrase "butt ugly".

  32. #27
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by m^2 View Post
    Interesting. I thought it had something to do with the phrase "butt ugly".
    There is a new improved butteraugli version available today.

  33. #28
    Member
    Join Date
    Sep 2010
    Location
    US
    Posts
    126
    Thanks
    4
    Thanked 69 Times in 29 Posts
    I'd love to see the power of Google used to generate a new large-scale human rating database of image distortions.

    Something like a little web page that shows the original & two distortions, and the human picks which one looks best. Or maybe rates them. I'm not sure.

    I spent some time on this problem before, and I *think* my solution was pretty good, but I decided that without better validation data it's all a bit questionable.

    At the moment there are way too many metrics and no clear way to tell whether they are working.

    The other really big problem that's not solved well at the moment is a more perceptual metric that can be used in-loop for R/D optimization. All the perceptual metrics are much too slow for this. The only attempt I even know of in this domain is x264's SATD hack, which is a big improvement over just using SAD or SSD, but surely there's something better.
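
    For readers who haven't seen it: SATD is the sum of absolute transformed differences - Hadamard-transform the residual, then take the L1 norm of the coefficients. A textbook sketch (not x264's optimized code; conventions differ on a final scale factor):

        // Textbook sketch of SATD on a 4x4 block: compute the residual, apply
        // a 4x4 Hadamard transform (rows, then columns), and sum the absolute
        // coefficients. Not x264's optimized implementation; normalization
        // conventions vary (x264 halves the 4x4 sum).
        #include <cstdlib>

        // In-place 4-point Hadamard butterfly over a strided array.
        static void Hadamard4(int* d, int stride) {
            const int a0 = d[0 * stride] + d[1 * stride];
            const int a1 = d[0 * stride] - d[1 * stride];
            const int a2 = d[2 * stride] + d[3 * stride];
            const int a3 = d[2 * stride] - d[3 * stride];
            d[0 * stride] = a0 + a2;
            d[1 * stride] = a1 + a3;
            d[2 * stride] = a0 - a2;
            d[3 * stride] = a1 - a3;
        }

        int Satd4x4(const unsigned char* ref, int ref_stride,
                    const unsigned char* rec, int rec_stride) {
            int diff[16];
            for (int y = 0; y < 4; ++y)
                for (int x = 0; x < 4; ++x)
                    diff[y * 4 + x] = ref[y * ref_stride + x] - rec[y * rec_stride + x];
            for (int y = 0; y < 4; ++y) Hadamard4(&diff[y * 4], 1);  // rows
            for (int x = 0; x < 4; ++x) Hadamard4(&diff[x], 4);      // columns
            int sum = 0;
            for (int i = 0; i < 16; ++i) sum += std::abs(diff[i]);
            return sum;
        }
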
    Last edited by cbloom; 3rd August 2016 at 19:12.

  34. #29
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by cbloom View Post
    I spent some time on this problem before, and I *think* my solution was pretty good, but I decided that without better validation data it's all a bit questionable.
    Would you consider open sourcing your solution?

  35. #30
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by cbloom View Post
    The other really big problem that's not solved well at the moment is a more perceptual metric that can be used in-loop for R/D optimization. All the perceptual metrics are much too slow for this. The only attempt I even know of in this domain is x264's SATD hack, which is a big improvement over just using SAD or SSD, but surely there's something better.
    Actually, it's possible to include MS-SSIM in JPEG 2000 - that's not too hard, and doable. It's just that the results are not much different (MS-SSIM scores are great, but the images do not look much better, they look different). Of course, you always have to make compromises. But elements like visual masking or visual weighting (CSF) are not hard to add to a JPEG 2000 encoder.
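
    To make the last point concrete, a minimal sketch of the CSF-weighting idea (my simplification with illustrative weights, not the actual encoder): scale each subband's estimated distortion by a perceptual weight before the usual PCRD rate allocation, so visually unimportant fine-detail subbands get truncated first.

        // Minimal sketch of CSF weighting for JPEG 2000-style rate allocation.
        // The weight table is purely illustrative (an assumption, not values
        // from any standard or paper): contrast sensitivity falls off at high
        // spatial frequencies, so the finest subbands are weighted down.
        #include <cstdio>

        struct Subband {
            int level;   // decomposition level, 1 = finest (highest frequency)
            double mse;  // estimated MSE contribution of this subband
        };

        double CsfWeight(int level) {
            static const double kWeight[] = { 0.28, 0.62, 0.90, 1.00, 1.00 };
            int i = level - 1;
            if (i < 0) i = 0;
            if (i > 4) i = 4;
            return kWeight[i];
        }

        // What a PCRD-style optimizer would minimize instead of raw MSE.
        double WeightedDistortion(const Subband& b) {
            return CsfWeight(b.level) * b.mse;
        }

        int main() {
            const Subband bands[] = { {1, 40.0}, {2, 25.0}, {3, 12.0} };
            for (const Subband& b : bands)
                std::printf("level %d: raw %.1f -> weighted %.2f\n",
                            b.level, b.mse, WeightedDistortion(b));
            return 0;
        }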

