
Thread: "Bits per quality" metric

  1. #1: porneL (Member, UK)

    I'm tweaking a lossy encoder. Some tweaks increase file size and improve output quality at the same time (or reduce file size and reduce quality). How can I measure whether those changes are beneficial overall? (i.e. check if it's a Pareto improvement)

    Is there a commonly accepted metric for this? I'm looking for a metric that would give me a single number that I could use to automatically compare different versions of the encoder with each other.

    Edit: I need results in less than 10 seconds, and the metric must be able to detect improvements smaller than 1%. I deliberately don't want any sort of manual testing. I'm OK with the imperfections of objective quality metrics (they're good enough for testing small incremental improvements).
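    To make "Pareto improvement" concrete, here's a rough sketch of the check I have in mind, assuming I already have a (file size, quality score) pair per test image for the old and new builds (the quality metric itself is whatever I end up using):

    Code:
def is_pareto_improvement(old, new):
    """old, new: dicts mapping image name -> (file size in bytes, quality score).
    Smaller size and higher quality are better."""
    strictly_better_somewhere = False
    for name in old:
        old_size, old_q = old[name]
        new_size, new_q = new[name]
        if new_size > old_size or new_q < old_q:
            return False  # worse on some axis for some image
        if new_size < old_size or new_q > old_q:
            strictly_better_somewhere = True
    return strictly_better_somewhere

# Made-up numbers for two test images:
old = {"kodim03": (88310, 0.981), "tulips": (41200, 0.962)}
new = {"kodim03": (87995, 0.981), "tulips": (40950, 0.964)}
print(is_pareto_improvement(old, new))  # True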
    Last edited by porneL; 1st October 2013 at 22:30.

  2. #2: Bulat Ziganshin (Programmer, Uzbekistan)
    I think you need a method to compute quality itself; the balance between compression and quality is the user's choice.

  3. #3: Member (Germany)
    That doesn't exist yet. You need blind tests with real people. And even then it's not easy to compare the results.

    If you need quick results, you can run blind tests with yourself and with people close to you who are available.

    But there might be a systematic relationship between the signal-to-noise ratio and perceived quality, depending on the signal's strength, frequency, color and location relative to other signals. You might get inspiration from the audio world: hydrogenaudio.org is the encode.ru of lossy audio compression. In those forums you are not allowed to post your opinion about quality unless you have done an ABX test.
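    For what it's worth, scoring an ABX session is simple. A rough sketch, assuming you only record how many of n trials the tester identified X correctly (the one-sided binomial p-value says how likely that score is by pure guessing):

    Code:
from math import comb

def abx_p_value(correct, trials):
    """One-sided p-value: probability of getting at least `correct` answers
    right out of `trials` by guessing (chance = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Example: 12 correct identifications out of 16 trials.
print(abx_p_value(12, 16))  # ~0.038, usually taken as a real perceived difference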

    The development of the MP3 encoder LAME and the H.264 video encoder x264 might be interesting.

  4. #4: porneL (Member, UK)
    To avoid optimizing the encoder for narrow cases I always have to test every change on hundreds of images, and for statistically significant results on incremental quality improvements in the 1% range I would have to repeat the test of each image hundreds of times with hundreds of people.

    Subjective tests are out of the question. I can't spend several months and thousands of dollars on testing each tweak of the encoder when I have 10 things to try every hour. That's just batshit insane compared to running a script, even if the script is imperfect.

    Automated objective tests are fantastic for improving encoders. When I first started working on pngquant I was testing images manually, but since it's a lot of work, I could only test 3-5 images at a time. I ended up making the encoder worse, because of the poor statistical significance of "looks good to me!". I started making the biggest progress with pngquant when I created an automated test suite of 1200 images. However, in pngquant I only track quality. With a filesize/quality ratio it's harder.


    The problem is that filesize is linear and most quality metrics are non-linear, and things get even hairier when filesize or difference approaches 0.

    I've tried "file size * MSE" and "file size / PSNR", but neither seems to make sense (subjectively I think JPEG's sweet spot is near libjpeg's quality 70 setting, but the various formulas I try put the sweet spot at either the 0 or the 100 setting, both of which obviously suck).
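    For reference, this is roughly the sweep I've been running (a sketch using Pillow and NumPy with a hypothetical test.png; the combined scores are the naive ones above, and both tend to move monotonically with the quality setting instead of pointing at a sweet spot):

    Code:
import io
import numpy as np
from PIL import Image

def jpeg_roundtrip_stats(path, quality):
    original = Image.open(path).convert("RGB")
    buf = io.BytesIO()
    original.save(buf, "JPEG", quality=quality)
    size = buf.tell()
    buf.seek(0)
    decoded = np.asarray(Image.open(buf).convert("RGB"), dtype=np.float64)
    reference = np.asarray(original, dtype=np.float64)
    mse = np.mean((reference - decoded) ** 2)
    psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
    return size, mse, psnr

for q in range(10, 100, 10):
    size, mse, psnr = jpeg_roundtrip_stats("test.png", q)
    # The naive combined scores keep moving in one direction with q,
    # so they point at an extreme setting rather than a sweet spot.
    print(q, size, size * mse, size / psnr)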

  5. #5: Matt Mahoney (Expert, Melbourne, Florida, USA)
    Judging image quality is an unsolved AI problem. You have to model the human visual perception system, which we don't know how to do at a high level. For example, it is more important to get faces right than the background, so your test software has to be able to recognize faces.

    If we knew how to solve it, then it would be possible to compress an image to less than 1 KB by converting it to a text description and compressing that (assuming a picture is worth 1000 words). The decompressor would read the text and draw the picture. The test software would compress the result again and see if it generates the same text.

    In fact, we know from experiments with human long term memory, like those done by Standing and Landauer, that we only store about 30 bits after looking at a picture for 5 seconds. That's enough to pass a recall test where you look at 10,000 pictures, then are shown new photos and are asked if you have seen them before.

    So until this problem is solved, you will have to judge image quality by looking at it.

  6. #6: nburns (Member, San Diego)
    Quote Originally Posted by Matt Mahoney View Post
    Judging image quality is an unsolved AI problem. [...] So until this problem is solved, you will have to judge image quality by looking at it.
    30 bits has to underestimate the amount of significant information. A viewer would likely notice noise if it was present, but would not specifically remember the underlying data even if the noise was absent. To get exactly 30 bits, you'd probably have to upload the pictures directly to the brain.

    To assess the effects of large numbers of small changes subjectively, maybe you could test combinations at the same time and use statistical techniques to separate their individual effects. Perhaps an even better technique would give testers a slider with many images and let them find the best setting. Doctors reading MRIs have found that a slider that quickly flips through adjacent cross sections of the body is the best way to find subtle tumors and things.
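    To sketch what I mean by separating the effects statistically (a made-up setup: each test build enables a few tweaks, testers rate it, and an ordinary least-squares fit estimates each tweak's individual contribution):

    Code:
import numpy as np

# Which of three hypothetical tweaks each test build enables (1/0),
# plus a constant column for the baseline quality.
builds = np.array([
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
])
# Average rating testers gave each build (made-up numbers).
ratings = np.array([3.9, 3.6, 4.1, 3.8, 4.4, 3.9])

# Least squares separates the individual per-tweak effects from the baseline.
coeffs, *_ = np.linalg.lstsq(builds, ratings, rcond=None)
print("tweak effects:", coeffs[:3], "baseline:", coeffs[3])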

  7. #7: Matt Mahoney (Expert, Melbourne, Florida, USA)
    Here is the paper by Landauer on human long term memory. He estimates 10^9 bits. http://csjarchive.cogsci.rpi.edu/198...p0493/MAIN.PDF

    The paper references the work by Standing on picture recall. He estimates that 10 to 14 bits per picture are required to achieve the observed test scores. Test subjects were shown up to 10,000 pictures and tested 2 days later. You could argue that more bits would be required for short term memory. But it really is much less than it seems. This video (starting at about 9 minutes, and especially at the end) gives some good examples of just how little information we perceive in images. http://www.ted.com/talks/dan_dennett...ciousness.html
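    As a back-of-the-envelope check (my own arithmetic, not taken from the paper), the number of bits needed just to tell that many pictures apart is in the same ballpark:

    \[ \log_2(10000) \approx 13.3 \ \text{bits per picture} \]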

  8. #8: nburns (Member, San Diego)
    Actually, more to the point is the fact that recalling and seeing are different things. If you have time to study an image, you can easily pick out more detail than you could recall later. An image compression scheme has to solve the more difficult problem.

    The problem of recognizing an image based on a small amount of recalled information would be like computing a hash, which is much easier.

  9. #9: porneL (Member, UK)
    Please don't derail the topic by waxing philosophical about judging image quality. It doesn't help at all, as I just can't use it in an automated test suite.

    I need an instant (objective) metric that is "good enough"; it doesn't need to be perfect. In fact the quality part is not the problem here (I consider it solved - I'll just use SSIM, it's perfectly adequate for the task). I just need to figure out the math for an objective quality/filesize ratio. It's merely a math problem.
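    To be clear, the quality side is just a function call for me (a sketch assuming scikit-image; the exact keyword for handling color channels depends on the version). The open question is only how to combine that score with the file size:

    Code:
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

def ssim_of(original_path, encoded_path):
    a = np.asarray(Image.open(original_path).convert("RGB"))
    b = np.asarray(Image.open(encoded_path).convert("RGB"))
    # channel_axis=-1 marks the color axis in recent scikit-image releases.
    return structural_similarity(a, b, channel_axis=-1)

# score = ssim_of("original.png", "encoded.jpg")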

  10. #10: Matt Mahoney (Expert, Melbourne, Florida, USA)
    I agree that SSIM is a better measure than MSE or PSNR. I am just pointing out why it's a hard problem.

    And no, I don't know of a formula for quality/filesize. I suggest plotting a graph. It is similar to the problem of comparing compressors by ratio and speed and trying to boil that down to one number. You can't (although there are benchmarks that do it anyway), because sometimes size is more important and sometimes speed is more important. Likewise for size and quality. Look at the Pareto frontier: what is the best quality for a given compression ratio, and vice versa?
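    Extracting the frontier from a set of measurements is trivial, by the way. A sketch, assuming each point is a (size, quality) pair where smaller size and higher quality are better:

    Code:
def pareto_frontier(points):
    """points: iterable of (size, quality) pairs.
    Returns the non-dominated points, sorted by size."""
    frontier, best_quality = [], float("-inf")
    for size, quality in sorted(points, key=lambda p: (p[0], -p[1])):
        if quality > best_quality:  # better quality than anything smaller or equal in size
            frontier.append((size, quality))
            best_quality = quality
    return frontier

settings = [(30100, 0.90), (35400, 0.93), (34900, 0.91), (41000, 0.93), (52000, 0.97)]
print(pareto_frontier(settings))
# [(30100, 0.9), (34900, 0.91), (35400, 0.93), (52000, 0.97)]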

    But don't be surprised if the image that looks best is not the one with the best SSIM score. Visual perception is a lot more complicated than that.

  11. #11: nburns (Member, San Diego)
    Quote Originally Posted by porneL View Post
    I need an instant (objective) metric that is "good enough" [...] I just need to figure out the math for an objective quality/filesize ratio. It's merely a math problem.
    So you already have numbers for quality and numbers for size, and you are just looking for a way to relate them statistically, like a linear regression? Although I'm certainly not an expert, I know there are good ways, and I'd look at Wikipedia for ideas:

    http://en.wikipedia.org/wiki/Regression_analysis

    http://en.wikipedia.org/wiki/Correlation_and_dependence


    It's also an optimization problem. There is probably something useful here:

    http://en.wikipedia.org/wiki/Mathematical_optimization

    I hope that's what you're looking for.
    Last edited by nburns; 3rd October 2013 at 12:37.

  12. #12: Member (Stuttgart)
    Quote Originally Posted by Matt Mahoney View Post
    I agree that SSIM is a better measure than MSE or PSNR. I am just pointing out why it's a hard problem.
    It's actually not that much different, and depending on the test conditions, PSNR might be pretty close. But anyhow, as already stated, the best the OP can do is to perform subjective tests and report the MOS (mean opinion score) under well-defined, calibrated test conditions.

    Anyhow, despite this, I would suggest having a serious look at VDP (Daly's Visual Difference Predictor), which is, at least in my experience, a lot better than SSIM or PSNR for *near-threshold error detection*. Near threshold means: hardly noticeable differences. Error detection means that it provides a probability map which defines, for its standard observer, the probability of detecting an error at a specific image location. It seems to correlate quite well with my own vision, at least.

    If you want to, you can pool the per-location probabilities into an overall error-detection probability that has *some meaning* for near-threshold vision and that is based on knowledge of the HVS. VDP has a couple of drawbacks, too: it does not take color into account (the first step is a transformation into XYZ space, then dropping X and Z), and it only includes self-masking, no neighbour-masking effects. But still, a pretty good model. Oh, and pretty slow...
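    The pooling itself is nothing fancy. A sketch of the usual probability-summation rule over the per-location detection map (whether a particular VDP implementation does exactly this is another question):

    Code:
import numpy as np

def pooled_detection_probability(p_map):
    """p_map: per-location error-detection probabilities in [0, 1].
    Probability that the standard observer detects an error anywhere,
    treating detections at different locations as independent."""
    p = np.clip(p_map, 0.0, 1.0 - 1e-12)
    # 1 - prod(1 - p), computed in log space for numerical stability.
    return 1.0 - np.exp(np.sum(np.log1p(-p)))

p = np.zeros((8, 8))
p[3, 3], p[3, 4] = 0.4, 0.25  # a couple of suspicious locations
print(pooled_detection_probability(p))  # ~0.55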

  13. #13: Member (Stuttgart)
    Quote Originally Posted by nburns View Post
    So you already have numbers for quality and numbers for size, and you are just looking for a way to relate them statistically, like a linear regression?
    Not *exactly* my field either, but what you typically do is report the Pearson correlation coefficient (linear regression between the model and the MOS) and the Spearman rank correlation (linear regression of the relative order of the results). The Spearman correlation is, unlike Pearson's, invariant under monotonic functions, i.e. it does not matter which unit you use for measuring. However, before all that, one usually applies outlier removal, i.e. detection of observers that did not take the job seriously and just reported nonsense results.
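    For completeness, computing both numbers is a one-liner each, assuming SciPy and paired lists of objective scores and MOS values:

    Code:
from scipy.stats import pearsonr, spearmanr

# Made-up data: objective metric scores and the corresponding mean opinion scores.
metric = [0.91, 0.88, 0.95, 0.72, 0.83, 0.97]
mos = [4.1, 3.8, 4.5, 2.9, 3.5, 4.7]

pearson_r, _ = pearsonr(metric, mos)    # linear agreement
spearman_r, _ = spearmanr(metric, mos)  # rank agreement, unaffected by monotonic rescaling
print(pearson_r, spearman_r)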

  14. #14: Member (Stuttgart)
    Quote Originally Posted by porneL View Post
    I need an instant (objective) metric that is "good enough"; it doesn't need to be perfect. In fact the quality part is not the problem here (I consider it solved - I'll just use SSIM, it's perfectly adequate for the task). I just need to figure out the math for an objective quality/filesize ratio. It's merely a math problem.
    Partially. Just to give you an idea of what we are doing here: make plots of distortion (in PSNR or log(1-SSIM)) over bitrate (bits per pixel), then publish those. A lot can already be seen there. For example, any sane compression scheme should have a slope of 6 dB per bit at larger bitrates; if that's not met, you did something wrong. If you just want to report a single number for comparing two methods, the Bjontegaard delta-PSNR is an accepted (ad hoc) method.
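    If you want to try the Bjontegaard delta, it is short to implement. A sketch of the usual recipe (cubic fit of PSNR over log bitrate per codec, then the average gap over the overlapping rate range; not a validated implementation):

    Code:
import numpy as np

def bd_psnr(rates_a, psnr_a, rates_b, psnr_b):
    """Average PSNR gap (B minus A, in dB) over the overlapping bitrate range.
    rates_*: bitrates (e.g. bits per pixel), psnr_*: matching PSNR values in dB."""
    la, lb = np.log10(rates_a), np.log10(rates_b)
    fit_a = np.polyfit(la, psnr_a, 3)  # PSNR as a cubic in log bitrate
    fit_b = np.polyfit(lb, psnr_b, 3)
    lo, hi = max(la.min(), lb.min()), min(la.max(), lb.max())
    int_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
    int_b = np.polyval(np.polyint(fit_b), hi) - np.polyval(np.polyint(fit_b), lo)
    return (int_b - int_a) / (hi - lo)

# Made-up rate/PSNR points, four quality settings per codec:
ra = np.array([0.25, 0.5, 1.0, 2.0]); qa = np.array([30.0, 33.2, 36.5, 40.1])
rb = np.array([0.25, 0.5, 1.0, 2.0]); qb = np.array([30.8, 34.0, 37.2, 40.9])
print(bd_psnr(ra, qa, rb, qb))  # about +0.76 dB in favour of codec B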
