Thread: List - Image/ Video corpus for compression

  1. #1
    Member
    Join Date
    Aug 2017
    Location
    Mauritius
    Posts
    59
    Thanks
    67
    Thanked 22 Times in 16 Posts

    List - Image/ Video corpus for compression

    While working on compression, one of the difficulties is finding good, modern test images and test videos. I think it would therefore be a good idea to compile a list.

    Info on Internet use vs Camera Images


    Color images (Internet use):

    Jyrki Alakuijala image corpus


    WyohKnott image corpus

    Camera images


    Rawzor test images

    Raw Pixels

    Medical images

    Silesia compression corpus

    Video

    Xiph.org Video Test Media [derf's collection]

    EBU Test Sequences (look for "Public Test Sequences" on the site; you have to register to get FTP access)

    Elephants Dream lossless video and audio


    Kindly suggest some more in the comments.
    Last edited by khavish; 10th September 2017 at 07:51. Reason: added camera image

  2. The Following User Says Thank You to khavish For This Useful Post:

    schnaader (25th November 2017)

  3. #2
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    21
    Thanks
    33
    Thanked 18 Times in 11 Posts
    I agree. I think we need a common corpus too, and not only for images.
    Luca

  4. The Following User Says Thank You to LucaBiondi For This Useful Post:

    khavish (9th September 2017)

  5. #3
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by khavish View Post
    Kindly suggest some more in the comments.
    I like to think that there are two main kinds of photographic corpora from the image compression viewpoint. The first kind are corpora that have been subsampled and possibly enhanced, i.e., they model professional internet use. The second kind are photo corpora at camera resolution, which model immediate storage and archival use. The two corpora you posted are subsampled and are more useful for modeling internet use. Subsampling tends to make images relatively sharp, i.e., with more high-frequency content (more difficult to compress), but low on noise and interpolation/reconstruction artefacts (less difficult to compress).

    The following are two relatively good camera-resolution corpora:

    http://imagecompression.info/test_images/

    ICIP corpus https://jpeg.org/items/20151126_icip_challenge.html

    Previously on encode.ru it has been shown that some of the images in ICIP have been pre-compressed in JPEG. Those images should be left out from further codec analysis, and especially so at high quality/bit rates or lossless compression.

    Images at camera resolution tend to carry artefacts of the reconstruction algorithm, and these artefacts are correlated. In the lossless world, my experience was that it is possible to get another 10% density improvement by targeting such correlations, but the gain disappears if a camera-resolution image is subsampled. I'm not personally convinced that it is a good idea to model the reconstruction artefacts of a particular algorithm in a data compression algorithm. I like to think that reconstruction algorithms are adjusted from one camera generation to the next, while an image compression system should ideally have a longer life span.

  6. The Following 2 Users Say Thank You to Jyrki Alakuijala For This Useful Post:

    boxerab (9th September 2017),khavish (9th September 2017)

  7. #4
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    For video, the European Broadcast Union (EBU) provides a public test set for free:

    Go to https://tech.ebu.ch/testsequences

    and look for "Public Test Sequences" on the site. You have to register to get FTP access.

  8. The Following User Says Thank You to boxerab For This Useful Post:

    khavish (9th September 2017)

  9. #5
    Member
    Join Date
    Aug 2017
    Location
    Mauritius
    Posts
    59
    Thanks
    67
    Thanked 22 Times in 16 Posts
    I added some more public corpora for videos.

  10. #6
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    Great. (There is a typo: EDU should be EBU )

  11. #7
    Member
    Join Date
    Aug 2017
    Location
    Mauritius
    Posts
    59
    Thanks
    67
    Thanked 22 Times in 16 Posts
    Quote Originally Posted by boxerab View Post
    Great. (There is a typo: EDU should be EBU )
    Typo fixed. It would be great to have more sources for medical and camera images.

  12. #8
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    For medical images, there are quite a few here: http://www.dclunie.com/
    in the "images" section, but many of these may be in DICOM format, so you would need
    a DICOM toolkit to extract the images.

    http://www.dcm4che.org/ (Java)
    http://gdcm.sourceforge.net/wiki/index.php/Main_Page (C++)

  13. #9
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    Thanks for the Elephants Dream link, I've been looking for something like this for quite a while.

  15. #11
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by boxerab View Post
    For video, the European Broadcast Union (EBU) provides a public test set for free:

    Go to https://tech.ebu.ch/testsequences

    and look for "Public Test Sequences" on the site. You have to register to get FTP access.
    The EBU test sequences are quite ok, though you have to be a bit careful, as some of them have been generated from 4:2:2 or 4:2:0 subsampled material. This is the right thing to do if you are compressing video and transmitting it subsampled, which is quite typical for broadcasting (and the job of EBU, of course), but it is not ideal for imaging.
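
    To make the subsampling caveat concrete, here is a minimal sketch of how much chroma information each common scheme keeps. The table and the function names (`CHROMA_FACTORS`, `chroma_plane_size`, `samples_per_pixel`) are illustrative assumptions, not part of any EBU tooling.

```python
from fractions import Fraction

# Scheme -> (horizontal, vertical) chroma decimation factors.
CHROMA_FACTORS = {
    "4:4:4": (1, 1),  # full-resolution chroma
    "4:2:2": (2, 1),  # chroma halved horizontally
    "4:2:0": (2, 2),  # chroma halved in both directions
}


def chroma_plane_size(width, height, scheme):
    """Return (width, height) of each chroma plane, rounding up for odd sizes."""
    fx, fy = CHROMA_FACTORS[scheme]
    return (-(-width // fx), -(-height // fy))


def samples_per_pixel(scheme):
    """Average number of stored samples per pixel: one luma plus two chroma planes."""
    fx, fy = CHROMA_FACTORS[scheme]
    return 1 + 2 * Fraction(1, fx * fy)


if __name__ == "__main__":
    for scheme in CHROMA_FACTORS:
        size = chroma_plane_size(1920, 1080, scheme)
        print(scheme, size, float(samples_per_pixel(scheme)))
```

    Material that started out as 4:2:0 has already lost three quarters of its chroma samples, and upsampling it back to 4:4:4 does not restore them, which is why such sequences can flatter a codec when used for still-image tests.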

    I don't think ICIP images have been pre-compressed, at least not that I'm aware of. I do have the "raw" originals, though I did not process them to their 4:4:4 RGB representation, or at least not all of them.

    There is currently another test corpus we're playing with for JPEG XS which also includes some "freaky" artificial images, and some natural ones. Images can be shared IIRC, so we should probably put them on jpeg.org at some point.

    There are also a couple of well-known MPEG test sequences like "crowd run" or "park joy", though they are managed by WG11, not WG1 (and particularly, not by me). "crowd run" is used quite often, though I would personally argue against it: it is pretty noisy, with film grain all over, but 10bpp. It is digitized analog film. This was ok as a workflow ten years ago, but today recording is digital to begin with, so the quality you get is typically better than that.

    Oh well, that's a WG11 issue probably...

  16. The Following 2 Users Say Thank You to thorfdbg For This Useful Post:

    boxerab (10th September 2017),Jyrki Alakuijala (10th September 2017)

  17. #12
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by thorfdbg View Post
    I don't think ICIP images have been pre-compressed, at least not that I'm aware of.
    https://encode.ru/threads/2392-JPEG-...made-available suggests that two of the images are very likely pre-compressed as JPEG, and two others might be.

    It was also discussed that the acquisition devices (digital cameras) behind ICIP 2016 are 10-15 year old models.

    Why does the age of the acquisition technology matter? Because at camera resolution some of the correlations come from the image reconstruction algorithm -- the image is not a one-to-one mapping from the CCD recordings, but has to be computed. This computation generates synthetic correlations in the final reconstructed image. Newer generations of devices may have improvements in these algorithms and produce different kinds of correlations. I think this effect is easiest to observe with lossless compression. Some algorithms (I think BCIF mostly, possibly also some with complex context modeling) can model the particularities of some reconstruction artefacts, and such algorithms get an additional advantage at camera resolution. After resizing they can lose their advantage.
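
    As a rough illustration of such synthetic correlations, the sketch below fills in the green channel of a mosaic by averaging neighbouring green samples. It assumes an RGGB Bayer layout and plain bilinear interpolation, which is far simpler than what real cameras do; the function name `interpolate_green` is hypothetical.

```python
def interpolate_green(mosaic):
    """Reconstruct a full green plane from an RGGB Bayer mosaic (list of rows).

    In an RGGB layout, green was measured wherever (x + y) is odd.
    Every other site gets the mean of its available 4-neighbours,
    all of which are measured green sites.
    """
    height, width = len(mosaic), len(mosaic[0])
    if height < 2 or width < 2:
        raise ValueError("mosaic must be at least 2x2")
    green = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if (x + y) % 2 == 1:
                green[y][x] = float(mosaic[y][x])  # measured sample
            else:
                neighbours = [
                    mosaic[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < height and 0 <= nx < width
                ]
                green[y][x] = sum(neighbours) / len(neighbours)  # computed sample
    return green


if __name__ == "__main__":
    mosaic = [
        [0, 10, 0],
        [20, 0, 30],
        [0, 40, 0],
    ]
    for row in interpolate_green(mosaic):
        print(row)
```

    With this toy reconstruction, half of the green samples are an exact function of their neighbours, so a context model that learns the rule predicts them for free. That advantage is tied to the specific reconstruction rule and fades once the image is resized.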

    Even with these two blemishes, the ICIP 2016 image set is still an interesting and important benchmark for image compression.

  18. The Following User Says Thank You to Jyrki Alakuijala For This Useful Post:

    khavish (10th September 2017)

  19. #13
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Repository with a large collection of raw photo samples from various camera manufacturers:
    https://raw.pixls.us/

    Usually between 8 and 16 bits per sample (bps), with various CFA filter layouts.
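
    When experimenting with raw CFA data, one common preprocessing step is to split the mosaic into one plane per filter position before compressing. The following is a minimal sketch of that idea, assuming a 2x2-periodic (Bayer-like) layout already decoded into a list of rows; `split_cfa_planes` is a hypothetical helper, and layouts with a larger period are not covered.

```python
def split_cfa_planes(mosaic, pattern="RGGB"):
    """Split a 2x2-periodic CFA mosaic into four planes.

    `pattern` names the filter colour at positions (0,0), (0,1), (1,0), (1,1)
    of the repeating tile. Returns a list of (colour, plane) pairs in that order.
    """
    if len(pattern) != 4:
        raise ValueError("pattern must name exactly four filter positions")
    planes = []
    for index, colour in enumerate(pattern):
        dy, dx = divmod(index, 2)
        # Take every second row starting at dy, every second column starting at dx.
        plane = [row[dx::2] for row in mosaic[dy::2]]
        planes.append((colour, plane))
    return planes


if __name__ == "__main__":
    mosaic = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    for colour, plane in split_cfa_planes(mosaic):
        print(colour, plane)
```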


  20. The Following User Says Thank You to mpais For This Useful Post:

    khavish (10th September 2017)

  21. #14
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    The Kodak image set is here: http://r0k.us/graphics/kodak/. It's not so popular these days.

  22. #15
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    The Elephants Dream torrent doesn't work (the post is 11 years old), so it should probably be removed.

  23. #16
    Member
    Join Date
    May 2014
    Location
    Canada
    Posts
    136
    Thanks
    61
    Thanked 21 Times in 12 Posts
    For DNG images, the excellent GraphicsMagick program can read and convert to other formats.

    http://www.graphicsmagick.org/

  24. #17
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    Why does the age of the acquisition technology matter?
    It depends on the use case you have in mind for your algorithm. If it is digital photography compression, then preserving film grain does not matter, and an attempt to do so might be a burden on an algorithm that is otherwise suitable for the problem at hand. In other words, if you are looking for an algorithm for image archives, such images may matter; if you are not, they do not. The question is always which conclusions you draw from the results given the images you have. Just plotting PSNR curves and then taking the average may tell the wrong story if you do not select the images according to the use case.
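
    A minimal sketch of the point about averaging: compute PSNR per image, then average within each use-case group instead of across the whole corpus. The group labels and the helper name `mean_psnr_by_group` are assumptions made for illustration, and the inputs are plain lists of sample values rather than decoded image files.

```python
import math
from collections import defaultdict


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs, e.g. lossless coding
    return 10 * math.log10(peak * peak / mse)


def mean_psnr_by_group(results):
    """Average PSNR separately for each use-case group.

    `results` is an iterable of (group_label, psnr_db) pairs.
    """
    buckets = defaultdict(list)
    for group, value in results:
        buckets[group].append(value)
    return {group: sum(values) / len(values) for group, values in buckets.items()}


if __name__ == "__main__":
    # Tiny synthetic sample lists, only to exercise the functions.
    reference = [10, 20, 30, 40]
    decoded = [11, 19, 30, 42]
    print(psnr(reference, decoded))
```

    Reporting one mean per group keeps, say, grainy scanned film from dragging down (or propping up) the figure that matters for a digital-photography use case.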

  25. #18
    Member
    Join Date
    Aug 2017
    Location
    Mauritius
    Posts
    59
    Thanks
    67
    Thanked 22 Times in 16 Posts
    Quote Originally Posted by boxerab View Post
    The Elephants Dream torrent doesn't work (the post is 11 years old), so it should probably be removed.
    You can find direct download links for Elephants Dream and other lossless video sources at https://media.xiph.org/

  26. The Following User Says Thank You to khavish For This Useful Post:

    boxerab (26th November 2017)

  29. #21
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    Previously on encode.ru it has been shown that some of the images in ICIP have been pre-compressed in JPEG.
    Here: https://encode.ru/threads/2392

    This newsgroup is dedicated to image compression:
    http://linkedin.com/groups/Image-Compression-3363256

  30. The Following User Says Thank You to Alexander Rhatushnyak For This Useful Post:

    Jyrki Alakuijala (27th November 2017)
