I tried a few different approaches to the same problem a while ago. I haven't tried Christian's, though.
But for me, the best result was using something like this:
shar as - *.bmp | fazip rep+mm+grzip
shar as - *.bmp | fazip rep+lzma
Plain and simple. That is, if the images are really alike. But given the number of files you have, their sizes, and how similar they are, I would favour any solid Lempel-Ziv method with a large dictionary over an image-specific one. Maybe 7z, rar or even the new razor.
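To illustrate why a solid archive with a large dictionary helps here, below is a rough sketch in Python. It is not what shar or fazip do internally. It uses the standard library's lzma module as a stand-in, and the "images" are synthetic: one block of random bytes with a few bytes changed per copy. The function names, file count and sizes are all made up for the demonstration.

```python
import lzma
import random


def make_similar_files(count=5, size=100_000, edits=50, seed=0):
    """Build synthetic 'near-identical images': one random base, lightly edited copies."""
    rng = random.Random(seed)
    base = bytearray(rng.getrandbits(8) for _ in range(size))
    files = []
    for _ in range(count):
        variant = bytearray(base)
        # Overwrite a handful of random positions so the copies differ slightly.
        for _ in range(edits):
            variant[rng.randrange(size)] = rng.getrandbits(8)
        files.append(bytes(variant))
    return files


def compare_sizes(files):
    """Return (total size when each file is compressed alone, size of one solid stream)."""
    per_file = sum(len(lzma.compress(f)) for f in files)
    # Solid mode: concatenate everything so later files can reference earlier ones.
    solid = len(lzma.compress(b"".join(files)))
    return per_file, solid


if __name__ == "__main__":
    per_file, solid = compare_sizes(make_similar_files())
    print("compressed one by one:", per_file, "bytes")
    print("compressed as one solid stream:", solid, "bytes")
```

Compressed one by one, every copy costs about as much as the first. In the solid stream the later copies are mostly matches against the earlier ones, as long as the dictionary is big enough to still hold them. Real photos won't be anywhere near this favourable, so treat it as a picture of the idea only.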
Also, zpaq is really great at storing similar files because it is designed to de-duplicate everything in the first place, and it can use the very first file you added to find similarities with any other you might add later. I'm a little out of my element here, so I don't know whether it would be practical to use with the cloud.
Maybe if you could provide a sample of any random day for us to try something else...
What the fish?
We have a series of images produced during the working day, around 500 000 to 800 000 of them ... The images are all of salmon filets, so look almost identical.
Please excuse my ignorance, but would you care to elaborate on what kind of job you have exactly? Perhaps some quality control department? Government health agency? Thank you! Just curious...