Hello there, everyone.
While I was browsing the internet, I got frustrated with all the big file-hosting websites where you either have to pay to download content at a decent speed, or use a free account with waiting times, slow download rates, an awful amount of advertisements, only a few downloads a day, and so on.
So I was thinking about starting my own file-hosting service. Of course I understand that the amount of storage needed to host all those files can easily become very expensive, and that this is exactly why the other services rely on all those advertisements and paid accounts to keep making money. However, I had another idea for how this might be addressed.
What if you compress all the files on the web server together?
If I am right, most compression algorithms tend to compress data better when there is more of it to work with (more data -> a higher chance of duplicate data). So if you compress all the files on the web server together, this could really save tons of space. For instance, if a user uploads a file that already exists on the server, the duplicate should take almost no extra space.
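For that exact-duplicate case, I imagine even something as simple as content-addressed storage would already cover it. Here's a rough sketch of what I mean in Python (the `blobs` directory and the function name are just made up for illustration):

```python
import hashlib
import os

STORE = "blobs"  # hypothetical directory holding one blob per unique content

def save_upload(data: bytes) -> str:
    """Store an uploaded file under the hash of its contents.

    An exact duplicate hashes to the same name, so it costs no extra
    space beyond the record pointing at the existing blob.
    """
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(STORE, digest)
    if not os.path.exists(path):  # only write content we haven't seen before
        os.makedirs(STORE, exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    return digest  # keep this per user/filename in a small database
```

But that only handles byte-for-byte duplicates; what I'm really after is something that also exploits partial redundancy between different files.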
However, right now I'm wondering what algorithm would work best in this kind of situation. I'm new to compression techniques, and although I've read Wikipedia (and some PDFs linked from it) and have a general understanding of math and programming, I'm still a rookie.
I need an algorithm that:
- Compresses data better when there is a LOT of other data available; algorithms limited to a small sliding window are probably not the best fit.
- Lets each file be compressed and decompressed independently of the others.
- Doesn't take too long to compress or decompress.
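To make those requirements more concrete, here is the kind of setup I'm imagining, sketched with zlib's preset-dictionary feature as a stand-in. The "dictionary built from other uploads" part is just my own assumption for illustration, and zlib's window caps the useful dictionary at about 32 KB, which is exactly the sliding-window limitation I mean:

```python
import zlib

def build_dictionary(existing_files):
    """Concatenate bytes from files already on the server.

    zlib only uses the last ~32 KB of the preset dictionary (its
    DEFLATE window), so a plain sliding-window scheme can't "see"
    a large corpus of earlier uploads.
    """
    return b"".join(existing_files)[-32 * 1024:]

def compress_file(data: bytes, dictionary: bytes) -> bytes:
    # Each file gets its own stream, so it stays independently decompressible.
    c = zlib.compressobj(level=9, zdict=dictionary)
    return c.compress(data) + c.flush()

def decompress_file(blob: bytes, dictionary: bytes) -> bytes:
    d = zlib.decompressobj(zdict=dictionary)
    return d.decompress(blob) + d.flush()

# Toy usage: a new upload that overlaps heavily with an earlier upload
stored = [b"The quick brown fox jumps over the lazy dog. " * 100]
zdict = build_dictionary(stored)
upload = b"The quick brown fox jumps over the lazy dog. " * 100 + b"(v2)"
blob = compress_file(upload, zdict)
assert decompress_file(blob, zdict) == upload
print(len(upload), "->", len(blob), "bytes with the shared dictionary")
```

Something along these lines keeps every file independently decompressible, but the 32 KB window is far too small to exploit a whole server's worth of data.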
All of this makes me think of context-based compression techniques. I was considering DMC, but I don't know whether it's really a good fit. I've also read the Wikipedia article on PAQ, but I still don't understand how it works internally.
Could somebody please help me?
Have a great day!