
Thread: XORpressor. Yet another idea for a compressor

  1. #1
    Member
    Join Date
    Aug 2016
    Location
    Lisbon
    Posts
    13
    Thanks
    2
    Thanked 3 Times in 3 Posts

    XORpressor. Yet another idea for a compressor

    While working on a larger project, a piece of it combined with the old idea of the XOR compressor gave me something to implement.
    I'm sharing it here in the hope that someone wants to implement it. I don't think it would compress better than any existing method, but it would open the field for future specific improvements. That's why it's not my priority for now.

    Here is more or less the first MB of enwik8, reduced to just the lowercase words: https://mega.nz/#!lt1HgIbA!Onm7b8NRHICnOUkBW5PZnV4kxF7H0j927N69FfB0hjo

    Getting a dictionary out of it should be easy: an explode on the " " character, index the words as array keys, and take that array of keys. The dictionary includes each and every word.
    I think the dictionary should be sorted and binary-searched for the best timing. I have not worked with hashing; if you think it would be better, please explain why.
    A future improvement would be to treat unique occurrences differently. If this is the only thing stopping you from implementing it, give me a shout and I will do it.
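
    A minimal sketch of that dictionary step in Python (the function names are just illustrative):

        import bisect

        def build_dictionary(text):
            # "explode" on the space character; every distinct word becomes
            # an entry, sorted so lookups can use binary search
            return sorted(set(text.split(" ")))

        def in_dictionary(words, w):
            # O(log n) binary search instead of a hash lookup
            i = bisect.bisect_left(words, w)
            return i < len(words) and words[i] == w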

    The logic is to trim the file to half its size by XORing each pair of bytes, so the compressed file would amount to half the original plus the dictionary size.
    This can be applied recursively up to the point where the dictionary entries are no longer worth it; a dictionary has to be created for each recursive XOR pass. There is room for improvement here as well.
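
    A minimal sketch of the trim itself; padding odd-length input with a space is just a convention I picked for the sketch so the last word stays intact:

        def xor_fold(data):
            # halve the input by XORing each pair of adjacent bytes
            if len(data) % 2:
                data += b" "  # assumed padding convention for odd lengths
            return bytes(data[i] ^ data[i + 1] for i in range(0, len(data), 2))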

    Decompressor logic (almost pseudocode):
    In a loop bounded by the length of the longest word, it builds all the candidate inputs by reversing the XOR process.

    A binary example:
    If the XOR is 1, the input was 01 or 10, so if one of them is in the dict we output that word and move on.
    If not, we append to the previous candidates, so in the next iteration we have 0101, 1001, 0110, 1010. We check those against the dict; on a match we output, otherwise we move on.

    At some point we may have output a candidate that was in the dictionary but was not the correct one, because another option was the right choice. Then the stream would probably get stuck.
    At that point we move back one step at a time up to the last choice, take the next option instead of the first one, and check whether we can now reach the end.

    It is unlikely but possible that a wrong solution also gets us to the end of the output; then we would have to resort to flags at such critical points of the decoding, like "at position x take choice y".
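
    A sketch of that backtracking decoder in Python, assuming the input holds only lowercase letters and spaces and that every word is in the dictionary. It returns the first decoding found, so the ambiguous cases above would still need the flags:

        ALPHABET = "abcdefghijklmnopqrstuvwxyz "

        def decode(stream, words):
            # every prefix of a dictionary word, for early dead-end pruning
            prefixes = {w[:k] for w in words for k in range(1, len(w) + 1)}
            out = []

            def dfs(i, word):
                if i == len(stream):
                    return word == "" or word in words  # input consumed cleanly
                for a in ALPHABET:
                    b = chr(ord(a) ^ stream[i])  # the pair must XOR to stream[i]
                    if b not in ALPHABET:
                        continue
                    w, ok = word, True
                    for c in (a, b):
                        if c == " ":
                            if w not in words:  # a space must close a full word
                                ok = False
                                break
                            w = ""
                        else:
                            w += c
                            if w not in prefixes:  # no word starts this way
                                ok = False
                                break
                    if not ok:
                        continue
                    out.append(a + b)
                    if dfs(i + 1, w):
                        return True
                    out.pop()  # wrong branch: undo the choice, try the next one
                return False

            return "".join(out) if dfs(0, "") else None

    For example, decode(xor_fold(b"hello world"), {"hello", "world"}) gives back "hello world " (including the padding space).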

    For this lowercase trim of 1 MB of enwik, the words fill about 800 KB. XORed to 1/2 that is 400 KB, and the dictionary is about 100 KB (uncompressed), for a total of ~500 KB.
    This XORed output file should itself be compressible as well.

    ---

    Now I'm guessing: a "recursive" or looped compression of 2 passes should trim the original from 800 KB to 1/4 = 200 KB. The dictionaries would weigh 100 + ~150 KB (100-200). Total of ~450 KB.
    For three passes... each iteration is more of a guess, so if someone could calculate it beforehand, that would be welcome:
    XORed 100 KB + 250 KB (2 previous dicts) + ~200 KB dict (100-300). So it's not worth it unless the dictionaries could be merged or otherwise improved.
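
    Spelling those guesses out (all numbers in KB, straight from the estimates above):

        one_pass     = 800 / 2 + 100              # 400 xored + ~100 dict   -> ~500
        two_passes   = 800 / 4 + 100 + 150        # 200 xored + two dicts   -> ~450
        three_passes = 800 / 8 + 100 + 150 + 200  # 100 xored + three dicts -> ~550, worse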

    ---

    Any brave coders gonna prove me right or wrong? Or willing to work with me?
    Last edited by Cristo; 20th August 2016 at 13:35. Reason: typo 0110

  2. The Following User Says Thank You to Cristo For This Useful Post:

    lorents17 (24th August 2016)

  3. #2
    Member
    Join Date
    Aug 2016
    Location
    Lisbon
    Posts
    13
    Thanks
    2
    Thanked 3 Times in 3 Posts
    Here is a kind of initial test: https://github.com/Cristo-Conklin/ReXOR

  4. #3
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    me, brave READER of your proposal, think it is meaningless. just think about why LZ, PPM and other methods work with bytes rather than bits, and whether they would still be able to work with bytes after your transformation. it is not the worst problem of your idea, but it is the easiest to describe

  5. #4
    Member
    Join Date
    Aug 2016
    Location
    Lisbon
    Posts
    13
    Thanks
    2
    Thanked 3 Times in 3 Posts
    I'm working with bytes, as you can see in the GitHub example; it's only the example above that is in binary.
    Maybe I explained it badly; please point out any issues you think this idea would have.


  6. #5
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    you are right, i forgot that part of your spec. still, it is meaningless compared to just assigning a unique number to each word in the dictionary. with XOR you will have lots of collisions without any advantage over the unique-numbering scheme
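
    roughly, the scheme i mean (just a sketch):

        def number_encode(text, words):
            # each word maps to one unique index: no collisions by construction
            index = {w: i for i, w in enumerate(words)}
            return [index[w] for w in text.split(" ")]

        def number_decode(ids, words):
            # decoding is a plain table lookup, no search or backtracking
            return " ".join(words[i] for i in ids)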

  7. #6
    Member
    Join Date
    Aug 2016
    Location
    Lisbon
    Posts
    13
    Thanks
    2
    Thanked 3 Times in 3 Posts
    The advantage is to trade some CPU power for not having to store the map of the replacements.
    I will try to XOR recursively until the dictionary grows to the point where continuing is not worth it. Maybe at the 2nd-3rd round, which would make 12.5-25% of the original size plus the dictionary sizes.
    Another advantage is that with only the XOR operation and maybe the dictionary, compression is lightning fast.

    I will try this and see the results, unless there is a better proof of failure; for example, if someone could prove mathematically that it fails. If not, time has to be spent on the experiment.
    I also have a trick up my sleeve to mix this, once it works, with another method I'm working on.

  8. The Following User Says Thank You to Cristo For This Useful Post:

    Razor12911 (28th August 2016)

