
Thread: Compressor Specialized For Numerical Sequences / Math

  1. #1
    Member
    Join Date
    Dec 2017
    Location
    Las Vegas, NV
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Compressor Specialized For Numerical Sequences / Math

    Could it be beneficial to create a specialized compressor for number sequences / mathematical data? For example, if a file contained the string "2468", a compressor could encode it as "2BC" (where B means +2 and C means 3 iterations). Could this be effective in some cases? Are there any compressors that do this already?
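
    To make the "2BC" idea concrete, here is a minimal sketch that greedily splits a list of integers into (start, step, iterations) triples. The function names and the triple layout are assumptions made for illustration, not the format of any existing compressor.

```python
def encode_runs(values):
    """Greedily split values into (start, step, iterations) triples.

    'iterations' counts how many times 'step' is added after 'start',
    so [2, 4, 6, 8] becomes (2, 2, 3), mirroring the "2BC" example.
    """
    out = []
    i = 0
    n = len(values)
    while i < n:
        start = values[i]
        if i + 1 == n:
            # A lone trailing value: no step to apply.
            out.append((start, 0, 0))
            break
        step = values[i + 1] - start
        j = i + 1
        # Extend the run while the same step keeps holding.
        while j + 1 < n and values[j + 1] - values[j] == step:
            j += 1
        out.append((start, step, j - i))
        i = j + 1
    return out


def decode_runs(triples):
    """Invert encode_runs by replaying each arithmetic run."""
    out = []
    for start, step, iterations in triples:
        out.extend(start + step * k for k in range(iterations + 1))
    return out


digits = [2, 4, 6, 8]
triples = encode_runs(digits)
assert decode_runs(triples) == digits
```

    Whether this helps depends on the data: a triple costs three numbers, so a run only pays off once it covers more than three values.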


    *Btw, I'm a newb at compression. I hope you'll go easy on me if I sound naive.

  2. #2
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts
    I think in this particular example you can just take the differences of the sequence (which becomes 2222, counting the first value as a difference from 0) and then run RLE on the result: (2, 4), i.e. the value 2 repeated 4 times. But I am not sure how, in general, to find a formula that generates a given sequence. Many seemingly random integer sequences are in fact generated by fixed formulas (random number generation remains a research topic), but reversing the generation process is perhaps far harder.
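
    The difference-then-RLE idea can be sketched in a few lines. This assumes the first value is differenced against an implicit 0, which is what turns 2, 4, 6, 8 into 2, 2, 2, 2; the helper names are made up for the example.

```python
from itertools import groupby


def delta_encode(values):
    """Replace each value with its difference from the previous one (first vs. 0)."""
    out = []
    prev = 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into a flat list."""
    return [v for v, count in pairs for _ in range(count)]


digits = [2, 4, 6, 8]
deltas = delta_encode(digits)   # [2, 2, 2, 2]
packed = rle_encode(deltas)     # [(2, 4)]
assert delta_decode(rle_decode(packed)) == digits
```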

    Disclosure: I am not an expert in compression either ...

  3. #3
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
    There are some compressors which have special models for linear sequences like "2468": winrar, zcm, nanozip, jampack, and I think freearc. This kind of filter is really easy to write; it's just the current value minus the previous one. Something that doesn't seem obvious at first is that delta coding is a bijection, meaning it doesn't expand the data. For example, the sequence "190, 97" gives a difference of -93, but if we're working on an alphabet of 0 to 255 that is the same as -93 mod 256 = 163. Everything stays within 0 to 255 without ever needing to transmit a negative value.
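
    A minimal sketch of such a byte-wise delta filter, taking each byte minus the one before it modulo 256 (which matches the 190, 97 example). It assumes the first byte is differenced against 0; the function names are illustrative and not taken from any of the compressors listed.

```python
def delta_filter(data: bytes) -> bytes:
    """Replace each byte with (current - previous) mod 256."""
    out = bytearray()
    prev = 0
    for b in data:
        # Python's % always returns a non-negative result for a positive
        # modulus; in C you would cast to an unsigned 8-bit type instead.
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def undo_delta_filter(data: bytes) -> bytes:
    """Invert delta_filter with a running sum mod 256."""
    out = bytearray()
    prev = 0
    for d in data:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


sample = bytes([190, 97])
filtered = delta_filter(sample)          # bytes([190, 163])
assert undo_delta_filter(filtered) == sample
assert len(filtered) == len(sample)      # same size in, same size out
```

    The output has exactly as many bytes as the input and the inverse recovers it exactly, which is the bijection property in practice.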

    As for the actual compression of the sequence that's totally up to you. Run length encoding is simple and fast enough for this case.

  4. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Originally posted by jamiestroud69:
    string "2468"
    Since the days of ENIAC we have tried to model numbers in computers as binary numbers. The reason is that storing binary numbers is about 30x faster than ASCII, and we can use random access to fixed-length entities.

    Let's say we have 64-bit binary numbers that change a little from one number to the next. There, one can do delta coding, but delta coding has the problem of increasing entropy when noise is present. Another improvement is joint entropy coding of the insert length and copy length. If we have 0xffeeddcc00112233 and the next number is 0xffeeddcc00998877, we can encode a (copy, insert) length pair of (5,3) with one symbol: 5 bytes copied from the previous number, then 3 new bytes. Also, we can decide to use LZ77 with a backward distance that is always 8; then we don't need any bits to encode it.
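
    To illustrate where the (5,3) pair comes from, here is a rough sketch that compares each 64-bit number with the previous one, byte by byte from the most significant end. It assumes big-endian byte order and an implicit backward distance of 8 bytes; it is only a toy model of the copy/insert split, not brotli's actual bitstream, and the function names are invented for the example.

```python
def copy_insert_pairs(numbers):
    """Return (copy_len, insert_len, literal_bytes) for each 64-bit number.

    copy_len is how many leading bytes match the previous number (distance 8),
    insert_len is how many remaining bytes must be stored literally.
    """
    result = []
    prev = None
    for n in numbers:
        cur = n.to_bytes(8, "big")
        if prev is None:
            # Nothing to copy from yet: all 8 bytes are literals.
            result.append((0, 8, cur))
        else:
            copy_len = 0
            while copy_len < 8 and cur[copy_len] == prev[copy_len]:
                copy_len += 1
            result.append((copy_len, 8 - copy_len, cur[copy_len:]))
        prev = cur
    return result


def rebuild(pairs):
    """Reconstruct the numbers from the (copy, insert, literal) triples."""
    out = []
    prev = b""
    for copy_len, _insert_len, literal in pairs:
        cur = prev[:copy_len] + literal
        out.append(int.from_bytes(cur, "big"))
        prev = cur
    return out


values = [0xFFEEDDCC00112233, 0xFFEEDDCC00998877]
pairs = copy_insert_pairs(values)
assert pairs[1][:2] == (5, 3)
assert rebuild(pairs) == values
```

    Since copy_len + insert_len is always 8 here, the pair carries only nine possible values, which is why a single entropy-coded symbol is enough for it.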

    This is not just a theoretical possibility: simple and relatively fast compression formats like brotli allow for this, but the stock encoder would probably not find such solutions. In particular, it would likely use many different LZ77 distances within the same block.

  5. #5
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    For fast general integer compression, check out Daniel Lemire's FastPFor and other libraries: https://github.com/lemire/FastPFor

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    We do it in multimedia compression: sound, pictures and video.
