# Thread: Compressor Specialized For Numerical Sequences / Math

1. ## Compressor Specialized For Numerical Sequences / Math

Could it be beneficial to create a specialized compressor for number sequences / mathematical data? For example, if a file contained the string "2468", a compressor could encode it as, say, "2BC" (where B means +2 and C means 3 iterations). Could this be effective in some cases? Are there any compressors that do this already?

*Btw, I'm a newb at compression. I hope you'll go easy on me if I sound naive.

2. I think in this particular example you can just take the differences of the sequence (which becomes 2222 if the first digit is differenced against 0) and then run RLE: 2, 4. But I am not sure how, in general, to find a formula that generates a sequence. Many seemingly random integer sequences are in fact generated by fixed formulas (random number generation remains a research topic), but reversing the generation process is probably far harder.
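A minimal Python sketch of the difference-then-RLE idea, assuming the first value is differenced against an implicit 0. The function names are made up for illustration and do not come from any existing compressor.

```python
from itertools import groupby

def delta_encode(values, start=0):
    """Replace each value with its difference from the previous one."""
    out = []
    prev = start
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def delta_decode(deltas, start=0):
    """Invert delta_encode by keeping a running sum."""
    out = []
    prev = start
    for d in deltas:
        prev += d
        out.append(prev)
    return out

def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into a flat list."""
    return [v for v, n in pairs for _ in range(n)]

seq = [2, 4, 6, 8]
deltas = delta_encode(seq)   # [2, 2, 2, 2]
packed = rle_encode(deltas)  # [(2, 4)]
assert delta_decode(rle_decode(packed)) == seq
```

This only pays off when the differences are constant or nearly so; a sequence produced by a more complicated formula would need a different model.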

Disclosure: I am not an expert in compression either ...

3. There are some compressors that have special models for linear sequences like "2468": winrar, zcm, nanozip, jampack, and I think freearc. This kind of filter is really easy to write; it's just the current value minus the previous one. Something that doesn't seem obvious at first is that delta coding is a bijection, meaning it does not expand the data. For example, the sequence "190, 97" gives a difference of -93, but if we're working on an alphabet of 0 to 255, that is the same as -93 mod 256 = 163. Everything stays within 0 to 255 without ever needing to transmit a negative number.
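To illustrate the wrap-around point, here is a small Python sketch of a byte-wise delta filter and its inverse. It assumes the first byte is differenced against 0; the names `delta_filter` and `delta_unfilter` are hypothetical and not taken from any of the compressors mentioned.

```python
def delta_filter(data: bytes) -> bytes:
    """Store each byte as (current - previous) mod 256."""
    out = bytearray(len(data))
    prev = 0
    for i, b in enumerate(data):
        out[i] = (b - prev) % 256  # Python's % yields a value in 0..255 here
        prev = b
    return bytes(out)

def delta_unfilter(data: bytes) -> bytes:
    """Invert delta_filter with a running sum mod 256."""
    out = bytearray(len(data))
    prev = 0
    for i, d in enumerate(data):
        prev = (prev + d) % 256
        out[i] = prev
    return bytes(out)

raw = bytes([190, 97])
filtered = delta_filter(raw)  # bytes([190, 163]); 97 - 190 = -93, which wraps to 163
assert delta_unfilter(filtered) == raw
```

The output has the same length as the input and every input maps to a distinct output, which is why the filter never expands the data. Note that in C the `%` operator can return a negative result, so an unsigned byte subtraction is the usual way to get the same wrap-around there.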

As for the actual compression of the sequence, that's totally up to you. Run-length encoding is simple and fast enough for this case.

4. Originally Posted by jamiestroud69: string "2468"

After the days of ENIAC we try to model the numbers in computers as binary numbers. The reason for this is that storing binary numbers is about 30x faster than ASCII, and we can use random access to fixed-length entities.

Let's say we have 64-bit binary numbers that change a bit from one number to the next. There, one can do delta coding, but delta coding has the problem of increasing entropy when noise is present. Another improvement is joint entropy coding of the insert length and the copy length. If we have 0xffeeddcc00112233 and the next number is 0xffeeddcc00998877, we can encode a (copy, insert) length pair of (5,3), counted in bytes, with one symbol. Also, we can decide to use LZ77 where the backward distance is always 8 bytes; then we don't need any bits to encode it.
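A rough Python sketch of the (copy, insert) idea with a fixed backward distance of 8 bytes. It assumes big-endian byte order, that only a matching prefix is copied, and that the rest of the number is sent as literal bytes; the token layout is invented for illustration and is not brotli's actual format.

```python
def encode(numbers, width=8):
    """Turn each number into (copy_len, literal_bytes) relative to the previous one."""
    prev = bytes(width)  # implicit all-zero predecessor for the first number
    tokens = []
    for n in numbers:
        cur = n.to_bytes(width, "big")
        copy = 0
        while copy < width and cur[copy] == prev[copy]:
            copy += 1
        tokens.append((copy, cur[copy:]))  # insert length is width - copy
        prev = cur
    return tokens

def decode(tokens, width=8):
    """Rebuild the numbers by copying a prefix from 8 bytes back, then appending literals."""
    prev = bytes(width)
    numbers = []
    for copy, literal in tokens:
        cur = prev[:copy] + literal
        numbers.append(int.from_bytes(cur, "big"))
        prev = cur
    return numbers

nums = [0xffeeddcc00112233, 0xffeeddcc00998877]
tokens = encode(nums)
# The second token copies 5 bytes and inserts 3: (5, b"\x99\x88\x77")
assert decode(tokens) == nums
```

Because the insert length is always `width - copy` in this sketch, one small symbol per number carries both lengths, which is the joint coding mentioned above. A real entropy coder would then assign short codes to the most common pairs.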

This is not just a theoretical possibility: simple and relatively fast compression formats like brotli allow for this, but the stock encoder would probably not find such solutions. In particular, it would likely use many different LZ77 distances within the same block.

5. For fast general integer compression, check out Daniel Lemire's FastPFOR and other libraries: https://github.com/lemire/FastPFor

6. We do it in multimedia compression: sound, pictures and video.
