
Thread: Numbers vs text compression

  1. #1
    Member
    Join Date
    Mar 2016
    Location
    Spain
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Numbers vs text compression

    Are there any differences between compressing numbers and compressing text data? My numbers all fall within the ASCII range, so they can all be represented as characters. I can't find any advantage: I know some techniques such as delta coding, Elias gamma coding and so on, but I still get better results with regular, data-independent compressors. Thanks
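    For reference, here is a minimal sketch of the two techniques mentioned above, delta coding followed by Elias gamma coding. It assumes a strictly increasing list of positive integers, so every delta is at least 1 (Elias gamma cannot encode 0). The function names and sample values are hypothetical, and the bits are kept as a '0'/'1' string for readability rather than packed into bytes.

```python
def delta_encode(values):
    """Replace each value with its difference from the previous one (the first vs 0)."""
    prev = 0
    deltas = []
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas


def delta_decode(deltas):
    """A running sum restores the original values."""
    total = 0
    values = []
    for d in deltas:
        total += d
        values.append(total)
    return values


def elias_gamma_encode(n):
    """Elias gamma code of n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma is only defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits):
    """Decode a concatenation of Elias gamma codewords."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


values = [3, 5, 8, 9, 15, 16, 20]  # hypothetical sorted input
deltas = delta_encode(values)  # [3, 2, 3, 1, 6, 1, 4]
bits = "".join(elias_gamma_encode(d) for d in deltas)
assert delta_decode(elias_gamma_decode(bits)) == values
print(len(bits), "bits instead of", 8 * len(values))  # prints: 21 bits instead of 56
```

    Note that this only beats a general-purpose compressor when the deltas really are small and carry no further structure; gamma codes are a fixed prior, whereas an adaptive entropy coder learns the actual distribution.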

  2. #2
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    It depends on what you mean by numbers. If they are just byte values that fall in the same range as the byte values of normal human text symbols, then no. Text doesn't compress well because it consists of human text symbols, but because of the structure in how we write text.
    However, you will be able to compress pretty well because the "resolution" of your values is low, i.e. you don't need all 8 bits to describe a value, which is exactly what an entropy coder like Huffman or arithmetic coding exploits.

    Anyone on this forum saying anything else is pretty much right, and you should ignore me.
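    To put a rough number on the "low resolution" point: if only the ten digits and a separator occur, an ideal order-0 entropy coder needs at most log2(11), about 3.46 bits per symbol instead of 8. The sketch below is an illustration under stated assumptions: the helper name and the sample data (0 to 99999, space-separated) are hypothetical, and an order-0 estimate ignores the higher-order structure that real compressors also exploit.

```python
import math
from collections import Counter


def order0_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from byte frequencies only (no context)."""
    n = len(data)
    counts = Counter(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical sample: the integers 0..99999 in decimal, space-separated.
text = " ".join(str(i) for i in range(100_000)).encode("ascii")
h = order0_entropy(text)
print(f"{h:.3f} bits/byte, bound log2(11) = {math.log2(11):.3f}")
print(f"ideal order-0 size: {h * len(text) / 8:.0f} of {len(text)} bytes")
```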

  3. #3
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    How best to compress numbers depends on how they're laid out and what kind of regularities there are. If you just have a skewed but otherwise random distribution of byte values, any reasonably strong "general-purpose" compression algorithm like gzip or LZMA or PPMd will do reasonably well: it will not find many repeating strings of bytes and will end up entropy coding most of the byte values according to the skewed distribution, using Huffman or arithmetic codes. If you have an array of numbers where nearby values are often similar but not exactly the same, you'll want to detect and exploit the structure of the array.
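    As an illustration of that last case, here is a sketch with a synthetic array in which each byte differs from its neighbour by at most 2 (a hypothetical random walk). zlib (DEFLATE: LZ77 plus Huffman) stands in for a general-purpose compressor. On the raw bytes it finds few long repeats; after a simple byte-wise delta transform, the stream only contains a handful of distinct values, which the Huffman stage codes cheaply. The helper names are made up for the example.

```python
import random
import zlib


def delta_bytes(data: bytes) -> bytes:
    """Byte-wise delta modulo 256: out[i] = data[i] - data[i-1] (data[-1] taken as 0)."""
    out = bytearray(len(data))
    prev = 0
    for i, b in enumerate(data):
        out[i] = (b - prev) & 0xFF
        prev = b
    return bytes(out)


def undelta_bytes(data: bytes) -> bytes:
    """Inverse of delta_bytes: running sum modulo 256."""
    out = bytearray(len(data))
    prev = 0
    for i, d in enumerate(data):
        prev = (prev + d) & 0xFF
        out[i] = prev
    return bytes(out)


# Hypothetical data: a random walk with steps in -2..2, wrapped to one byte.
rng = random.Random(42)
level = 128
walk = bytearray()
for _ in range(100_000):
    level = (level + rng.randint(-2, 2)) & 0xFF
    walk.append(level)
walk = bytes(walk)

print("raw:  ", len(zlib.compress(walk, 9)))
print("delta:", len(zlib.compress(delta_bytes(walk), 9)))
```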

    When in doubt, try LZMA. It can find some structural and numerical regularities that normal text-oriented compression algorithms can't. But it misses some, too. Preprocessors like Bulat Ziganshin's "delta" and "mm" can find more structure in data, such as 2D tables, interleaved streams of audio or video samples, etc.
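    If you want to try LZMA with a delta preprocessing step from Python, the standard `lzma` module exposes xz's own delta filter. This is a different tool from Ziganshin's "delta" preprocessor: xz's filter does not detect structure automatically, you have to give it the byte distance (`dist`) yourself, and it subtracts byte by byte without carrying between the bytes of a multi-byte sample. The input below (16-bit little-endian mono samples, hence `dist=2`) is a hypothetical example; the sketch prints sizes rather than promising a particular gain.

```python
import lzma
import math
import random
import struct

# Hypothetical input: 50,000 16-bit little-endian mono samples of a slow
# sine wave plus a little noise, i.e. neighbouring values are close.
rng = random.Random(0)
n = 50_000
samples = [int(1000 * math.sin(i / 100)) + rng.randint(-3, 3) for i in range(n)]
raw = struct.pack("<%dh" % n, *samples)

# Plain LZMA2 at preset 9.
plain = lzma.compress(raw, format=lzma.FORMAT_XZ, preset=9)

# xz delta filter first: each byte minus the byte `dist` positions earlier,
# then LZMA2. dist=2 lines up the same byte of consecutive 16-bit samples.
filters = [
    {"id": lzma.FILTER_DELTA, "dist": 2},
    {"id": lzma.FILTER_LZMA2, "preset": 9},
]
with_delta = lzma.compress(raw, format=lzma.FORMAT_XZ, filters=filters)

# The .xz header records the filter chain, so plain decompress() undoes both steps.
assert lzma.decompress(with_delta) == raw
print("raw:", len(raw), "lzma:", len(plain), "delta+lzma:", len(with_delta))
```

    For interleaved data such as 16-bit stereo, the distance would be the size of one full frame (4 bytes), so that each byte is compared with the same byte of the previous sample in the same channel.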

    The main thing to remember is that everything in computers is represented as "numbers", but that doesn't tell you how to compress them. Some things look like text, with repeating sequences of the same exact byte values, other things look like arrays of records with numerically similar field values, some things look like arrays of arrays, others like arrays of variable-length records with optional fields, etc. Fancy (and slow) context-mixing algorithms like PAQ and cmix can find and exploit a lot of these kinds of regularities, but not all.
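    For the "arrays of records" case, one simple transform you can apply yourself before a general-purpose compressor is to split the fields into separate streams (array of structs to struct of arrays), so that similar values end up next to each other. The sketch below assumes a hypothetical fixed 7-byte record (uint32 id, int16 reading, uint8 flag) and uses zlib only as a convenient stand-in; it is not a description of how PAQ or cmix work internally, since they model such structure through contexts rather than by rearranging the bytes.

```python
import random
import struct
import zlib

# Hypothetical record layout: uint32 id, int16 reading, uint8 flag (7 bytes, no padding).
RECORD = struct.Struct("<IhB")


def make_records(n, seed=1):
    """Synthetic records: consecutive ids, a slowly drifting reading, a rare flag."""
    rng = random.Random(seed)
    reading = 200
    records = []
    for i in range(n):
        reading += rng.randint(-1, 1)
        records.append((1000 + i, reading, int(rng.random() < 0.1)))
    return records


def pack_interleaved(records):
    """Array of structs: id, reading, flag, id, reading, flag, ..."""
    return b"".join(RECORD.pack(*r) for r in records)


def pack_by_field(records):
    """Struct of arrays: all ids, then all readings, then all flags."""
    n = len(records)
    ids = struct.pack("<%dI" % n, *(r[0] for r in records))
    readings = struct.pack("<%dh" % n, *(r[1] for r in records))
    flags = bytes(r[2] for r in records)
    return ids + readings + flags


def unpack_by_field(blob, n):
    """Inverse of pack_by_field."""
    ids = struct.unpack_from("<%dI" % n, blob, 0)
    readings = struct.unpack_from("<%dh" % n, blob, 4 * n)
    flags = blob[6 * n:7 * n]
    return list(zip(ids, readings, flags))


records = make_records(20_000)
for name, blob in [("interleaved", pack_interleaved(records)),
                   ("by field", pack_by_field(records))]:
    print(f"{name:12s} {len(blob)} -> {len(zlib.compress(blob, 9))} bytes")
```

    Both layouts hold exactly the same bytes; only the order changes, so the transform is free to reverse as long as the record layout and count are known.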

  4. #4
    Expert
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I did some tests comparing text and binary formats for a list of numbers (primes up to 1 million) at http://encode.ru/threads/2414-Prime-Number-Benchmark

    Most compressors compress a binary list of 32 bit integers better than the same list of numbers in decimal as a text file. But there are some notable exceptions. For example, zip compresses the text to 190K. It compresses the binary file to 198K in big-endian format but 105K in little-endian format. ppmd and zpaq ICM-ISSE chains compress better in big-endian format.
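    A rough way to set up the same three files yourself is sketched below. It is not the benchmark's own script: the newline separator for the text file is an assumption, and `zlib` produces a raw DEFLATE stream rather than a .zip archive, so sizes will be close to but not identical with the zip figures quoted above. The sketch prints its own sizes instead of relying on those numbers.

```python
import math
import struct
import zlib


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Clear every multiple of p starting at p*p.
            sieve[p * p::p] = bytes(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


primes = primes_up_to(1_000_000)
as_text = "".join(f"{p}\n" for p in primes).encode("ascii")  # assumed one per line
big_endian = struct.pack(">%dI" % len(primes), *primes)
little_endian = struct.pack("<%dI" % len(primes), *primes)

for name, blob in [("text", as_text),
                   ("binary BE", big_endian),
                   ("binary LE", little_endian)]:
    print(f"{name:10s} {len(blob):8d} -> {len(zlib.compress(blob, 9)):8d}")
```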
