Thread: text compression?

  1. #1
    Member
    Join Date
    Mar 2015
    Location
    here
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts

    text compression?

    Hello, I am new here.

    I am programming in Win32 C++.

    I have created a file converter that converts any file into text made of the 10 characters "abcdefghij".

    For example, a.png is 2 KB and my conversion result is 11 KB.
    The converted file looks like this:
    babaacbagaacaahccadabcfahabaeababceafabcjccahcjcaa dcjccai
    caaeabccafabaaceadcjccagcaaeabccafabceadabciaiabcj ccacaja
    babcbaeabaacaahccadabcjciafcjcbaicjcbabcaajabcdaba bcbajab
    cbaiabcfajabcjcjafcjciahahababcaaiabcgaiabafababcj cgaicjc
    bbacjccabcjccajccadabaaceafcjcbahcjcjajcaahabcjcfa cacabab
    cbaiabcfajabcjchafcjciahahababcaaiabaaceafcabaabcj chajcjc
    caccbadabcjcjagcjciaiabababajababcjcibaabababajaba bcjciba
    abababajababcjcibaabababajababcjcibaabababajababcj cibaaba
    etc....

    What is the best way to compress this 10-character text into a smaller file?

    Thanks for the help.

  2. #2
    Member
    Join Date
    May 2006
    Location
    Uruguay
    Posts
    30
    Thanks
    0
    Thanked 1 Time in 1 Post
    The best compression would be to convert it back to 8-bit symbols and compress it with a compressor optimized for the original file. In your example it is unlikely that you will compress much better than the original PNG, as the PNG is already compressed.
    Doing an arbitrary conversion to text won't increase the compression: the entropy will be the same (unless the converter adds additional data), but the redundancy will be harder to find.
    Text doesn't compress well merely because it is text; text compresses well because natural and artificial languages have high redundancy.
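
    To make the entropy point a bit more concrete, here is a minimal C++ sketch that estimates the order-0 entropy (bits per symbol, ignoring all context) of a buffer. The function name `order0EntropyBits` is only illustrative and not part of any library.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Order-0 entropy estimate in bits per symbol: every byte is treated as
// an independent symbol and only its overall frequency is used.
double order0EntropyBits(const std::string& data) {
    if (data.empty()) return 0.0;

    // Count how often each byte value occurs.
    std::array<std::size_t, 256> counts{};
    for (unsigned char c : data) ++counts[c];

    const double n = static_cast<double>(data.size());
    double bits = 0.0;
    for (std::size_t count : counts) {
        if (count == 0) continue;
        const double p = static_cast<double>(count) / n;
        bits -= p * std::log2(p);  // Shannon entropy term for this symbol
    }

    // A byte-valued symbol can never carry more than 8 bits.
    assert(bits <= 8.0 + 1e-9);
    return bits;
}
```

    A file that uses only the 10 letters a..j cannot exceed log2(10), roughly 3.32 bits per character, at order 0. For the 11 KB example that would be about 11 * 3.32 / 8, or roughly 4.6 KB, if the letters were evenly distributed, which is still larger than the 2 KB PNG it came from.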

    If you must keep the file in the 10-character format and you want to compress it with a general-purpose compressor, you will likely obtain more compression if the transformation converts each 8-bit symbol into a fixed number of characters (e.g. 200 -> caa). That way, byte-oriented compressors will still find most of the redundancy. Otherwise you won't be able to compress below the order-0 entropy of the text file.
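
    A fixed-width conversion along those lines could be sketched as follows. This assumes the mapping a=0, b=1, ..., j=9 and three letters per byte (the zero-padded decimal value), which matches the "200 -> caa" example but is not necessarily what the original converter does. The names `encodeFixed` and `decodeFixed` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Encode every byte as exactly three letters from "abcdefghij"
// (assumed mapping: a=0 ... j=9), e.g. 200 -> "caa", 7 -> "aah".
std::string encodeFixed(const std::vector<std::uint8_t>& bytes) {
    std::string out;
    out.reserve(bytes.size() * 3);
    for (std::uint8_t b : bytes) {
        out.push_back(static_cast<char>('a' + b / 100));        // hundreds
        out.push_back(static_cast<char>('a' + (b / 10) % 10));  // tens
        out.push_back(static_cast<char>('a' + b % 10));         // ones
    }
    assert(out.size() == bytes.size() * 3);  // fixed width: 3 chars per byte
    return out;
}

// Inverse transform: read the text in groups of three letters.
std::vector<std::uint8_t> decodeFixed(const std::string& text) {
    if (text.size() % 3 != 0)
        throw std::invalid_argument("length must be a multiple of 3");

    std::vector<std::uint8_t> out;
    out.reserve(text.size() / 3);
    for (std::size_t i = 0; i < text.size(); i += 3) {
        int value = 0;
        for (std::size_t k = 0; k < 3; ++k) {
            const char c = text[i + k];
            if (c < 'a' || c > 'j')
                throw std::invalid_argument("character outside a..j");
            value = value * 10 + (c - 'a');
        }
        if (value > 255)
            throw std::invalid_argument("group does not fit in one byte");
        out.push_back(static_cast<std::uint8_t>(value));
    }
    return out;
}
```

    Because every byte always occupies the same three positions, repeated byte sequences in the original file stay aligned as repeated character sequences in the text, which is what a byte-oriented compressor looks for.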
    Last edited by ggf31416; 15th March 2015 at 20:19.

  3. #3
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    When doing data compression, the goal is to remove redundancy. ggf31416 already gave the significant hint (to convert it back). Actually, the best approach would be to decompress the PNG file first so that you have the raw data.

    PNG is a container format that uses the compression method "deflate" for the raw data and also stores some auxiliary data (like the height and the width).

    Deflate is not the best method the planet has to offer, and it is especially weak when it comes to compressing images. The best method to compress an image depends on the redundancy in the image. For example, it makes a difference whether your image shows photorealistic scenery or some simple shapes (like lines, polygons, characters, etc.). If you have simple shapes, then you achieve better results by using an image file format like SVG (Scalable Vector Graphics). Even though an SVG file is probably much smaller than the PNG file, it is still much bigger than what you could achieve by not using XML (SVG uses XML).

    By the way: "text compression" is usually about compressing text, not everything that can be opened with a text editor without showing strange symbols. Text is what you write in English or Russian. Even source code already has a different kind of redundancy.
    Last edited by just a worm; 16th March 2015 at 16:37.
