
Thread: How to compress text bitmap?

  1. #1
    Member
    Join Date
    Aug 2017
    Location
    china
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    How to compress text bitmap?

    Recently I have been working on compressing bitmaps of Windows text. Windows renders fonts with sub-pixel rendering (ClearType), which leaves the glyph edges full of messy colors, like below.

    I have tried JPEG, zstd and LZ4, but none of their compression ratios satisfies me. Does anyone have experience or ideas for compressing this type of bitmap, lossy or lossless?


    jpeg420dct org_size=6049648 compress_size=318570 ratio=18.9900 enc_time=10481 dec_time=10052
    jpeg444dct org_size=6049648 compress_size=460690 ratio=13.1317 enc_time=17369 dec_time=14275
    lz4+huffman org_size=6049648 compress_size=288399 ratio=20.9767 enc_time=5127 dec_time=2260
    zstd org_size=6049648 compress_size=233766 ratio=25.8791 enc_time=7487 dec_time=
    32bit_to_16bit+zstd org_size=3024824 compress_size=129402 ratio=23.3754 enc_time=
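
    For reference, the ratio column above appears to be org_size divided by compress_size. The 32bit_to_16bit+zstd row is measured against the already halved 16-bit size, so against the original 32-bit data its effective ratio would be about twice as high, assuming it is the same image. A small sketch of that arithmetic, using the figures from the table (the function and variable names are only illustrative):

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Ratio as used in the table: original bytes per compressed byte."""
    return original_size / compressed_size


ORIGINAL_32BIT_SIZE = 6049648  # org_size of the 32-bit bitmap in the table

# zstd on the 32-bit data, as listed in the table
zstd_ratio = compression_ratio(ORIGINAL_32BIT_SIZE, 233766)

# 16-bit + zstd: the table divides the 16-bit size (3024824) by 129402
listed_16bit_ratio = compression_ratio(3024824, 129402)

# Assumption: same image, so compare against the 32-bit size instead
effective_16bit_ratio = compression_ratio(ORIGINAL_32BIT_SIZE, 129402)

print(f"{zstd_ratio:.4f} {listed_16bit_ratio:.4f} {effective_16bit_ratio:.4f}")
```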

    I also tried compressing it as an 8-bit palette image, since there aren't that many unique colors, but converting from 32-bit to 8-bit takes too much time. Does anyone know an algorithm that balances speed and compression ratio better?
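
    The palette conversion mentioned above can be done in a single pass with a hash lookup per pixel. The following is only a sketch of the logic, assuming the bitmap is available as a flat byte string of 4-byte pixels; `to_palette` is a hypothetical helper, and a pure Python loop like this is far too slow for real use, so the same idea would have to be written in native code to be competitive.

```python
def to_palette(pixels: bytes, bytes_per_pixel: int = 4):
    """Losslessly map pixels to 8-bit palette indices.

    Returns (palette, indices), or None if the image has more than
    256 distinct colors and therefore cannot use 8-bit indices.
    """
    palette = {}           # color bytes -> palette index (insertion ordered)
    indices = bytearray()  # one index byte per pixel
    for i in range(0, len(pixels), bytes_per_pixel):
        color = pixels[i:i + bytes_per_pixel]
        index = palette.get(color)
        if index is None:
            if len(palette) == 256:
                return None  # too many colors for an 8-bit palette
            index = len(palette)
            palette[color] = index
        indices.append(index)
    return list(palette), bytes(indices)


# Three 32-bit pixels: white, black, white
sample = bytes.fromhex("FFFFFFFF 000000FF FFFFFFFF")
result = to_palette(sample)
```

    The index stream and the palette can then be passed to a general-purpose compressor such as zstd in place of the raw 32-bit data.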

  2. #2
    Programmer schnaader
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    539
    Thanks
    192
    Thanked 174 Times in 81 Posts
    Link to the question at StackOverflow, just for completeness

    Some comments on it:
    • ClearType can be configured, so you can't just create some dictionary/font that stores bitmaps for each character
    • Also, if a character is translated a bit in subpixel range, it will look slightly different, which might be the case (check the "cc" in the top row in your image) or not (the first "c" and the one in the next line look like they're the same)
    • Writing a specialized compressor would be the best, but since the ClearType algorithm isn't open source, you'll have to do a lot of reverse engineering or brute forcing to get near the optimal solution
    • Optimal solution would be: Text is stored in ASCII/Unicode together with the font name, some layout information if needed (line or character positions) and perhaps the used ClearType parameters
    • One thing that is noticeable is a certain color palette (shades of brown, blue and black) - color count isn't low enough for 8-bit values, but very close (e.g. the sample image has 1237 colors). Using FLIF often is a good solution for such images.
    • The colors have some spatial information (left of the character, brown is used, right of it, blue) and there always is a white-brown-gray/black-blue-white gradient when going from left to right.
    • Using HSV instead of RGB for the colors will most likely be useful as the tones will have similar hue and the saturation and value for a tone will correlate (since they are only fading to black or white).
    • If you allow lossy compression, try converting the images to 8 bit first - this reduces them to only 256 colors, but when a good algorithm is used, the changes are not perceivable.
    • People won't notice subtle ClearType differences, so another lossy way would be to use the first occurrence of a character (e.g. the "c" in the first line) to replace all other occurrences - or, even more extreme, to render the text using the same font but without ClearType
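
    To illustrate the HSV point from the list above: when a tone only fades toward black or white, its hue stays constant, and fading toward black also keeps the saturation constant. A minimal sketch using Python's standard `colorsys` module; the three RGB triples are made-up example values, not colors taken from the sample image.

```python
import colorsys


def hsv(rgb):
    """Convert an (r, g, b) triple with 0-255 channels to (h, s, v) in 0..1."""
    r, g, b = rgb
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)


brown = (155, 95, 35)      # hypothetical fringe color
darker = (93, 57, 21)      # the same tone scaled by 0.6 (faded toward black)
lighter = (205, 175, 145)  # the same tone blended 50% with white

for name, rgb in (("brown", brown), ("darker", darker), ("lighter", lighter)):
    h, s, v = hsv(rgb)
    print(f"{name:8s} h={h:.4f} s={s:.4f} v={v:.4f}")
```

    Whether this helps on real ClearType output depends on how the renderer actually produces the colors.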


    It might be useful to upload a complete sample instead of the small crop above to get more realistic results from people here.

    For the cropped picture you uploaded so far, here are my results:

    Code:
    Lossless
    Algorithm   Size in bytes   Notes
    ---
    FLIF        16,326
    7-Zip bZip2 13,538           (first converted to 24-bit BMP)
    7-Zip zip   12,863           (first converted to 24-bit BMP)
    PNG         11,945 
    7-Zip lzma2  7,077           (first converted to 24-bit BMP)
    FLIF -N      6,664
    FLIF -N -R4  6,615
    paq8p -4     6,341           (first converted to BMP to trigger the image model, very slow)
    
    Lossy (first converted to 8-bit BMP)
    Algorithm   Size in bytes   Notes
    ---
    7-Zip zip    6,470
    PNG          5,993
    7-Zip bZip2  5,633
    FLIF         9,132
    7-Zip lzma2  4,053
    paq8p -4     3,568
    FLIF -N      3,419
    FLIF -N -R4  3,391
    Last edited by schnaader; 4th August 2017 at 16:56.
    http://schnaader.info
    Damn kids. They're all alike.

  3. #3
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Actually, HSV is probably not a good idea here - at least on Windows, this font anti-aliasing algorithm is very weird.
    Basically, it uses color components as subpixels directly. I guess with white background it looks good enough?
    Code:
    000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
    000000 000000 000000 000066 B6FFFF FFFFFF FFB666 000000 000000 000000
    000000 000000 000000 0066B6 FFFFB6 9090DB FFFFB6 660000 000000 000000
    000000 000000 000000 66B6FF FFB666 0066B6 FFFFFF B66600 000000 000000
    000000 000000 000066 B6FFFF B66600 000066 B6FFFF FFB666 000000 000000
    000000 000000 0066B6 FFFFDB 903A00 000000 66B6FF FFDB90 3A0000 000000
    000000 000000 66B6FF FFDB90 3A0000 000000 0066B6 FFFFDB 903A00 000000
    000000 000066 B6FFFF DB903A 000000 000000 000066 B6FFFF DB903A 000000
    000000 0066B6 FFFFDB 903A00 000000 000000 00003A 90DBFF FFDB90 3A0000
    000000 66B6FF FFFFFF FFFFFF FFFFFF FFFFFF FFFFFF FFFFFF FFFFDB 903A00
    000066 B6FFFF DB903A 000000 000000 000000 000000 003A90 DBFFFF DB903A
    0066B6 FFFFFF B66600 000000 000000 000000 000000 00003A 90DBFF FFDB90
    66B6FF FFFFB6 660000 000000 000000 000000 000000 000000 3A90DB FFFFDB
    000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
    So I'd suggest upscaling the image first, using something like https://en.wikipedia.org/wiki/Hqx#Algorithm
    (preferably a specially trained model).
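
    A simpler way to look at the dump above, before any hqx-style upscaling: if each color component really is one subpixel, a W x H RGB image can be read as a 3W x H grayscale image just by treating the component bytes as consecutive samples. A minimal sketch using the second row of the dump; the component order within each hex triple is an assumption, and `expand_subpixels` is a hypothetical helper name.

```python
# Second row of the dump above, one 6-digit hex triple per pixel
ROW = "000000 000000 000000 000066 B6FFFF FFFFFF FFB666 000000 000000 000000"


def expand_subpixels(hex_row: str) -> bytes:
    """Read a row of hex pixel triples as a flat row of subpixel samples.

    bytes.fromhex skips the spaces between pixels, so the result has
    three grayscale samples per original pixel, in dump order.
    """
    return bytes.fromhex(hex_row)


samples = expand_subpixels(ROW)
print(len(samples), " ".join(f"{s:02X}" for s in samples))
```

    Read this way, the colored fringe in that row becomes a plain symmetric ramp (66 B6 FF ... FF B6 66), which should be easier for a grayscale image model to predict than the three-channel colors.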

  4. #4
    Programmer schnaader
    Here's a correctly aligned and downscaled version of the uploaded image, colors should exactly match the original:

    [Image: aligned_and_downscaled.png]

    The font most likely is Arial Unicode MS, here's a comparison with text from Paint (Arial, size 20) on my PC:

    [Image: comparison_with_arial_20_on_my_pc_large.png]

  5. #5
    Programmer schnaader
    Quote: Originally posted by Shelwien
    Actually HSV is probably not a good idea here - at least on windows this font anti-aliasing algo is very weird.
    Basically, it uses color components as subpixels directly.
    Hmm, I don't think this is the case, at least for the sample image: there are 247 distinct hex byte values in it, and there should be far fewer in the case you described.
    But it does match the text I made with Paint, which has only 27 different RGB colors and seems to use only 6 hex byte values: 0x00, 0x3A, 0x66, 0xB6, 0xDB and 0xFF.
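
    Counts like the ones above are easy to reproduce on raw pixel data. A minimal sketch, assuming a flat byte string of 3-byte pixels; `color_stats` is a hypothetical helper, and the five sample pixels are taken from Shelwien's dump earlier in the thread.

```python
def color_stats(pixels: bytes, bytes_per_pixel: int = 3):
    """Return (number of distinct colors, number of distinct byte values)."""
    colors = {
        pixels[i:i + bytes_per_pixel]
        for i in range(0, len(pixels), bytes_per_pixel)
    }
    values = set(pixels)  # iterating over bytes yields the channel values
    return len(colors), len(values)


# Five pixels from the dump: 000066 B6FFFF FFFFFF FFB666 000000
sample = bytes.fromhex("000066 B6FFFF FFFFFF FFB666 000000")
n_colors, n_values = color_stats(sample)
print(n_colors, n_values)
```

    A small number of distinct byte values combined with a larger number of distinct colors is what direct subpixel rendering would be expected to produce.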

  6. #6
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    cwebp -near_lossless 40 -q 100 -m 6 --> gives 3106 bytes

    zopflipng gives 3415 bytes
