
Thread: Looking for compression algorithm with simple decompression

  1. #1
    Member
    Join Date
    Jul 2017
    Location
    Switzerland
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Looking for compression algorithm with simple decompression

    Hi,

    I'm a neuromorphic hardware designer trying to implement a compressed cache system on a digital chip as part of my PhD, and I was advised to ask here for help. During the design, I noticed that the data I'm going to store, despite being composed of 16-bit words, have the following distribution:


    minimum number of bits to represent them | %
    1 | 20.1%
    2 | 12.7%
    3 | 22.5%
    4 | 29.0%
    5 | 14.7%
    6 | 0.9%
    7 | < 0.1%
    8 | < 0.1%
    9 | < 0.1%
    10 | < 0.1%
    11 | < 0.1%
    12 | < 0.1%
    13 | < 0.1%
    14 | < 0.1%
    15 | < 0.1%
    16 | < 0.1%



    The physical structure in which they are going to be stored can be seen as a set of small lists of 9 words each:

    L[0][0] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    L[0][1] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    L[0][2] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    ...
    L[0][7] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    L[1][0] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    L[1][1] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    ...
    L[N][7] = [0, 1, 2, 3, 4, 5, 6, 7, 8]

    At time step 0, the system reads, in parallel, the first value of each of the 8 lists composing L[K], one per list (so L[K][0][0], L[K][1][0], L[K][2][0], L[K][3][0], ...).
    At time step 1, the second values of the lists composing L[K] are read (L[K][0][1], L[K][1][1], L[K][2][1], L[K][3][1], ...).
    At time step 2, the third values, and so on.

    It continues until 9 values have been read from each list (L[K][0][8], L[K][1][8], L[K][2][8], L[K][3][8], ...), then it starts to read from another L[K].

    I'm trying to reduce the number of bits composing each 9-word list using some compression technique. The technique can be very slow and complex during the compression phase (it is performed offline) but must be very simple to decompress (to avoid hardware overhead).

    Since it is done in HW, it has some extra "unusual" constraints:
    -Since we can't change the physical size of the memory cells, the resulting total number of bits required for each list is rounded up to the closest multiple of 8 (e.g. a compressed string of 17 bits becomes 24 bits long); see the sketch after this list.
    -It would be better if the lists are stored independently (without creating a huge 81 words list) unless it gives a really strong advantage (> 10%) in terms of compression.
    -We can have shared information for the decoding (e.g. storing the number of bits of the longest word in the 8 lists)
    -The decompression can't use multiplications, divisions, or large memories (e.g. no large or dynamic dictionaries), only shifts, adds, subs, ANDs and ORs, since there are going to be 512 decoders implemented on chip
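
    A minimal sketch of the byte rounding from the first constraint, using only the allowed operations:

    Code:
    /* Round a bit count up to the next multiple of 8 and convert it to bytes,
       using only add, and, shift -- the operation set allowed in the decoder. */
    unsigned bits_to_padded_bytes(unsigned bits)
    {
        unsigned padded_bits = (bits + 7u) & ~7u;  /* e.g. 17 -> 24 */
        return padded_bits >> 3;                   /* 24 -> 3 bytes */
    }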

    I tried several techniques already, with the following resulting bits/word:

    -exp golomb coding: 7.76
    -fibonacci coding: 6.73
    -rice coding: 5.57
    -ANS (2-bit symbols)** 11.35 (a lot higher than I was expecting? I have checked the implementation and it seems fine...)
    -ANS (1-bit symbols)** 10.86
    -shared size* 5.24

    **Probabilities rounded to the closest power of 2 to simplify decoding in HW.
    *The data are rectified (2*x if positive, -2*x-1 if negative, and a reference value X is subtracted from all of them). For each group of entries L[K][0][Y], L[K][1][Y], L[K][2][Y], L[K][3][Y], ... the number of bits required by the longest word of the group is stored within the data.
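
    A minimal sketch of the rectification (zigzag) mapping used by the shared-size scheme; the reference-value subtraction is assumed to happen before the mapping, and the shared bit-width field is handled separately:

    Code:
    #include <stdint.h>

    /* Rectification as described above: map signed residuals to non-negative
       codes so small magnitudes get small codes (2*x for x >= 0, -2*x-1 for x < 0). */
    static uint16_t rectify(int16_t x)
    {
        return (x >= 0) ? (uint16_t)(2 * x) : (uint16_t)(-2 * x - 1);
    }

    /* Inverse mapping for the decoder, using only shift, and, sub. */
    static int16_t unrectify(uint16_t u)
    {
        return (u & 1) ? (int16_t)(-(int32_t)(u >> 1) - 1) : (int16_t)(u >> 1);
    }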


    Before proceeding with the implementation of the current best solution, I would really appreciate any suggestions for algorithms or any other useful information/papers/etc.

    Thank you!
    Last edited by asa; 19th July 2017 at 22:15. Reason: table formatting

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Well, here's an arithmetic coder without multiplications or divisions (via exp/log LUTs): http://nishi.dreamhosters.com/u/rc_v1.rar
    And this version has FSM counters: http://nishi.dreamhosters.com/u/rc_v5.rar
    So you can basically port any statistical model (paq etc) to be mul/div-free.

    Interleaved coding recently became popular in PC data compression too (zstd etc),
    so it shouldn't be a problem - it's compatible both with AC and ANS.

    Also, both AC and ANS should provide compression to the entropy limit -
    for example, it's possible to simulate golomb/rice coding if you assign the probabilities right.
    But it's hard to say anything about the best model for the data without the data.
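
    For reference, plain Rice decoding itself needs only shifts, adds and ors; an AC/ANS coder that assigns probability 1/2 to every quotient and remainder bit produces exactly the same code lengths, which is one way to read the "simulate rice coding" remark. A minimal sketch (the bit reader is a hypothetical helper):

    Code:
    #include <stdint.h>

    /* Hypothetical bit reader: returns the next bit of the compressed stream. */
    extern unsigned get_bit(void);

    /* Rice decoding with parameter k: unary quotient, then k remainder bits. */
    static uint32_t rice_decode(unsigned k)
    {
        uint32_t q = 0;
        while (get_bit() == 1)        /* unary part */
            q++;
        uint32_t r = 0;
        for (unsigned i = 0; i < k; i++)
            r = (r << 1) | get_bit(); /* remainder bits, MSB first */
        return (q << k) | r;
    }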

    As to compression optimization for offline encoding, the common methods are parsing optimization
    and parameter optimization.
    As an example of the latter, we can take static probability tables, then mix their predictions with
    an adaptive mixer. In this case, the mixer would have some parameters (update rate etc), which can be tuned
    for specific data.

  3. #3
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    While not technically the best solution, 16-bit values can sometimes be compressed well simply by splitting them into the top 8 bits and bottom 8 bits and then using existing standard algorithms on each. It's crude, but it may work well, so it's worth a punt.

    If the data has no predictability between values, then basic entropy encoding is sufficient. You've already demonstrated a distribution around a mean point with the "shared size". You can use this to transform your values and then apply the other techniques again, e.g. Rice or Golomb coding. If the data isn't randomly sampled from a distribution, but has some correlation with previous values, then start with the simplest test of delta encoding (subtracting the previous value) and work from there.
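
    A minimal sketch of that delta test (signed 16-bit words; wrap-around modulo 2^16 assumed):

    Code:
    #include <stdint.h>
    #include <stddef.h>

    /* Delta transform: replace each word by its difference from the previous one.
       Encoder side, done offline. */
    static void delta_encode(int16_t *v, size_t n)
    {
        int16_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            int16_t cur = v[i];
            v[i] = (int16_t)(cur - prev);   /* wraps mod 2^16 */
            prev = cur;
        }
    }

    /* Decoder side: only one addition per word. */
    static void delta_decode(int16_t *v, size_t n)
    {
        int16_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            prev = (int16_t)(prev + v[i]);
            v[i] = prev;
        }
    }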

    Finally, the easiest way of learning whether your data warrants a lot of analysis first, or whether it's sufficient to just shove it through a basic entropy encoder, is to pass it through a variety of the heavyweight tools - paq8, cmix, emma, etc. I'm not suggesting you actually use them in production, but knowing how well they work can help direct your research. If they only shrink the data 10% better than your current methods, then it's likely there is little correlation and/or modelling to be exploited.

  4. #4
    Member
    Join Date
    Jul 2017
    Location
    Switzerland
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thank you, I'm going to see what comes out of your suggestions. In the meantime, I had this idea: since the sequences are short (9 values), is there any "generative model" that can be used? In electronics, for example, LFSRs are popular for generating test data; is there any algorithm that can initialize an LFSR-like structure so that it provides a value at each step? From my limited-knowledge point of view, it should be somewhat similar to AC and ANS, no?
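
    For reference, a 16-bit Galois LFSR of the kind meant here; the taps are a common maximal-length choice picked purely for illustration:

    Code:
    #include <stdint.h>

    /* One step of a 16-bit Galois LFSR (taps 16,14,13,11, mask 0xB400).
       Each step needs only a shift and a conditional xor. */
    static uint16_t lfsr_step(uint16_t *state)
    {
        uint16_t lsb = *state & 1u;
        *state >>= 1;
        if (lsb)
            *state ^= 0xB400u;
        return *state;
    }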

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Well, there's https://en.wikipedia.org/wiki/Linear_predictive_coding .
    And in general, you can use some SAT solver to compute parameters for some PRNG and generate your data.
    But this kind of transformation would only improve compression if code length of original data is longer than
    code length of generator parameters (plus code length of residuals if generator is not precise).

    In any case, this approach doesn't work for random data. For example, if you try using some LFSR-based function
    to predict audio data, the parameters for that function would likely be much less compressible than the original data.
    So there's no universal solution - you just need to do a statistical analysis of your data and build a model based on that.

    Also, generative schemes are mostly used as a speed optimization.
    Computing optimal LPC coefs for a block, then processing the block with a static filter
    results in faster decoding than computing optimal LPC coefs after each coding step,
    but compression is better in the latter case.
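
    As a deliberately simple case of a static filter, a fixed second-order predictor with residual output might look like this; the coefficients are the usual fixed-predictor ones, not fitted to any data:

    Code:
    #include <stdint.h>
    #include <stddef.h>

    /* Residual of a fixed 2nd-order predictor p[i] = 2*s[i-1] - s[i-2].
       The 2*x is just a shift, so the decoder needs one shift, one add
       and one sub per sample. */
    static void predict_residual(const int16_t *s, int32_t *res, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            int32_t p = 0;
            if (i >= 2)      p = ((int32_t)s[i - 1] << 1) - (int32_t)s[i - 2];
            else if (i == 1) p = (int32_t)s[0];
            res[i] = (int32_t)s[i] - p;
        }
    }

    /* Decoder: rebuild the samples from the residuals. */
    static void unpredict(const int32_t *res, int16_t *s, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            int32_t p = 0;
            if (i >= 2)      p = ((int32_t)s[i - 1] << 1) - (int32_t)s[i - 2];
            else if (i == 1) p = (int32_t)s[0];
            s[i] = (int16_t)(res[i] + p);
        }
    }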

    And anyway, in the end, you'd just have to compress generator coefs instead of original data.

    It would help if you'd post some data samples.

  6. #6
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    You can try methods used for lightweight integer compression, possibly with zigzag, delta or XOR encoding:
    - Variable nibble, variable byte, or variable n-bits (see the sketch after this list)
    - Simple family like vsimple or "Simple-9" (similar to your shared size).
    It is also possible to compress the selectors (or sizes) using RLE, for example
    - TurboPFor scheme
    - Extra-bits like in Deflate
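
    A minimal sketch of variable-byte coding, one of the lightweight schemes above; a nibble-granularity variant follows the same pattern with 3 data bits plus a continuation bit per group:

    Code:
    #include <stdint.h>
    #include <stddef.h>

    /* Variable-byte coding: 7 data bits per byte, top bit set on all but the
       last byte.  Meant to run on zigzag/delta-transformed values. */
    static size_t vbyte_put(uint32_t v, uint8_t *out)
    {
        size_t n = 0;
        while (v >= 0x80u) {
            out[n++] = (uint8_t)(v | 0x80u);  /* low 7 bits + continuation flag */
            v >>= 7;
        }
        out[n++] = (uint8_t)v;
        return n;
    }

    /* Decoder: only shifts, ands and ors. */
    static size_t vbyte_get(const uint8_t *in, uint32_t *v)
    {
        uint32_t x = 0;
        unsigned shift = 0;
        size_t n = 0;
        uint8_t b;
        do {
            b = in[n++];
            x |= (uint32_t)(b & 0x7Fu) << shift;
            shift += 7;
        } while (b & 0x80u);
        *v = x;
        return n;
    }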

  7. #7
    Member
    Join Date
    Jul 2017
    Location
    Switzerland
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    [...]
    It would help if you'd post some data samples.
    OK, yes, I see the point.
    Here is a sample of the data (it is a txt file; each line is a different entry, to be read as a 16-bit signed integer). Thank you!

    https://drive.google.com/file/d/0B0p...ew?usp=sharing

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Sure looks similar to audio data.
    I guess you can optimize a static LPC filter (or several) - multiplications by a constant can be approximated well enough by shifts and additions (a sketch follows below).
    And then encode the best filter choice for each group of 9 samples, or some such.
    As to entropy coding, lossless audio formats frequently use rice coding or similar, but it could be interesting if you'd use that mul-free AC.

    Code:
    6,161,967 weights_cnn_layer_12.txt
    4,718,592 weights_cnn_layer_12.bin // 1.exe
    1,367,212 1.7z // 7z a -mx=9 -myx=9 1 weights_cnn_layer_12.bin
    1,213,105 weights_cnn_layer_12.ofr // ofr.exe --encode --raw --sampletype SINT16 --channelconfig MONO --seek min --preset 4 weights_cnn_layer_12.bin 
    1,210,365 weights_cnn_layer_12.bin.paq8px // paq8px_v79 -7
    1,205,940 weights_cnn_layer_12.bin.paq8pxd18 // paq8pxd_v18 -s7
    1,204,062 weights_cnn_layer_12.lea // emma audio-max
    In bits/word this would be... 1204062*8/(4718592/2) = 4.08?
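
    The shift-and-add approximation of a constant multiply mentioned above might look like this; the coefficient is an arbitrary illustration, not fitted to the data (arithmetic right shift assumed for negative samples):

    Code:
    #include <stdint.h>

    /* Multiply a sample by a fixed filter coefficient of about 0.594:
       19/32 = 1/2 + 1/16 + 1/32, so three shifts and two adds, no multiplier. */
    static int32_t mul_19_32(int32_t x)
    {
        return (x >> 1) + (x >> 4) + (x >> 5);
    }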
    Attached Files
    • File Type: exe 1d.exe (2.5 KB, 17 views)
    • File Type: exe 1.exe (3.0 KB, 22 views)
    • File Type: c 1d.c (362 Bytes, 22 views)
    • File Type: c 1.c (375 Bytes, 26 views)

  9. #9
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    It appears you only have 8-bit data present here. You could use an escape code to embed the very occasional 16-bit element, if such values are present but that infrequent.

    So, working on that assumption, the following Perl one-liners convert between the two forms.

    Code:
    perl -e '$/=undef;print pack("c*",split("\n",<>))' weights_cnn_layer_12.txt > a.bin
    perl -e '$/=undef;@a=unpack("c*", <>);$"="\n";print "@a\n"' < a.bin > a.txt
    In terms of basic entropy of the data:

    Code:
    len = 2359296 bytes, entropy8o0 = 1329733.119624, 4.508915 bits per byte
    len = 2359296 bytes, entropy8o1 = 1213659.440753, 4.115327 bits per byte
    So a quick entropy encoder is going to get you close to those figures. The rANS I have produces 1333493 bytes (order-0) and 1221892 bytes (order-1), but any decent entropy encoder will be in a similar ball park. Given the results of the (slow) codecs listed by Shelwien above, I suspect there is little predictability in the signal, so a fast, crude algorithm is likely sufficient.
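
    A minimal sketch of the order-0 measurement quoted above; this runs offline on the host, so floating point and log2() are fine here:

    Code:
    #include <math.h>
    #include <stdio.h>

    /* Order-0 entropy of a byte stream: -sum(count[i] * log2(count[i]/len)). */
    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;
        FILE *f = fopen(argv[1], "rb");
        if (!f) return 1;
        unsigned long long cnt[256] = {0}, len = 0;
        int c;
        while ((c = fgetc(f)) != EOF) { cnt[c]++; len++; }
        fclose(f);
        double bits = 0.0;
        for (int i = 0; i < 256; i++)
            if (cnt[i])
                bits -= (double)cnt[i] * log2((double)cnt[i] / (double)len);
        printf("len = %llu bytes, entropy8o0 = %f, %f bits per byte\n",
               len, bits / 8.0, bits / (double)len);
        return 0;
    }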

    Edit: I tried ppmd, and order-2 onwards gives poorer compression, so correlation between successive values is very limited. Fixed-frequency order-1 statistics look like the quick route, or just taking successive deltas of the integer values and applying a basic order-0 encoder or a trivial integer bit encoding method (Rice etc.). Delta does work and gives a theoretical entropy of 4.31 bits per byte - not as good as order-1, but close enough.

  10. The Following User Says Thank You to JamesB For This Useful Post:

    Shelwien (21st July 2017)

