
Thread: Floating point compression

  1. #1
    entropy (Member, India):

    What are the best algorithms for Floating point data compression?

  2. #2
    Matt Mahoney (Expert, Melbourne, Florida, USA):
    Use an adaptive filter to predict the next value and encode the difference with an indirect order 0 context model.
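
    A minimal sketch of that idea (an illustration, not Matt's actual code): a two-tap LMS filter adapts its prediction, and the residual, formed on the raw bit patterns so it is exactly invertible, is what the indirect order-0 context model and arithmetic coder would then compress (not shown here):
    Code:
    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    // Bit-cast so the residual can be inverted exactly on the decoder side.
    static int64_t to_bits(double x) { int64_t i; std::memcpy(&i, &x, sizeof i); return i; }

    int main() {
        std::vector<double> data = {1.00, 1.02, 1.05, 1.07, 1.10, 1.12}; // toy samples
        double w1 = 1.0, w2 = 0.0;   // start as "predict = previous sample"
        double x1 = 0.0, x2 = 0.0;   // last two samples seen
        const double mu = 0.01;      // LMS step size (would need tuning)
        for (double x : data) {
            double pred = w1 * x1 + w2 * x2;                // adaptive prediction
            int64_t residual = to_bits(x) - to_bits(pred);  // exactly invertible
            std::printf("residual = %lld\n", (long long)residual);
            double err = x - pred;   // adapt the filter toward the actual sample
            w1 += mu * err * x1;
            w2 += mu * err * x2;
            x2 = x1; x1 = x;
        }
        return 0;
    }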

  3. #3
    biject.bwts (Member, Texas):

    Quote Originally Posted by entropy:
    What are the best algorithms for Floating point data compression?
    Well, that would be a function of the statistics of the numbers themselves. However, suppose your
    floating-point number system allows any combination of 32 bits to be a valid floating-point number;
    then in theory any file made of any combination of 32-bit words is possible.
    Given that situation, and knowing that most files are treated as sequences of 8-bit words since that is the
    byte size, I would bijectively map the file down to byte granularity. This means most of the files of 32-bit
    words would be compressed, since you skip the files of odd length and half of the files of even length.

  4. #4
    Shelwien (Administrator, Kharkov, Ukraine):
    Google finds this:
    http://www.cs.unc.edu/~isenburg/lcpfpv/
    http://www.csl.cornell.edu/~burtscher/research/FPC/

    Also what Matt says, but there are tricky moments, depending on the nature of the specific FP data.
    For example, you might have many approximations of rational numbers (1/3 etc.) stored as floats.
    Or floats stored as doubles, in which case there'd be some unused bits in their mantissa and exponent fields.
    Or some values with a fixed _decimal_ format (e.g. prices, where you can't have precision below 1 cent).
    So I guess we can say that there's no universal float model, same as there's no universal data model.

    One obvious step is mixing multiple such models, but the LPC approach usually prevents that
    (different filter models would keep statistics for different values). Also, LPC is inherently redundant
    (because normally you won't bother to exclude delta values which cause float overflows and such),
    so, depending on the nature of the data, a direct little-endian aligned bitwise CM might actually be better
    than a filter.

    Still, compression-wise, mixing multiple such models, LPC or otherwise, would be best.
    For that we'd have to remap statistics gathered in the context of specific submodels to a single representation.
    (For example, if we feed inverted bits to a submodel, we'd then have to invert its output probability to be
    able to mix it with normal predictions.)

    Once I even made a model based on that kind of mixing and remapping: http://nishi.dreamhosters.com/u/FLTgeo3b1.rar
    The geo file from the Calgary Corpus has a somewhat unique format, but it's still a kind of float data.
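
    A toy sketch of the remapping-then-mixing step (an illustration; a real mixer adapts its weights): two submodels' bit probabilities are combined in the logistic domain, and the submodel that was fed inverted bits has its probability flipped back to the normal convention first.
    Code:
    #include <cmath>
    #include <cstdio>

    // PAQ-style logistic mixing helpers.
    static double stretch(double p) { return std::log(p / (1.0 - p)); }
    static double squash(double t)  { return 1.0 / (1.0 + std::exp(-t)); }

    int main() {
        double pA = 0.70;          // submodel A: P(next bit = 1)
        double pB_raw = 0.40;      // submodel B was fed inverted bits
        double pB = 1.0 - pB_raw;  // remap before mixing
        double wA = 0.5, wB = 0.5; // mixer weights (adaptive in a real model)
        double p = squash(wA * stretch(pA) + wB * stretch(pB));
        std::printf("mixed P(bit = 1) = %.4f\n", p); // goes to the arithmetic coder
        return 0;
    }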

  5. #5
    entropy (Member, India):
    How is Matt's approach different from the FPC paper? As far as I can tell, it is essentially doing the same thing.

    And the lcpfpv paper is focused only on geometric data; it may not work with interleaved data. I have to try it.

    Capturing fractional or decimal numbers will be very application-specific, but that information is lost if you just look at the FP data.

  6. #6
    Shelwien (Administrator, Kharkov, Ukraine):
    > How is Matt's approach different from the FPC paper?

    "indirect order 0 context model".

    Also "predict the next value and encode the difference" is not necessarily
    the same as LPC - for example, a binary context model can be used as
    a predictor.

    Matt probably meant LPC though - you can see it in paq's wav model.

    > Capturing fractional numbers or decimal numbers will be very
    > application specific but that information is lost if you just look
    > at the FP data.

    It's not really lost... actually, it would be good if we could lose it.
    Detection is simple: we just have to convert the given floats into that
    custom representation (1/i or i/100) and back, and see whether there's
    any difference from the original value.
    I guess some small difference (a few LSBs of the mantissa) can be allowed too,
    to compensate for different ascii-to-float implementations and such.

    In fact, it's easy to detect some of these even visually:
    for example 1/3 = 3EAAAAAB, 1/5 = 3E4CCCCD, etc.
    These are surprisingly important, e.g. for executable compression, because
    modern compilers replace divisions by constants with multiplications.

    But also 0.33 = 3EA8F5C3, which is pretty bad for direct binary compression.
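
    A quick sketch of that round-trip detection (an illustration with exact bit comparison; a real detector might allow the few-LSB slack mentioned above):
    Code:
    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    static uint32_t bits_of(float x) { uint32_t u; std::memcpy(&u, &x, sizeof u); return u; }

    // Does x survive a round trip through i/100 (fixed decimal, e.g. prices)?
    static bool looks_like_cents(float x) {
        float i = std::round(x * 100.0f);
        return bits_of(i / 100.0f) == bits_of(x);
    }

    // Does x survive a round trip through 1/i (reciprocal of an integer)?
    static bool looks_like_reciprocal(float x) {
        if (x == 0.0f) return false;
        float i = std::round(1.0f / x);
        return i != 0.0f && bits_of(1.0f / i) == bits_of(x);
    }

    int main() {
        std::printf("1/3  -> reciprocal? %d\n", looks_like_reciprocal(1.0f / 3.0f)); // 1
        std::printf("0.33 -> cents?      %d\n", looks_like_cents(0.33f));            // 1
        return 0;
    }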

  7. #7
    Matt Mahoney (Expert, Melbourne, Florida, USA):
    Yes, I was thinking of LPC. But it depends on the data obviously. LPC would be appropriate for sensor measurements. But measurements are sometimes quantized so you may want to check if the data can be reduced to a small set of integers first, then use LPC on that.

    Otherwise you can use floating point calculations for the LPC, but only if you are sure that the decompressor will use exactly equivalent floating point hardware. When encoding the prediction error, first map the predicted and actual values to integers (like *(int64_t*)&x) and then encode the difference, so there is no precision loss when you convert back.
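
    A tiny illustration of that mapping (using memcpy rather than the pointer cast, to stay clear of strict-aliasing issues):
    Code:
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    static int64_t to_bits(double x)   { int64_t i; std::memcpy(&i, &x, sizeof i); return i; }
    static double from_bits(int64_t i) { double x; std::memcpy(&x, &i, sizeof x); return x; }

    int main() {
        double actual    = 3.141592653589793;
        double predicted = 3.1415;  // whatever the filter produced
        // Encoder: difference of bit patterns, not of float values.
        int64_t residual = to_bits(actual) - to_bits(predicted);
        // Decoder: reconstructs the actual value exactly, bit for bit.
        double decoded = from_bits(to_bits(predicted) + residual);
        std::printf("exact round trip: %d\n", decoded == actual); // prints 1
        return 0;
    }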

    I did some analysis of GEO in http://mattmahoney.net/dc/dce.html#Section_2
    It uses an obsolete IBM floating point format with a base 16 exponent instead of base 2. The data is also quantized to 16 bits.

  8. #8
    Shelwien (Administrator, Kharkov, Ukraine):
    > use floating point calculations for the LPC but only if you are sure that
    > the decompresser will use exactly equivalent floating point hardware

    Unfortunately that doesn't work at all, because of compiler quirks.
    I used to write floating-point models before (e.g. Ash is an example) and found that an FP model simply has to be symmetric -
    i.e. the same program code used for modelling in both encoding and decoding - or decoding would fail sometimes.
    Also, such models are not compatible when compiled by different compilers (or even the same compiler with different options).

  9. #9
    Matt Mahoney (Expert, Melbourne, Florida, USA):
    Yeah, I guess so. You might get different results with x87 and SSE2 even on the same machine. They also have flags to control rounding, underflow, etc.

  10. #10
    Shelwien (Administrator, Kharkov, Ukraine):
    That's still OK; there are usually compiler options to control such things.
    But with floats, a-b==0 doesn't mean a==b and so on - common-sense axioms don't hold,
    yet compilers apply optimizations as if the precision were perfect.
    That's the main problem - I've seen cases where a separately compiled encoder and decoder
    matched when I added a debug printf, but decoding failed without it.
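
    One concrete example of such an axiom failing (an illustration, not from the thread): with IEEE-754 doubles, (a+b)-a is not guaranteed to equal b, because a+b has already rounded b away.
    Code:
    #include <cstdio>

    int main() {
        double a = 1e16, b = 1.0;
        double c = (a + b) - a;  // algebraically equals b
        // 1e16+1 lies exactly halfway between representable doubles, and
        // round-to-nearest-even snaps it back to 1e16, so c comes out 0.
        std::printf("c = %g (b = %g)\n", c, b); // prints c = 0 (b = 1)
        return 0;
    }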

  11. #11
    Member (Nordic):
    On the tangent about using floating point in models:

    streflop vs gafferongames

  12. #12
    Member (UK):
    Thanks for the info, guys.

  13. #13
    dnd (Member, Worldwide):

    Floating Point Compression

    Possible methods that can be used for floating-point compression:

    Last edited by dnd; 14th June 2018 at 04:33.
