
Thread: FLASHZIP new ARI+LZP compressor

  1. #1
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    FLASHZIP rev.0.01
    Copyright ® 2008 by Nania Francesco A
    INSTRUCTIONS :
    To compress: FlashZIP -c in out
    To decompress: FlashZIP -d in out
    ** [10.1.2008] V. 0.1: New fast compressor coming!
    link:
    http://www.winturtle.netsons.org/FLASHZIP.zip

  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Francesco!

  3. #3
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick test...

    Test machine: AMD Sempron 2400+ with Windows XP SP2

    FLASHZIP Copyright ® 2008 by Nania Francesco Antonio (Italy).
    Default
    Compress 100000000 to 34053194 bytes time 17.20 5676 KB/s

  4. #4
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Another quick test...

    A10.jpg > 842,029
    AcroRd32.exe > 1,900,776
    english.dic > 903,765
    FlashMX.pdf > 3,915,570
    FP.LOG > 932,331
    MSO97.DLL > 2,249,419
    ohs.doc > 998,572
    rafale.bmp > 1,312,917
    vcfiu.hlp > 1,005,850
    world95.txt > 837,486

    Total = 14,898,715 bytes

  5. #5
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Thanks LovePimple!
    Hi!
    Is it faster or not?

  6. #6
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    It's very quick! I will try more tests later.

  7. #7
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    OK Thanks!
    FLASHZIP vs THOR 0.96A
    ENWIK8
    FLASHZIP
    34053194 bytes time 3.88 25201 KB/s
    THOR e4
    35795184 bytes 4sec 859msec 19.63 MB/sec

  8. #8
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quick speed test...

    Timed with AcuTimer v1.2

    Test File: ENWIK8 (100,000,000 bytes)


    THOR v0.96a (E4)

    Elapsed Time: 00:00:14.304 (14.304 Seconds)

    Compressed Size: 34.1 MB (35,795,184 bytes)


    PKZIP v2.50 Win32 (-max)

    Elapsed Time: 00:00:16.035 (16.035 Seconds)

    Compressed Size: 34.7 MB (36,432,708 bytes)


    FLASHZIP v0.01

    Elapsed Time: 00:00:17.292 (17.292 Seconds)

    Compressed Size: 32.4 MB (34,053,198 bytes)


    THOR v0.96a (E5)

    Elapsed Time: 00:00:20.042 (20.042 Seconds)

    Compressed Size: 34.0 MB (35,696,032 bytes)


    QUAD v1.12

    Elapsed Time: 00:01:07.286 (67.286 Seconds)

    Compressed Size: 28.2 MB (29,649,604 bytes)

  9. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Some more stats, CeleronM 1.7
    http://shelwien.googlepages.com/fzip001.htm
    Btw, I've written some Perl scripts that automatically generate this.
    So I can upload them (maybe even include Perl binaries) if anyone would like to use something like that.

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    I also have an idea for a new fast compressor - an improved/lite version of LZPX: combine a better hash function (a multiplicative hash) with an FPAQ0P-like encoder, plus other stuff... However, there's no reason for me to do that - LZPM will outperform such things in both compression ratio and decompression time.
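
    For reference, a minimal sketch of the two ingredients mentioned above - a multiplicative context hash and an FPAQ0P-style shift-updated bit counter. The multiplier, table size and update shift are illustrative assumptions, not LZPX/LZPM code.
    Code:
    #include <cstdint>

    // Multiplicative hash of an order-2 context into a table index.
    // The multiplier and the 18-bit table size are arbitrary illustrative choices.
    static inline uint32_t ContextHash(uint32_t c1, uint32_t c2) {
        const uint32_t kMul = 2654435761u;               // Knuth-style multiplier
        return (((c1 << 8) | c2) * kMul) >> (32 - 18);
    }

    // FPAQ0P-style counter: 12-bit probability, shifted toward the observed bit.
    struct Counter {
        uint16_t p = 2048;                               // P(bit == 1) in [0, 4096)
        int P() const { return p; }
        void Update(int y) {
            if (y) p += (4096 - p) >> 5;                 // move 1/32 of the way toward 1
            else   p -= p >> 5;                          // move 1/32 of the way toward 0
        }
    };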

  11. #11
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    As I have said, in this compressor I don't use an arithmetic engine similar to Matt Mahoney's, but rather one closer to that of PPMd or QUAD, to be clearer. There is still a lot of implementation work to do, I hope!

  12. #12
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    What about an algorithm description?
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  13. #13
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    I use a modification of adaptive Huffman coding, order-1, on bytes! It is difficult to write pseudocode, and the arithmetic engine is not entirely stable at the moment!

  14. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > I also have an idea about new fast compressor. That is, an
    > improved/lite version of LZPX. If combine a better hash function
    > (multiplicative hash),

    Actually, the best hash for a context is its entropy code...
    multiplication-based too, obviously. Like, you find
    probabilities for a static model on a first pass, and
    then use that as a hash for a more complex adaptive model
    on a second pass. Of course it slows compression down, but
    decompression could be significantly faster that way.
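
    One possible reading of this two-pass idea, sketched below: the first pass collects static per-byte statistics, and the second pass hashes a context by the quantized code lengths of its bytes rather than by their raw values. This is only my interpretation; all names and constants here are assumptions.
    Code:
    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    // First pass: static order-0 code length (in eighth-bits) for every byte value.
    // count[] is assumed to have been filled by scanning the input once.
    void BuildCodeLengths(const uint64_t count[256], uint8_t len8[256]) {
        uint64_t total = 0;
        for (int i = 0; i < 256; i++) total += count[i] + 1;       // +1 avoids log(0)
        for (int i = 0; i < 256; i++) {
            double bits = -std::log2(double(count[i] + 1) / double(total));
            len8[i] = (uint8_t)std::min(255.0, bits * 8.0 + 0.5);  // quantized length
        }
    }

    // Second pass: hash an order-2 context by the code lengths of its bytes,
    // so the hash reflects how the static model codes the context.
    static inline uint32_t EntropyHash(const uint8_t len8[256],
                                       uint8_t c1, uint8_t c2) {
        uint32_t h = (uint32_t)len8[c1] * 263u + len8[c2];         // small mix
        return (h * 2654435761u) >> (32 - 16);                     // 16-bit table index
    }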

    > with FPAQ0P like encoder, plus other stuff...
    > However, for me no reason to do that - LZPM will outperform such things
    > in both compression ratio and decompression time.

    Really... Why doesn't anyone work on better modelling anymore?
    Even Matt seems more concerned about speed now...
    As if the theory were already perfect...
    Although in fact it's fairly easy to get a better result than paq8's
    on any data type with known structure. And I bet it's possible
    for text as well, though proper text modelling takes too much work.

  15. #15
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    I made a few attempts to improve modeling in my LZPM... Currently LZPM uses:
    Code:
    class TPredictor {
    private:
      unsigned short p1;   // fast-adapting probability (moves by 1/8 of the error)
      unsigned short p2;   // slow-adapting probability (moves by 1/64 of the error)

    public:
      TPredictor(): p1(1 << 15), p2(1 << 15) {}

      int P() {
        // average of both probabilities, scaled down to an 11-bit range
        return ((p1 + p2) >> 6);
      }

      void Update(int y) {
        if (y) {
          p1 += ((unsigned short)~p1 >> 3);   // ~p1 == 65535 - p1: step toward 1
          p2 += ((unsigned short)~p2 >> 6);
        }
        else {
          p1 -= (p1 >> 3);                    // step toward 0
          p2 -= (p2 >> 6);
        }
      }
    };
    It's slightly slower than the version with just one probability, mostly due to the doubled memory use.
    What if I add a version with one probability combined with APM/SSE? Looking at your mixer and the latest PAQ9, I see that it's possible to do fast weighting. However, I tried your mixer with my LZPM, and the compression is only slightly better compared to the dumb (a+b)/2. Or maybe I should add a table-driven update like pr = tab[pr][y]?
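
    For the APM/SSE question above, a minimal sketch of a simple (non-interpolated) SSE/APM stage: the input probability is quantized into one of 32 buckets per context, and the table entry, not the raw input, is what gets coded. The table sizes, the 12-bit scale and the update shift are illustrative assumptions.
    Code:
    #include <cstdint>

    struct SimpleAPM {
        uint16_t t[256][32];                       // 256 contexts x 32 quantized inputs
        SimpleAPM() {
            for (int c = 0; c < 256; c++)
                for (int i = 0; i < 32; i++)
                    t[c][i] = (uint16_t)(i * 128 + 64);   // start as an identity mapping
        }
        // pr: input probability in [0,4096); ctx: a small context (e.g. a match flag)
        int P(int pr, int ctx) const { return t[ctx][pr >> 7]; }
        void Update(int pr, int ctx, int y) {
            uint16_t& e = t[ctx][pr >> 7];
            if (y) e += (4096 - e) >> 6;           // pull the mapped value toward 1
            else   e -= e >> 6;                    // pull the mapped value toward 0
        }
    };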

  16. #16
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Have you tried mixing different models' predictions? Mixing under the same context won't be a big deal (you just distinguish between more and less stationary ones).
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk
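
    A minimal sketch of what is suggested above - mixing two bit predictions with an adaptive weight chosen by a small context. This is a crude selector-style mix, not PAQ's logistic mixing; the 12-bit probability scale and the learning rate are assumptions.
    Code:
    #include <cstdint>

    // Adaptive linear mix of two predictions p1, p2 (both in [0,4096)), with one
    // weight per context.  The weight drifts toward whichever model was closer
    // to the observed bit.
    struct Mixer2 {
        int w[256];
        Mixer2() { for (int i = 0; i < 256; i++) w[i] = 1 << 15; }   // start at 0.5

        int Mix(int p1, int p2, int ctx) const {
            return (p1 * w[ctx] + p2 * ((1 << 16) - w[ctx])) >> 16;
        }
        void Update(int p1, int p2, int ctx, int y) {
            int target = y ? 4095 : 0;
            int e1 = target - p1, e2 = target - p2;
            if      (e1 * e1 < e2 * e2) w[ctx] += ((1 << 16) - w[ctx]) >> 5;  // favor model 1
            else if (e1 * e1 > e2 * e2) w[ctx] -= w[ctx] >> 5;                // favor model 2
        }
    };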

  17. #17
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Like many, I think compression has by now reached the point where ZIP, RAR and 7-Zip have monopolized everything! Nevertheless, I believe the real problem is that no firm or large company wants to enter such a difficult market! The research we are doing is still very important, though, because in the future these algorithms could be used for a new video codec or for HDTV video streaming, for instance! Who knows, maybe even for audio compression!

  18. #18
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > I made a few attempts to improve modeling in my LZPM...

    Well, I was talking about "better than paq8" modelling,
    not the LZ kind. Though, I guess, good techniques can
    be applied anywhere.

    ...And the quality level is kinda the same everywhere.
    Like, there's no known perfect counter (most precise,
    not fastest), no good tools for correlation analysis,
    no perfect implementation of combining the predictions of
    several models, and no examples at all of good update
    functions for primary submodels whose predictions do
    not linearly affect the code length, etc etc.

    Code:
     
    >     if (y) { 
    >       p1 += ((unsigned short)~p1 >> 3); 
    >       p2 += ((unsigned short)~p2 >> 6); 
    >     } 
    >     else { 
    >       p1 -= (p1 >> 3); 
    >       p2 -= (p2 >> 6); 
    >     }

    > It's slightly slower than version with just one probability. Mostly due to
    > double memory use.
    > What if add version with one probability combined with APM/SSE?

    SSE is always good, at least for compression level.
    At least when it's proper SSE and not the kind used in paq.

    > Looking at your
    > mixer and latest PAQ9 I see that it's possible to do a fast weighing. However, I
    > tried your mixer with my LZPM, the compression is just a slightly better
    > compared to the dump (a+b)/2.

    That's probably because it's not that kind of statistics.
    Normally, this trick (dynamic mixing of two counters with
    different update speeds) is used when there are alternating
    areas with different levels of nonstationarity, but no better way
    to detect which kind of area we're currently in.

    And in your case it might be fairly stationary... then a
    static mix of two counters works better than either of them,
    probably because the optimal update speed is not a power of 2.

    So in your place I'd try to rewrite the update like this:

    Code:
     
    SCALElog = 16;
    SCALE  = 1<<SCALElog;
    mSCALE = SCALE-1;
    Speed  = 8;                            // try all values from 8 to 64
    wr1 = SCALE/Speed;                     // update weight; no longer forced to a power of 2
    p1  = p1*(SCALE-wr1) + y*mSCALE*wr1;   // weighted move toward 0 or mSCALE
    p1 >>= SCALElog;

    Ensure that initially it gives the same compression as yours
    (though slower of course).
    And then test it with all values of "Speed" from 8 to 64.

    Quite possibly, with some optimal value you'd get
    the same or a better compression level as with two counters,
    but with a single counter.
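
    Wrapped up as a counter class, the suggestion above might look like the sketch below; the constants come from the snippet above, while the class itself and the 12-bit output are my additions. Speed can then be swept from 8 to 64 as suggested.
    Code:
    #include <cstdint>

    struct TunableCounter {
        enum { SCALElog = 16, SCALE = 1 << SCALElog, mSCALE = SCALE - 1 };
        uint32_t p1;                               // probability in [0, SCALE)
        uint32_t wr1;                              // update weight = SCALE / Speed
        explicit TunableCounter(int Speed = 8) : p1(SCALE / 2), wr1(SCALE / Speed) {}

        int P() const { return p1 >> (SCALElog - 12); }   // 12-bit probability output
        void Update(int y) {
            // move p1 by wr1/SCALE of the distance toward 0 or mSCALE
            p1 = (p1 * (SCALE - wr1) + (y ? (uint32_t)mSCALE * wr1 : 0)) >> SCALElog;
        }
    };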

    Though I didn't really think about speed optimization after this

    > Or maybe I should add a table driven update like
    > pr = tab[pr][y]?

    Maybe, if you'd find that 8 bits precision per counter is enough...
    which it isn't.

  19. #19
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Quote Originally Posted by Shelwien
    > Or maybe I should add a table driven update like
    > pr = tab[pr][y]?

    Maybe, if you'd find that 8 bits precision per counter is enough...
    which it isn't.
    in TarsaLZP I'm using something like this:

    pr += tab[pr >> 8][y];

    pr is a 16-bit value, but I'm using only the high 8 bits of it (to speed things up; with an 8-bit probability I will have at most one renormalization). and I'm using tables optimized for LZP + SSE, which gives me a 0.5% improvement on redundant files like fp.log over the normal update.

    my SSE tables are very small (8 * 256 entries, but most of them are rarely used), so after a few kilobytes of input data they are very warm, so I can use a fixed speed of adaptation without much of a compression hit (maybe one kilobyte per file).

    and as an input to SSE I'm using 8-bit states (as in lpaq) and I think that 8 bits is enough for that. with bigger variables the values would be much more unstable and adaptation would be slower.

    big precision only helps with very redundant files, with compression ratios over 100:1.
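
    The update described above might look roughly like the sketch below. The table here implements a plain "move by 1/40th of the error" rule rather than TarsaLZP's tuned values, and the sizes simply follow the numbers given in the post.
    Code:
    #include <cstdint>

    // 16-bit probability updated through a table indexed by its high 8 bits.
    // tab[hi][1] is the step after a match (bit 1), tab[hi][0] after a miss (bit 0);
    // steps are computed from the bucket extremes so pr can never wrap around.
    struct LzpCounter {
        static int16_t tab[256][2];
        static void InitTables() {
            for (int hi = 0; hi < 256; hi++) {
                tab[hi][1] = (int16_t)((65535 - ((hi << 8) | 255)) / 40);
                tab[hi][0] = (int16_t)(-((hi << 8) / 40));
            }
        }
        uint16_t pr = 1 << 15;
        int  P8() const    { return pr >> 8; }         // only the high 8 bits get coded
        void Update(int y) { pr = (uint16_t)(pr + tab[pr >> 8][y]); }
    };
    int16_t LzpCounter::tab[256][2];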

  20. #20
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    It can also be improved, because I have tested it with my filters and it is excellent! I think that if you intend to make TarsaLZP open source, you should make the effort to translate it entirely into C++; compression could then be improved together by 20-40% with the addition of better WAV/BMP/EXE/TIFF coding. If you want, Donkey7, I can lend you a hand porting it to C++!

  21. #21
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    i've described the LZP + SSE stuff here: http://phpfi.com/272692 for you some time ago. and it's written in C++.

    i haven't described my fixed order-2 literal coder because it's dumb and inefficient. it's just an array of 16777216 probabilities (256 symbols * 65536 contexts). almost surely you have a better thing in your TURTLE.

    TarsaLZP is a combination of LZP + SSE (for matched bytes) and a fixed order-2 PPM (for unmatched bytes).

    i don't have time now (end of semester). during the holidays (winter or summer) i will have more spare time to work on it.
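
    A literal coder of the shape described above (256 entries under each 16-bit context) could look roughly like the sketch below if done bitwise. The bit-tree layout, the update rate and the coder callback are my assumptions, since the post doesn't say how the probabilities are organized.
    Code:
    #include <cstdint>
    #include <vector>

    // Fixed order-2 literal model: for each 16-bit context (previous two bytes),
    // a binary tree of node probabilities covers one 8-bit symbol.
    struct Order2Literals {
        std::vector<uint16_t> p;                       // 65536 contexts x 256 nodes
        Order2Literals() : p(65536u * 256u, 1u << 15) {}

        // Model one byte bit by bit on the encoding side; 'code(prob, bit)' stands
        // in for the real arithmetic-coder primitive.
        template <class Coder>
        void EncodeByte(Coder& code, uint16_t ctx, uint8_t sym) {
            uint16_t* t = &p[(uint32_t)ctx << 8];
            uint32_t node = 1;                         // bit-tree index, 1..255
            for (int i = 7; i >= 0; i--) {
                int bit = (sym >> i) & 1;
                code(t[node], bit);                    // P(bit == 1) = t[node] / 65536
                if (bit) t[node] += (65535 - t[node]) >> 5;
                else     t[node] -= t[node] >> 5;
                node = (node << 1) | bit;
            }
        }
    };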

  22. #22
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > pr += tab[pr >> 8][y];
    > pr is a 16 bit value, but i'm using only high 8 bits of it (to speed up things,
    > with 8 bit probabilty i will have at most one renormalization).

    Why, just make a 16-bit renormalization in the rangecoder.

    > and i'm using tables optimized for lzp+ sse,
    > which gives me 0.5 % improvement on reduntant
    > files like fp.log over normal update.

    I suppose it's similar to how contextual compressors with lossy hash
    mapping of several contexts into the same counter seemingly compress
    better than the same model with full statistics.
    Actually, better precision is _always_ better,
    simply because you can always simulate worse precision with it.

    > my sse tables are very small (8 * 256 entries, but most of them are rarely
    > used),

    Is that interpolated SSE, at least?

    > so after few kilobytes of input data they are very warm so i can use
    > fixed speed of adaptation without much compression hit
    > (maybe one kilobyte per file).

    Well, dunno what you are comparing it with.
    Same as with precision, it's always possible to get some improvement
    with more statistics and more submodels. And comparing LZP even with
    PPMd, there's a lot of room for improvement.

    > and as an input to sse i'm using 8- bit states (as in lpaq) and i think that 8-
    > bit is enough for that. with bigger variables values would be much more
    > unstable and adaptation will be slower.

    Guess you're doing something wrong... It's like how PPMd can't use
    higher-order contexts. But that doesn't mean they're unusable,
    just that the model is faulty.

    > big precision only helps with very reduntant files,
    > with compression ratio over 100 : 1.

    Well, I can't agree here.
    Actually, almost any compressible file can be considered redundant
    like that - it's just that there are contexts with a good compression
    ratio and ones with a bad one.

    Let's suppose that the data source is an algorithm which uses some
    amount of true random information and some fixed rules to change its
    state and generate the data.
    Then, an ideal compressor for that source is supposed to recover
    exactly the used amount of random information from the data and
    store it (and use the same generator for decompression).
    So, even with files that are not visibly redundant (like pseudorandom
    streams from Ilya's PRNG) you'd still need all the precision you can afford.

    Ok, with most real files we don't actually know exactly how
    they're generated. But still, it's always possible to find
    some almost-deterministic contexts in compressible data.
    And you also need good precision to process random data, or you'd get
    high redundancy.

    So, afaik, it's best to have random and deterministic statistics
    separated as much as possible. Of course, it's faster to have a
    huffman-like state/symbol hierarchy, but to implement a correct
    compressor that way you need really good knowledge of the data
    properties - and even more precision, as you'd have to manage
    these compound states, where e.g. a proper update is much more
    complex than usual.

  23. #23
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Quote Originally Posted by Shelwien
    Why, make a 16bit renormalization in rangecoder
    hmm, then i will be left with a 16-bit precision rangecoder (i don't want to use 64-bit variables, they're slow) and this will have a big impact on literal coding (which is not bijective, there are 255 symbols in the statistics) - either compression (because of less precision) or speed (i will be forced to use more complex routines to compensate for rounding errors).

    Quote Originally Posted by Shelwien
    I suppose its similar to how contextual compressors with lossy hash
    mapping of several contexts into same counter seemingly compress
    better than same model with full statistics.
    Actually, better precision is _always_ better.
    Simply because you can always simulate worse precision with it.
    that's certainly not true. i've tested it and it works. it's used only in the SSE part. SSE is not hashed - it uses 3 bits of match history + 8 bits of context state for a total of 11 bits (== 2048 entries).

    by using tables i emulated fpaq0p behaviour, but i've tuned it somewhat. i've explained the difference in the article i've linked above:
    // creating probability transition tables. they are used when updating probabilities and usually act like updating by 2.5 % (1/40th) of the error.
    // but it updates less aggressively when statistics are skewed, and this helps on skewed statistics (eg. on files like fp.log it is 0.5 % better)

    in other words - when the probability is like 0.01 or 0.99, then we update less aggressively.

    hash mapping is always lossy (because a hash is lossy by definition). but the error in the lzp model isn't very important - with collision detection i gain very little, at a big speed loss: http://www.encode.ru/forums/index.ph...page=1#msg5204

    Quote Originally Posted by Shelwien
    is that interpolated SSE at least?
    no. i know that with interpolation i would benefit from bigger tables, but i wanted speed.

    Quote Originally Posted by Shelwien
    Guess you're doing something wrong... It's like how PPMd can't use
    higher-order contexts. But that doesn't mean they're unusable,
    just that the model is faulty.
    as i wrote above, i don't use interpolation or propagation. just simple point-to-point updates (ie. when processing one byte with one lzp model i check and change only one entry from lzp and one entry from sse; the lzp models are independent). with some mixing, error propagation, weighting, etc. it would compress a little better but much slower.

    using big precision sometimes means dealing with noise. with small precision you filter the noise automatically.

    see your post here: http://www.encode.ru/forums/index.ph...ic=586#msg7095
    To be specific, I have results like this:
    370648 370607 // 15 bits
    370652 370607 // 12 bits
    370646 370628 // 11 bits
    371075 370708 // 8 bits
    And its not about simple change of shift value - complete
    model reoptimization (contexts and other parameters) was
    performed for objective comparison (numbers in first column
    are results before optimization).
    So I think that you need 12bit precision, but anything more
    is masked by the "model noise".

    TarsaLZP, FlashZIP and PPMdJ are all about efficiency, not super-high compression ratios.

    my main goal is efficient compression of textual data (human-readable and understandable, like plain text, html, source code, charts, etc., not base64-encoded data, dna sequences or random characters). lzp is the best solution for that, although it's terrible on binary data.
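
    For readers following along: a bare-bones LZP predictor keeps one predicted byte per hashed high-order context and codes only a match/miss flag (plus a literal on a miss). The sketch below is generic, not TarsaLZP's model; the hash, context order and table size are arbitrary choices.
    Code:
    #include <cstdint>
    #include <vector>

    struct LzpModel {
        enum { BITS = 20 };                              // 1 MB prediction table
        std::vector<uint8_t> table;
        LzpModel() : table(1u << BITS, 0) {}

        static uint32_t Hash(uint32_t ctx) {             // ctx = e.g. the last 3 bytes
            return (ctx * 2654435761u) >> (32 - BITS);
        }
        uint8_t Predict(uint32_t ctx) const           { return table[Hash(ctx)]; }
        void    Update (uint32_t ctx, uint8_t actual) { table[Hash(ctx)] = actual; }
    };

    // usage per position: flag = (model.Predict(ctx) == next);
    // code the flag (e.g. through the SSE stage discussed above);
    // if !flag, code 'next' as a literal; then model.Update(ctx, next);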

  24. #24
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    @Donkey7

    Excuse me, but from those source snippets the arithmetic engine is not really clear at all, and there are also some variables for which it is not clear where they are initialized. For example:

    const word floor = 40;
    const word length = 500;
    byte a, b;
    word pmiss[256], pmatch[256], prob;
    // for Donkey7
    A=?????
    B=????
    for (a = 0; a < 256; a++)
    {
        if (a == 0)
            b = 0;
        else if (a < 128)
        {
            b = length / a;
            if (b < floor)
                b = floor;
        }
        else if (a == 128)
            b = floor;
        else
        {
            b = length / (256 - a);
            if (b < floor)
                b = floor;
        }
        if (a != 255)
            pmatch[i] = (256 - a) * 256 / b; // for Donkey7: i = ????
        else
            pmatch[i] = 0;                   // for Donkey7: i = ????
        if (a != 1)
            pmiss[i] = a * 256 / b;          // for Donkey7: i = ????
        else
            pmiss[i] = 0;                    // for Donkey7: i = ????
    }
    // after a match we'll do:
    prob += pmatch[prob >> 8];
    // after a miss we'll do:
    prob -= pmiss[prob >> 8];

    Is the engine equal to Fpaq0 (write bit)?

  25. #25
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Nania Francesco Antonio:
    sorry, it should be pmiss[a], not pmiss[i] (and pmatch[a], not pmatch[i]).

    yes, it's almost equal to fpaq0p. it's only slightly tuned.

    so: a = the index (the element being computed), b = the speed of adaptation (different for different probabilities).

    you can skip that code. it's rather irrelevant. it's only about tuning the fpaq0p coder.

    edit/update:
    b is the inverse of the speed, eg. if b = 40 then the probability is updated by 1/40th of the error.
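
    In other words, with a = prob >> 8 the tables are just an error-proportional update done by lookup, with a per-bucket speed b; a direct (untabled) equivalent, roughly, would be:
    Code:
    // Direct form of the tabled update above (prob is 16-bit, b is the inverse speed):
    // pmatch[a] ~ (65536 - prob) / b  and  pmiss[a] ~ prob / b, where a = prob >> 8.
    void Update(unsigned short& prob, int matched, unsigned b) {
        if (matched) prob += (unsigned short)((65535u - prob) / b);  // toward "match"
        else         prob -= (unsigned short)(prob / b);             // toward "miss"
    }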

  26. #26
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Thanks Donkey7!

  27. #27
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    >> Why, make a 16bit renormalization in rangecoder
    >
    > hmm, then i will be left with 16- bit precision rangecoder (i don't want to use
    > 64 bit variables, they're slow) and this will have big impact on literal coding
    > (which is not bijective, there's 255 symbols in statisitcs) - etiher compression
    > (because of less precision) or speed (i will be forced to use more complex
    > routines to compensate rounding errors).

    Well, here's a working example in my demo coder:
    http://shelwien.googlepages.com/order_test4.rar

    Btw, although it uses qword type casts there, that's actually a speed
    optimization, as it seems that IC 10 is finally able to use the common
    32x32->64 x86 MUL instruction - before o2_rc1 everything was done in 32 bits.

    >> I suppose its similar to how contextual compressors with lossy hash
    >> mapping of several contexts into same counter seemingly compress
    >> better than same model with full statistics.
    >> Actually, better precision is _always_ better.
    >> Simply because you can always simulate worse precision with it.
    >
    > that's certainly not true. i've tested it and it works. it's used only in sse
    > part. sse is not hashed - it uses 3 bit of match history + 8 bit of context
    > state for total of 11 bits (== 2048 entries).

    you apparently didn't understand me, but whatever.

    > in other words - when probabilty is like 0.01 or 0.99 then we update less
    > aggresively.

    Wonder how you find optimal parameters for these updates.
    Do you have any specific tools or maths for that task?

    > hash mapping is always lossy (because hash is lossy by definiton). but the error
    > in lzp model isn't very important - with collision detection i gain very little
    > with big speed loss

    I didn't argue with that. I just said that as an example that you can't
    simply replace one part of a tuned model with a better (more precise
    etc.) version and hope that the whole model will work better.
    So it's kinda troublesome to prove my point.

    > > is that interpolated SSE at least?
    >
    > no. i know with interpolation i will benefit from bigger tables but i wanted speed.

    No, it's the reverse. Properly implemented SSE with interpolation allows
    you to greatly reduce the number of quants. Eg. in my coders a single map
    consists of 4 to 6 values and that's enough.
    Of course it takes more computations, but the model might get even
    faster overall due to fewer RAM accesses etc.
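
    A minimal sketch of SSE with interpolation as described: a handful of quantization points per map, with the input probability split between the two nearest points, both of which are read and updated. The 6-point layout, the 12-bit scale and the update rate are illustrative; a real model would keep one such map per context.
    Code:
    #include <cstdint>

    struct InterpSSE {
        enum { NQ = 6, SCALE = 4096, STEP = SCALE / (NQ - 1) };   // 5 segments
        uint16_t v[NQ];
        int idx, frac;                                 // remembered for Update()
        InterpSSE() { for (int i = 0; i < NQ; i++) v[i] = (uint16_t)(i * STEP); }

        int P(int p) {                                 // p in [0,4096)
            idx  = p / STEP;                           // left vertex of the segment
            frac = p - idx * STEP;                     // position inside the segment
            if (idx >= NQ - 1) { idx = NQ - 2; frac = STEP; }
            return (v[idx] * (STEP - frac) + v[idx + 1] * frac) / STEP;
        }
        void Update(int y) {
            int target = y ? 4095 : 0;
            // both vertices move toward the bit; the nearer one moves more
            v[idx]     = (uint16_t)(v[idx]     + ((target - v[idx])     * (STEP - frac) / STEP) / 32);
            v[idx + 1] = (uint16_t)(v[idx + 1] + ((target - v[idx + 1]) * frac          / STEP) / 32);
        }
    };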

    > as i written above i don't use interpolation or propagation. just simple point-
    > to- point updates (ie. when processing one byte with one lzp model i check and
    > change only one entry from lzp and one entry from sse, lzp models are
    > indepedent). with some mixing, propagating errors, weighting, etc. it will
    > compress a little better but much slower.

    So, how do you determine where the maximum compression level possible
    for LZ is, and why do you think you've reached it?
    There's not that much difference between CM and LZ afaik.
    If you start speed-optimizing CM it'll start looking like LZ, and
    vice versa.

    > using big precision sometimes means dealing with noise. with small precision you
    > filter the noise automatically.

    Well, I wanted to say that such things should be done in a controllable
    fashion, it gives more parameters which you can tune.

    > To be specific, I have results like this:
    > 370648 370607 // 15 bits
    > 370652 370607 // 12 bits
    > 370646 370628 // 11 bits
    > 371075 370708 // 8 bits
    > And its not about simple change of shift value - complete
    > model reoptimization (contexts and other parameters) was
    > performed for objective comparison (numbers in first column
    > are results before optimization).
    > So I think that you need 12bit precision, but anything more
    > is masked by the "model noise".

    Right, but that's kinda out of context here.
    That was not a normal counter, more like a bit-sequence hash
    or something. For normal counters I prefer better precision.
    And also, these "abnormal counters" were used not directly, but
    as an index into interpolated SSE with 16-bit coefficients.
    That is, I tried to shrink down my counters under the best possible
    circumstances.

    But in most "fast compressors" these things are done in reverse order.
    That is, first you implement 1-byte counters with a fast and rough update
    function, then try some magic to get better predictions.

    Well, not that I care anyway. If it wasn't like that, there'd be nothing
    to do in this field already.

    > my main goal is efficient compression of textual data (human readable and
    > understandeable like plain text, html, source codes, charts, etc. not base64
    > encoded data, dna sequences or random characters). lzp is a best solution to th
    > at, although it's terrible on binary data.

    Afair, your result on book1 was like 248k or something.
    Dunno how that's efficient.
    LZ is the best solution for data with long matches.
    That's all.
    And texts are not like that.
    Even more so if you'd like to use some preprocessing.

  28. #28
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    FLASHZIP V. 0.02 Copyright ® 2008 by Nania Francesco A
    - Faster!
    - Bugs removed!
    - Fully compatible with the old version!

    Download Link:
    http://www.winturtle.netsons.org/FLASHZIP.zip

  29. #29
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Shelwien
    TarsaLZP is not finished yet. it lacks proper literal coding. on files where the lzp model doesn't match many symbols my program performs very badly (worse even than zip). so, on some files it can be worse than ppmdj -o2 and on some it can be better than ppmdj -o16, and that's normal. i don't use normal lz. my lzp model is more like a high-order ppm (with only one symbol per context).

    when/if i release new versions of my program (with proper literal coding) i will consult you to exchange some ideas, but for now discussing the performance of my program is pointless.

  30. #30
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Nania Francesco Antonio
    FLASHZIP V. 0.02 Copyright ® 2008 by Nania Francesco A
    - Faster!
    - Bugs removed!
    - Fully compatible with the old version!

    Download Link:
    http://www.winturtle.netsons.org/FLASHZIP.zip
    Thanks Francesco!


