
Thread: Discussion initiated by an amateur

  1. #1
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts

    Discussion initiated by an amateur

    Hello everyone.

    This time I want to understand the substantial differences among the three most common LZ algorithms: LZ77, ROLZ and LZP. In my opinion they don't differ that much. Although ROLZ reduces the offset, the encoder has to output two extra literal characters (with order-2 hashing). If we output the non-reduced offset instead of those two characters, it essentially becomes LZ77; the offset space also expands 65536 times, and the decoder gets faster since it doesn't need to maintain the hash list. Still, ROLZ is probably easier to implement than LZ77: the match finder is just a loop, without any further strategy (I've only read the BALZ source).



    LZP is simpler than ROLZ, though its inventor (C. Bloom?) framed the idea as "prediction" and treated it as a new algorithm. In fact it is just a greatly simplified match finder whose hash list has a length of ONE. LZP3 uses combined-order prediction to avoid forgetting matches too quickly, but that slows it down. In my csc2 compressor, if I use only order-2 LZP it speeds up to 14 MB/s (7 s on enwik8), but the ratio becomes 38% instead of 34%. Decompression is basically symmetric to compression, so it is much slower than LZ77 decompression.
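    To make the "hash list of length ONE" point concrete, here is a minimal LZP-style sketch (the hash function, table size and toy output are illustrative, not csc2's internals):

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    #define TAB_BITS 16
    #define TAB_SIZE (1u << TAB_BITS)

    /* order-2 context hash of the two bytes before position p */
    static uint32_t lzp_hash(const uint8_t *buf, size_t p) {
        return ((uint32_t)buf[p - 1] * 313 + buf[p - 2]) & (TAB_SIZE - 1);
    }

    static void lzp_parse(const uint8_t *buf, size_t n) {
        static uint32_t table[TAB_SIZE];   /* ONE predicted position per context */
        size_t p = 2;                      /* need two bytes of context */
        while (p < n) {
            uint32_t h = lzp_hash(buf, p);
            uint32_t pred = table[h];      /* the single candidate position */
            table[h] = (uint32_t)p;        /* remember the most recent position */
            size_t len = 0;
            if (pred >= 2)                 /* 0 means "never seen" here */
                while (p + len < n && buf[pred + len] == buf[p + len]) len++;
            if (len > 0) {                 /* real codecs use a flag bit + min length */
                printf("match   len=%zu\n", len);
                p += len;
            } else {
                printf("literal '%c'\n", buf[p]);
                p += 1;
            }
        }
    }

    int main(void) {
        const char *s = "abracadabra abracadabra abracadabra";
        lzp_parse((const uint8_t *)s, strlen(s));
        return 0;
    }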



    This is only my understanding, so if there are problems, please correct me! As I said, I'm planning a new practical compressor (aiming at 30 s and 28% on enwik8), and I'm considering which algorithm to choose.



    best regards!

  2. #2
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts


    Say something, please! There are several experienced compressor developers and experts online. I want to hear your ideas and learn something from you!

  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > Say sth. plz~! Several experienced compressor developers
    > or experts online. I want to hear your ideas and learn
    > sth. from you!

    1. I'm just not really interested (or experienced, for that matter) in this,
    and I don't like to encourage LZ development either. LZ compressors are only
    applicable to 1-D data compression, while a lot of CM code is reusable for
    any kind of task involving modelling - from stock price
    prediction to OCR, and that's aside from compression of structured
    data.

    Also "encode" is busy with job hunting or something, and
    toffer and osmanturan are not into LZ either.
    Which doesn't leave you much choice, as all others barely
    talk about anything technical at all.

    There's one kind of LZ though, which I'd really like to see
    built and working - a CM based on entropy estimation of LZ matches,
    preferably with some popular LZ, like deflate or LZMA.
    But that's only out of curiosity about the exact amount of
    LZ's multiple-encoding redundancy.

    2. I don't think that LZ is any simpler than CM actually.
    To compete with existing implementations, you'd need to
    implement at least optimal parsing, a really optimized rc,
    and some multithreading features.
    While with CM it's probably still possible to get impressive
    results without much programming.

    3. There are no clear boundaries between implementations -
    there are things like CM with an LZP submodel and the like.
    So it's very hard to choose what's "most promising" just
    from the basic algorithm idea - it all depends on the amount of
    work invested.
    Though I'd certainly start with a generic match finder,
    able to find any matches in the data, and choose the
    specific algorithm details based on analysis of its
    performance.
    So basically, first a good (maybe slow) LZ77 compressor
    has to be built, to provide entropy estimations for
    such analysis.
    And then you'd see how much (and what kinds of) possible matches
    you can discard without losing much in compression.
    Thus imho LZP/ROLZ are just methods of match set filtering
    and I think it'd be better to find out what's optimal by
    yourself (probably something else).

    4. Btw, a data structure usable for a good match finder
    would usually also allow building a PPM on top of it.
    I wonder if you've had a look at
    http://ctxmodel.net/files/PPMd/ppmd_Jr1_sh8.rar

  4. #4
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts


    Thank you, Shelwien !

    I will keep attention and try to understand the relationship between LZ and CM

    In fact I like LZ more than CM

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    As I mentioned, with LZ (LZ77, LZ78, ROLZ, but not LZP, if properly written)
    you produce redundant output by design.
    That's because there are usually many literal/match sequences that decode
    into the same data.

    Also, even considering speed, the only thing CM can't compete with is the
    simplest LZ77 implementations, especially Huffman-based ones.
    Even against LZMA decoding it's probably still possible to win in
    decoding speed with CM, if we use a really optimized coding scheme
    (pipelined and multiplication-free) - and of course with much better
    compression.

    So I think that pure LZ only has real applications in very redundant cases -
    in the form of something like http://haskell.org/bz/rep.zip

  6. #6
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    ...Also "encode" is busy with job hunting or something, and
    toffer and osmanturan are not into LZ either.
    Which doesn't leave you much choice, as all others barely
    talk about anything technical at all.
    I've just come back from my family's home. Interestingly, in the meantime I was playing with an LZ codec for BIT, for the LZ maniacs (such as my housemate, who prefers speed over compression ratio). Let me share my "revisited" experience since BIT 0.1 (which was a ROLZ compressor):

    1- LZP is only good for simplicity. You are tightly bounded on compression ratio, since you damage "good" correlations almost randomly. So, IMO, LZP should be used alone as a simple codec, or as a preprocessor together with a really slow compressor (though that's not a good idea, and PAQ9 proves it).

    2- ROLZ is good for eliminating some LZ77 trouble spots, such as effective "distance" coding. If you use only order-2 n-grams as the ROLZ match context, you get a pretty fast and easy-to-implement LZ-class compressor (QUAD, BIT 0.1 and BALZ all use order-2). But if you aim for higher compression, you need an order-1 match context (RZM, WinRK ROLZ2/ROLZ3), and this invites some problems, because you need to increase the ROLZ search depth per context (to avoid discarding far offsets too fast). That in turn means two more problems:
    a) You need to find a good match among lots of candidates (= slower match finder).
    b) You need some tricks to encode larger "offset" ranges effectively.
    As a reference, in order-2 usually 64 or 128 offsets per context are sufficient, but in order-1 you need more than 1024 offsets per context (a small sketch of the order-2 bookkeeping follows after this list).

    3- Pure LZ77 is a bit harder to implement, because you need a proper match finder, an effective distance coding scheme, etc. But its decompression is a lot faster.
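    For illustration, a sketch of the order-2 offset-table bookkeeping mentioned in point 2 (sizes and names are my own, not taken from QUAD/BALZ/BIT):

    Code:
    /* Each two-byte context keeps a short circular list of recent positions,
       and the encoder sends a small slot index into that list (plus a length)
       instead of a full LZ77 offset. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define ROLZ_CTX   65536              /* order-2 context = previous two bytes */
    #define ROLZ_SLOTS 64                 /* offsets kept per context (see text)  */

    typedef struct {
        uint32_t pos[ROLZ_CTX][ROLZ_SLOTS];  /* recent positions, per context     */
        uint8_t  head[ROLZ_CTX];             /* next slot to overwrite            */
    } RolzTable;

    static uint32_t ctx2(const uint8_t *buf, uint32_t p) {
        return (uint32_t)buf[p - 2] << 8 | buf[p - 1];
    }

    static void rolz_insert(RolzTable *t, const uint8_t *buf, uint32_t p) {
        uint32_t c = ctx2(buf, p);
        t->pos[c][t->head[c]] = p;
        t->head[c] = (uint8_t)((t->head[c] + 1) % ROLZ_SLOTS);
    }

    /* Returns the slot index the encoder would transmit (or -1) and the length. */
    static int rolz_find(const RolzTable *t, const uint8_t *buf, uint32_t p,
                         uint32_t n, uint32_t *best_len) {
        uint32_t c = ctx2(buf, p);
        int best = -1;
        *best_len = 0;
        for (int s = 0; s < ROLZ_SLOTS; s++) {
            uint32_t cand = t->pos[c][s];
            if (cand == 0 || cand >= p) continue;       /* 0 means "empty" here */
            uint32_t len = 0;
            while (p + len < n && buf[cand + len] == buf[p + len]) len++;
            if (len > *best_len) { *best_len = len; best = s; }
        }
        return best;
    }

    int main(void) {
        const char *s = "the quick brown fox jumps over the quick brown dog";
        const uint8_t *buf = (const uint8_t *)s;
        uint32_t n = (uint32_t)strlen(s), len;
        RolzTable *t = calloc(1, sizeof *t);            /* ~16 MiB, so heap-allocate */
        if (!t) return 1;
        for (uint32_t p = 2; p < n; ) {
            int slot = rolz_find(t, buf, p, n, &len);
            rolz_insert(t, buf, p);
            if (slot >= 0 && len >= 3) {                /* toy min_match of 3 */
                printf("match: slot=%d len=%u\n", slot, len);
                p += len;   /* a real coder would also insert the covered positions */
            } else {
                p += 1;
            }
        }
        free(t);
        return 0;
    }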

    If you aim for higher compression with either ROLZ or LZ77, you need to implement some "parsing" strategy. You don't need a "superb" parsing method, but anything is better than greedy parsing (taking the longest match at each position). Also note that with adaptive coding there is no truly "optimal parsing" in most compressors; they are all "near-optimal". If you need truly optimal parsing, you can try brute-force parsing, which is very slow. That's why KZIP outperforms 7-Zip on ZIP files.
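    And, for comparison with greedy parsing, a minimal one-step lazy parser (the brute-force match finder and the constants are only for illustration):

    Code:
    /* Before committing to the match at p, check whether p+1 yields a longer
       one and, if so, emit a literal first. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define WINDOW    4096
    #define MIN_MATCH 3

    typedef struct { uint32_t len, dist; } Match;

    static Match find_match(const uint8_t *buf, uint32_t n, uint32_t p) {
        Match m = {0, 0};
        uint32_t start = p > WINDOW ? p - WINDOW : 0;
        for (uint32_t cand = start; cand < p; cand++) {
            uint32_t len = 0;
            while (p + len < n && buf[cand + len] == buf[p + len]) len++;
            if (len > m.len) { m.len = len; m.dist = p - cand; }
        }
        return m;
    }

    static void parse_lazy(const uint8_t *buf, uint32_t n) {
        uint32_t p = 0;
        while (p < n) {
            Match cur = find_match(buf, n, p);
            if (cur.len >= MIN_MATCH && p + 1 < n) {
                Match next = find_match(buf, n, p + 1);
                if (next.len > cur.len) cur.len = 0;    /* defer: emit a literal instead */
            }
            if (cur.len >= MIN_MATCH) {
                printf("match  len=%u dist=%u\n", cur.len, cur.dist);
                p += cur.len;
            } else {
                printf("literal '%c'\n", buf[p]);
                p += 1;
            }
        }
    }

    int main(void) {
        const char *s = "banana bandana banana bandana";
        parse_lazy((const uint8_t *)s, (uint32_t)strlen(s));
        return 0;
    }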

    On my second "revisit" of an LZ-class codec, I tried to implement a very cache-friendly ROLZ codec, because I have really poor knowledge of "generic programming" (binary trees, suffix arrays, etc.). But in the end I found it's really hard to make ROLZ cache friendly, because you need the ROLZ offset table during both compression and decompression, which makes decompression slower. Also, using order-2 is a really bad choice if compression ratio matters; I've noticed it leaves enough redundancy on the table. So I began to think about "an easy hack", and thought: "Why don't I use the ROLZ match finder in pure LZ77?" (in terminology this is actually a "hash table"). Now I've switched to a pure LZ-class codec. The current draft implementation is very weak because of ineffective "distance" coding (it compresses ENWIK8 down to around 34 MiB). I know how to implement "bucket/bin" coding, but I'm thinking about a parameterizable implementation (I use genetic algorithms to tune my codec's parameters for best compression). Anyway, I'm pretty happy with my own pure LZ77 codec, which has a sliding window of up to 64 MiB.
    BIT Archiver homepage: www.osmanturan.com

  7. #7
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    [OT]
    Why is everybody so much against PAQ9?
    It's a very good program. It isn't stronger than PAQ8, so what? It's much faster and it still visits the list of top choices regularly.
    [/OT]

  8. #8
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by m^2 View Post
    [OT]
    Why is everybody so much against PAQ9?
    It's a very good program. It isn't stronger than PAQ8, so what? It's much faster and it still visits the list of top choices regularly.
    [/OT]
    It's the best LZP+CM AFAIK, so it's fair to take it as the maximum possible compression with the LZP+CM idea. In all of my discussion PAQ9 itself is not the point; the point is exactly LZP. I'm trying to show: "this is the level you can reach by using LZP". So PAQ9 is just the "measurement" scale here.
    BIT Archiver homepage: www.osmanturan.com

  9. #9
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts


    Thank you Osmanturan~
    A detailed question:
    Should already-matched strings be inserted into the hash chain? In LZP there is no hash update inside the matching loop. But in my LZ77 (my previous, very badly implemented experimental csc1: LZ77 + Shannon-Fano coding), if I don't insert the matched strings, the compression ratio goes up from 33% to 40%. I guessed that an already-matched string wouldn't be matched again starting from a position inside it, but that's only an assumption.


    Best regards
    Last edited by Fu Siyuan; 5th May 2009 at 18:12.

  10. #10
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by ForTheKing View Post
    Thank you Osmanturan~
    A detailed question:
    Should the already matched strings be inserted into the hash chain? In LZP, no hash update in matching loops. But in my LZ77 (my previous very bad implement experimental csc1,LZ77+ShannonFano coding), if I don't insert the matched strings, the compression ratio was up to 40% from 33%. I guess a already matched string won't be matched again from half position. But it's only a assumption
    I highly recommend giving up assumptions and looking at how the data actually behaves. If you pay attention, you can match a substring inside an already matched string. But hash chains, hash tables, or any match finder with an "aging factor" (one that discards older offsets) tends to be bad when you insert every position of a match. You can reduce this effect by increasing the search depth, though that makes compression slower.
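    To make the "inserting already matched positions" point concrete, a sketch of a hash-chain update that covers every position of an emitted match (layout and sizes are illustrative, not csc1's):

    Code:
    #include <stdint.h>
    #include <stdlib.h>

    #define HASH_BITS 16
    #define HASH_SIZE (1u << HASH_BITS)
    #define NIL       0xFFFFFFFFu

    typedef struct {
        uint32_t head[HASH_SIZE];   /* most recent position per hash bucket        */
        uint32_t *prev;             /* prev[p] = previous position with same hash  */
    } HashChain;

    static uint32_t hash4(const uint8_t *p) {           /* hash of the next 4 bytes */
        uint32_t v = p[0] | p[1] << 8 | p[2] << 16 | (uint32_t)p[3] << 24;
        return (v * 2654435761u) >> (32 - HASH_BITS);
    }

    static void hc_insert(HashChain *hc, const uint8_t *buf, uint32_t p) {
        uint32_t h = hash4(buf + p);
        hc->prev[p] = hc->head[h];
        hc->head[h] = p;
    }

    /* Insert every position covered by an emitted match, not just its start;
       skipping these insertions is what pushed the ratio from 33% to 40%. */
    static void hc_insert_match(HashChain *hc, const uint8_t *buf,
                                uint32_t p, uint32_t len, uint32_t n) {
        for (uint32_t i = 0; i < len && p + i + 4 <= n; i++)
            hc_insert(hc, buf, p + i);
    }

    static HashChain *hc_new(uint32_t window) {
        HashChain *hc = calloc(1, sizeof *hc);
        if (!hc) return NULL;
        for (uint32_t i = 0; i < HASH_SIZE; i++) hc->head[i] = NIL;
        hc->prev = malloc(sizeof(uint32_t) * window);
        if (!hc->prev) { free(hc); return NULL; }
        return hc;
    }

    int main(void) {
        const uint8_t buf[] = "abcdabcdabcdabcd";
        uint32_t n = sizeof buf - 1;
        HashChain *hc = hc_new(n);
        if (!hc) return 1;
        hc_insert_match(hc, buf, 0, 8, n);   /* pretend a length-8 match was emitted at 0 */
        free(hc->prev);
        free(hc);
        return 0;
    }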
    BIT Archiver homepage: www.osmanturan.com

  11. #11
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts


    A big or small question

    How can an LZ77 compressor get good compression on fp.log?
    My experimental compressor uses some easy strategies: a bitwise range coder, different bit lengths to encode match_length and offsets, a hash chain match finder with a search depth of about 30-2000, the last 4 offsets recorded, and lazy evaluation. The ratio is satisfying on some files, but poor on files like enwik8 and especially on fp.log. I think LZ77 can hardly beat CM on fp.log. Even when I raise the search depth to 2000, it hardly gets under 980k, and only reaches 29.6% on enwik8 (normally 33% with a depth of 30). Though talking about LZ seems out of fashion, I still wonder how many other strategies there may be.

  12. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Actually fp.log seems fairly LZ-compatible to me - there are lots of matches.
    So I guess it's all a matter of modelling the matches and literals.
    For literals a low-order CM would be reasonable; with matches the
    problem might be that for fp.log the probability has to be attached to the
    specific string, while the offsets are always different. One way around
    that is to use absolute offsets.

    Anyway, it might be a good idea to dump the symbol codelengths from some
    open-source compressor (paq?) and from your compressor (for literals it's
    obvious, and for matches the codelength can be distributed among the
    matched symbols), then draw something like http://ctxmodel.net/book1a.png
    to see where your model's performance is worst.
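    One rough way to produce such a dump (a generic adaptive order-0 bit model used purely as a measuring stick, not paq's model):

    Code:
    /* Print one codelength estimate per byte: run a simple adaptive order-0
       bitwise model over the file and output -log2(p) for each byte. */
    #include <math.h>
    #include <stdio.h>
    #include <stdint.h>

    static uint16_t p1[256];                 /* P(bit==1) per bit-tree node, 12-bit probs */

    static double byte_codelength(uint8_t c) {
        double bits = 0.0;
        unsigned ctx = 1;                    /* node index in the bit tree */
        for (int i = 7; i >= 0; i--) {
            unsigned bit = (c >> i) & 1;
            double p = p1[ctx] / 4096.0;
            bits += -log2(bit ? p : 1.0 - p);
            if (bit) p1[ctx] += (4096 - p1[ctx]) >> 5;   /* adapt toward the seen bit */
            else     p1[ctx] -=  p1[ctx] >> 5;
            ctx = ctx * 2 + bit;
        }
        return bits;
    }

    int main(int argc, char **argv) {
        if (argc < 2) { fprintf(stderr, "usage: %s file > lens.txt\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) return 1;
        for (int i = 0; i < 256; i++) p1[i] = 2048;      /* start every node at p = 0.5 */
        int c;
        while ((c = fgetc(f)) != EOF)
            printf("%.3f\n", byte_codelength((uint8_t)c));
        fclose(f);
        return 0;
    }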
    Last edited by Shelwien; 3rd June 2009 at 09:18.

  13. #13
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by ForTheKing View Post
    A big or small questions

    How a LZ77 compressor to get a good compression on fp.log?
    My experimental compressor used some easy strategys like bitwise rangecoder ,different bits length to encode match_length and offsets, hash chain match finder with search depth of about 30-2000, records last 4 offsets,lazy evaluation----- satisfied ratio on some files. However, so poor on files like enwik8,especially on fp.log. I think LZ77 may hardly beat CM on fp.log. Even I made search depth to 2000, it hardly get under 980k and only 29.6% on enwik8 (normally 33% with depth of 30). Though it seems that talking about LZ is out of fashion, I still wonder how much other strategies there may be?
    If you get a "bad" results on highly redundant file (eg. FP.log), then someone should concern about match modeling. Seems, you have a imprecise match modeling. Just wonder how you encode your match length/distance pair?
    BIT Archiver homepage: www.osmanturan.com

  14. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    @osman:
    Just look at fp.log, it's a special case.
    There are many matching substrings, but the distances between two
    subsequent occurrences of a string differ each time.
    Thus the relative offsets in match pairs also keep changing and
    are not so easy to model.

  15. #15
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    @osman:
    Just look at fp.log, its a special case.
    There're many matching substrings, but distances between each two
    subsequent string occurences are different.
    Thus relative offsets in match pairs also become different and
    are not so easy to model.
    Yes, you are right. But I was talking about a "deeper" level of modeling. For example: did he use segmented match lengths and bucket coding for distances?
    BIT Archiver homepage: www.osmanturan.com

  16. #16
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    Quote Originally Posted by osmanturan View Post
    Yes. You are right. But, I was talking about "deeper" modeling way. For example, I mean, did he use segmented match lengths and bucket coding for distances?
    I don't know exactly what the so-called bucket is; maybe I'm already doing this:


    void putmatchpos(U32 len)
    {
    U32 i,j;


    if (len<64) // 000 + 6bit
    {
    ENCODEBIT(matchposL1P[1],0,MLSHIFT); // p 0xx
    ENCODEBIT(matchposL1P[2],0,MLSHIFT); // p 00x
    ENCODEBIT(matchposL1P[4],0,MLSHIFT); // p 000
    i=1;
    j=(len>>5)&1; ENCODEBIT(matchposL2P[0][i],j,MLSHIFT) i+=i+j;
    j=(len>>4)&1; ENCODEBIT(matchposL2P[0][i],j,MLSHIFT) i+=i+j;
    j=(len>>3)&1; ENCODEBIT(matchposL2P[0][i],j,MLSHIFT) i+=i+j;
    j=(len>>2)&1; ENCODEBIT(matchposL2P[0][i],j,MLSHIFT) i+=i+j;
    j=(len>>1)&1; ENCODEBIT(matchposL2P[0][i],j,MLSHIFT) i+=i+j;
    j=len&1; ENCODEBIT(matchposL2P[0][i],j,MLSHIFT)

    }
    else if (len<(64+64*) // 001+ 6bit + 3directbit
    {
    len-=64;

    ENCODEBIT(matchposL1P[1],0,MLSHIFT); // p 0xx
    ENCODEBIT(matchposL1P[2],0,MLSHIFT); // p 00x
    ENCODEBIT(matchposL1P[4],1,MLSHIFT); // p 001
    i=1;
    j=(len>>&1; ENCODEBIT(matchposL2P[1][i],j,MLSHIFT) i+=i+j;
    j=(len>>7)&1; ENCODEBIT(matchposL2P[1][i],j,MLSHIFT) i+=i+j;
    j=(len>>6)&1; ENCODEBIT(matchposL2P[1][i],j,MLSHIFT) i+=i+j;
    j=(len>>5)&1; ENCODEBIT(matchposL2P[1][i],j,MLSHIFT) i+=i+j;
    j=(len>>4)&1; ENCODEBIT(matchposL2P[1][i],j,MLSHIFT) i+=i+j;
    j=(len>>3)&1; ENCODEBIT(matchposL2P[1][i],j,MLSHIFT)
    ENCODEBITNOP(len&4) // 3directbit
    ENCODEBITNOP(len&2) // 3directbit
    ENCODEBITNOP(len&1) // 3directbit
    }
    else if (len<(64+64*8+64*64)) // 010+ 6bit + 6directbit
    {
    len-=64+64*8;

    ENCODEBIT(matchposL1P[1],0,MLSHIFT); // p 0xx
    ENCODEBIT(matchposL1P[2],1,MLSHIFT); // p 01x
    ENCODEBIT(matchposL1P[5],0,MLSHIFT); // p 010
    i=1;
    j=(len>>11)&1; ENCODEBIT(matchposL2P[2][i],j,MLSHIFT) i+=i+j;
    j=(len>>10)&1; ENCODEBIT(matchposL2P[2][i],j,MLSHIFT) i+=i+j;
    j=(len>>9)&1; ENCODEBIT(matchposL2P[2][i],j,MLSHIFT) i+=i+j;
    j=(len>>&1; ENCODEBIT(matchposL2P[2][i],j,MLSHIFT) i+=i+j;
    j=(len>>7)&1; ENCODEBIT(matchposL2P[2][i],j,MLSHIFT) i+=i+j;
    j=(len>>6)&1; ENCODEBIT(matchposL2P[2][i],j,MLSHIFT)
    ENCODEBITNOP(len&32) // 6directbit
    ENCODEBITNOP(len&16) // 6directbit
    ENCODEBITNOP(len& // 6directbit
    ENCODEBITNOP(len&4) // 6directbit
    ENCODEBITNOP(len&2) // 6directbit
    ENCODEBITNOP(len&1) // 6directbit
    }
    else if (len<(64+64*8+64*64+64*64*) // 011+ 6bit + 9directbit
    {
    len-=64+64*8+64*64;

    ENCODEBIT(matchposL1P[1],0,MLSHIFT); // p 0xx
    ENCODEBIT(matchposL1P[2],1,MLSHIFT); // p 01x
    ENCODEBIT(matchposL1P[5],1,MLSHIFT); // p 011
    i=1;
    j=(len>>14)&1; ENCODEBIT(matchposL2P[3][i],j,MLSHIFT) i+=i+j;
    j=(len>>13)&1; ENCODEBIT(matchposL2P[3][i],j,MLSHIFT) i+=i+j;
    j=(len>>12)&1; ENCODEBIT(matchposL2P[3][i],j,MLSHIFT) i+=i+j;
    j=(len>>11)&1; ENCODEBIT(matchposL2P[3][i],j,MLSHIFT) i+=i+j;
    j=(len>>10)&1; ENCODEBIT(matchposL2P[3][i],j,MLSHIFT) i+=i+j;
    j=(len>>9)&1; ENCODEBIT(matchposL2P[3][i],j,MLSHIFT)
    ENCODEBITNOP(len&256) ENCODEBITNOP(len&12 // 6directbit
    ENCODEBITNOP(len&64) ENCODEBITNOP(len&32) // 6directbit
    ENCODEBITNOP(len&16) ENCODEBITNOP(len& // 6directbit
    ENCODEBITNOP(len&4) ENCODEBITNOP(len&2) // 6directbit
    ENCODEBITNOP(len&1) // 6directbit
    }
    else if (len<(64+64*8+64*64+64*64*64)) // 100+ 6bit + 12directbit
    {
    len-=64+64*8+64*64+64*64*8;

    ENCODEBIT(matchposL1P[1],1,MLSHIFT); // p 1xx
    ENCODEBIT(matchposL1P[3],0,MLSHIFT); // p 10x
    ENCODEBIT(matchposL1P[6],0,MLSHIFT); // p 100
    i=1;
    j=(len>>17)&1; ENCODEBIT(matchposL2P[4][i],j,MLSHIFT) i+=i+j;
    j=(len>>16)&1; ENCODEBIT(matchposL2P[4][i],j,MLSHIFT) i+=i+j;
    j=(len>>15)&1; ENCODEBIT(matchposL2P[4][i],j,MLSHIFT) i+=i+j;
    j=(len>>14)&1; ENCODEBIT(matchposL2P[4][i],j,MLSHIFT) i+=i+j;
    j=(len>>13)&1; ENCODEBIT(matchposL2P[4][i],j,MLSHIFT) i+=i+j;
    j=(len>>12)&1; ENCODEBIT(matchposL2P[4][i],j,MLSHIFT)
    ENCODEBITNOP(len&204 ENCODEBITNOP(len&1024) ENCODEBITNOP(len&512) //12directbit
    ENCODEBITNOP(len&256) ENCODEBITNOP(len&12 // 12directbit
    ENCODEBITNOP(len&64) ENCODEBITNOP(len&32) // 12directbit
    ENCODEBITNOP(len&16) ENCODEBITNOP(len& // 12directbit
    ENCODEBITNOP(len&4) ENCODEBITNOP(len&2) // 12directbit
    ENCODEBITNOP(len&1) // 12directbit
    }
    else if (len<(64+64*8+64*64+64*64*64+64*64*64*) // 101+ 6bit + 15directbit
    {
    len-=64+64*8+64*64+64*64*64;

    ENCODEBIT(matchposL1P[1],1,MLSHIFT); // p 1xx
    ENCODEBIT(matchposL1P[3],0,MLSHIFT); // p 10x
    ENCODEBIT(matchposL1P[6],1,MLSHIFT); // p 101
    i=1;
    j=(len>>20)&1; ENCODEBIT(matchposL2P[5][i],j,MLSHIFT) i+=i+j;
    j=(len>>19)&1; ENCODEBIT(matchposL2P[5][i],j,MLSHIFT) i+=i+j;
    j=(len>>1&1; ENCODEBIT(matchposL2P[5][i],j,MLSHIFT) i+=i+j;
    j=(len>>17)&1; ENCODEBIT(matchposL2P[5][i],j,MLSHIFT) i+=i+j;
    j=(len>>16)&1; ENCODEBIT(matchposL2P[5][i],j,MLSHIFT) i+=i+j;
    j=(len>>15)&1; ENCODEBIT(matchposL2P[5][i],j,MLSHIFT)
    ENCODEBITNOP(len&16384) ENCODEBITNOP(len&8192) ENCODEBITNOP(len&4096) //15directbit
    ENCODEBITNOP(len&204 ENCODEBITNOP(len&1024) ENCODEBITNOP(len&512) //15directbit
    ENCODEBITNOP(len&256) ENCODEBITNOP(len&12 // 15directbit
    ENCODEBITNOP(len&64) ENCODEBITNOP(len&32) // 15directbit
    ENCODEBITNOP(len&16) ENCODEBITNOP(len& // 15directbit
    ENCODEBITNOP(len&4) ENCODEBITNOP(len&2) // 15directbit
    ENCODEBITNOP(len&1) // 15directbit
    }
    else if (len<(64+64*8+64*64+64*64*64+64*64*64*8+64*64*64*6 4)) // 110+ 6bit + 18directbit
    {
    len-=64+64*8+64*64+64*64*64+64*64*64*8;

    ENCODEBIT(matchposL1P[1],1,MLSHIFT); // p 1xx
    ENCODEBIT(matchposL1P[3],1,MLSHIFT); // p 11x
    ENCODEBIT(matchposL1P[7],0,MLSHIFT); // p 110
    i=1;
    j=(len>>23)&1; ENCODEBIT(matchposL2P[6][i],j,MLSHIFT) i+=i+j;
    j=(len>>22)&1; ENCODEBIT(matchposL2P[6][i],j,MLSHIFT) i+=i+j;
    j=(len>>21)&1; ENCODEBIT(matchposL2P[6][i],j,MLSHIFT) i+=i+j;
    j=(len>>20)&1; ENCODEBIT(matchposL2P[6][i],j,MLSHIFT) i+=i+j;
    j=(len>>19)&1; ENCODEBIT(matchposL2P[6][i],j,MLSHIFT) i+=i+j;
    j=(len>>1&1; ENCODEBIT(matchposL2P[6][i],j,MLSHIFT)
    ENCODEBITNOP(len&131072) ENCODEBITNOP(len&65536) ENCODEBITNOP(len&3276 //18directbit
    ENCODEBITNOP(len&16384) ENCODEBITNOP(len&8192) ENCODEBITNOP(len&4096) //18directbit
    ENCODEBITNOP(len&204 ENCODEBITNOP(len&1024) ENCODEBITNOP(len&512) //18directbit
    ENCODEBITNOP(len&256) ENCODEBITNOP(len&12 // 18directbit
    ENCODEBITNOP(len&64) ENCODEBITNOP(len&32) // 18directbit
    ENCODEBITNOP(len&16) ENCODEBITNOP(len& // 18directbit
    ENCODEBITNOP(len&4) ENCODEBITNOP(len&2) // 18directbit
    ENCODEBITNOP(len&1) // 18directbit
    }
    else // 111+ 6bit + 21directbit
    {
    len-=64+64*8+64*64+64*64*64+64*64*64*8+64*64*64*64;

    ENCODEBIT(matchposL1P[1],1,MLSHIFT); // p 1xx
    ENCODEBIT(matchposL1P[3],1,MLSHIFT); // p 11x
    ENCODEBIT(matchposL1P[7],1,MLSHIFT); // p 111
    i=1;
    j=(len>>26)&1; ENCODEBIT(matchposL2P[7][i],j,MLSHIFT) i+=i+j;
    j=(len>>25)&1; ENCODEBIT(matchposL2P[7][i],j,MLSHIFT) i+=i+j;
    j=(len>>24)&1; ENCODEBIT(matchposL2P[7][i],j,MLSHIFT) i+=i+j;
    j=(len>>23)&1; ENCODEBIT(matchposL2P[7][i],j,MLSHIFT) i+=i+j;
    j=(len>>22)&1; ENCODEBIT(matchposL2P[7][i],j,MLSHIFT) i+=i+j;
    j=(len>>21)&1; ENCODEBIT(matchposL2P[7][i],j,MLSHIFT)
    ENCODEBITNOP(len&1048576) ENCODEBITNOP(len&52428 ENCODEBITNOP(len&262144) //21directbit
    ENCODEBITNOP(len&131072) ENCODEBITNOP(len&65536) ENCODEBITNOP(len&3276 //21directbit
    ENCODEBITNOP(len&16384) ENCODEBITNOP(len&8192) ENCODEBITNOP(len&4096) //21directbit
    ENCODEBITNOP(len&204 ENCODEBITNOP(len&1024) ENCODEBITNOP(len&512) //21directbit
    ENCODEBITNOP(len&256) ENCODEBITNOP(len&12 // 21directbit
    ENCODEBITNOP(len&64) ENCODEBITNOP(len&32) // 21directbit
    ENCODEBITNOP(len&16) ENCODEBITNOP(len& // 21directbit
    ENCODEBITNOP(len&4) ENCODEBITNOP(len&2) // 21directbit
    ENCODEBITNOP(len&1) // 21directbit
    }

    }

    ENCODEBIT is a macro


    very verbose....

  17. #17
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Yes, it seems you already use bucket coding. You can eliminate the branches in a different way. Think of a function that looks like a "staircase", where each higher step is wider. If you can't see what I'm talking about, just look at the Deflate specification in its RFC. I use a similar scheme, but without lookup tables, which makes it faster.
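    A minimal sketch of that staircase/bucket split (the exact split is illustrative; Deflate uses fixed tables instead of computing the bit length on the fly):

    Code:
    /* Classify a value v >= 1 by its bit length (a cheap log2), code that small
       bucket number with the adaptive bit models, and send the remaining low
       bits directly, the way Deflate's extra-bits tables do. */
    #include <stdint.h>
    #include <stdio.h>

    static unsigned bucket_of(uint32_t v) {        /* floor(log2(v)), v >= 1 */
        unsigned b = 0;
        while (v > 1) { v >>= 1; b++; }
        return b;
    }

    /* Decompose v into (bucket, extra): a real coder would send `bucket` through
       its modeled bit tree and `extra` as b raw (equiprobable) bits. */
    static void encode_value(uint32_t v) {
        unsigned b = bucket_of(v);
        uint32_t extra = v - (1u << b);
        printf("value %u -> bucket %u, %u extra bit(s) = %u\n", v, b, b, extra);
    }

    int main(void) {
        uint32_t tests[] = {1, 2, 3, 7, 64, 575, 4672, 1u << 20};
        for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; i++)
            encode_value(tests[i]);
        return 0;
    }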
    BIT Archiver homepage: www.osmanturan.com

  18. #18
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    I skimmed the LZMA source and found it uses very, very precise price evaluation. I guess that's the big difference between such professional algorithms and my DIY program, and why I don't get much extra compression on the test files. I checked the performance of Packet 0.9 by Nania Francesco; it is very similar to mine, so I guess it also uses a rough price evaluation. I will improve my program to check this speculation.

    Question 2: Could someone give a synopsis of the model in LZMA (its state)? Does it simply shift a bit into the state as each bit is encoded?

  19. #19
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by ForTheKing View Post
    I skimed the LZMA and found it use a very very precise price evaluation. I guess it's a big difference between such profressional algorithms and my diy program that lead to the not too much extra compression on test files. I checked the performance of Packet0.9 by Nania Francesco. The performance is very similar with mine. So I guess it also used a rough price evalution. I will improve my program to check my assertion speculation.

    Question 2: Could someone make some synopsis about the model in LZMA(state?). Is it simply shift a bit of the state while encoded a bit?
    LZMA uses so-called optimal parsing. Actually it's "near"-optimal parsing, because LZMA "assumes" a few things to make the coding both faster and simpler.

    AFAIK, only LZX, Quantum (?), LZMA, ROLZ2/ROLZ3 (WinRK) and RZM use near-optimal parsing. The others use different coding and parsing schemes, so all of these algorithms have different strengths. I think that's the reason why there is little difference between Packet 0.9 and yours.

    As to "Question 2" (I could not found first question though ), I'm not sure that I understood correctly. If I understood correctly, seems you are talking about it's bit model, right? If yes, it's a rewritten version of linear counter. But, it's imprecise IMO. I advice to use linear counter instead.
    BIT Archiver homepage: www.osmanturan.com

  20. #20
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    afaik, quantum used lzss-optimal parsing, i.e. parsing optimized for the number of matches rather than the number of bits

  21. #21
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    1. In your opinion, which matters more for efficiency: better parsing, or a better match finder that finds longer matches? I'm aiming at fast speed with good compression, not extreme speed or perfect compression. I found that a hash chain MF brings little improvement while costing much more time, so I'm not sure whether I should pay more attention to parsing. fp.log is an interesting test file, I think.

    2. In the compressor I'm developing, I initially set min_match to 4. However, now I suspect this leads to a hidden loss in compression. What's your opinion? Several days ago you advised me to use a hash2-3-4 MF. At the time I didn't understand why hash2/hash3 are needed when there is a hash4; now I've come to think that hash2/hash3 are used to find short matches at short distances, but I'm not sure.

    3. If good parsing is achieved, is anything more than an order-0 coder needed? Right now I'm using an order-1 RC. In my observation an order-0 RC is obviously faster than order-1, especially on random data. What's more, ACE uses Huffman, but has the same performance as WinRAR (is RAR using a simple PPM coder? I'm not sure). These are my findings; I'm not sure what to do next.

    Thanks

  22. #22
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > If a good parsing achieved, is a more than order-0 coder needed?

    It makes little sense to talk about match coding in terms of "orders",
    though some context might still be useful (like a few previous literal/match flags, etc.).
    For literals, context should be helpful (previous symbols, not matches),
    and a bitwise order-1 model shouldn't really be much slower than order-0...
    and there are things like fpaq0f.

    > What's more, ACE uses Huffman, but has the same performance to WinRAR.
    > (Is RAR using a simple PPM coder? I'm not sure).

    Rar's LZ is LZH. It just has two different compression algorithms - LZH and PPMd (vH).

    > I'm not sure what to do next.

    If I wanted a new LZ, I'd just clean up the LZMA coder and then improve what I could there.

    Or another possibility might be decompiling RZM.

    In such algorithms there are lots of small tricks, and it takes a really long time
    to find them all, so why not start from something that already works?

  23. #23
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by Fu Siyuan View Post
    1. In your opinion, better parsing and better match finder which will find longer matches, which one is more important on effecency? well I'm aiming at fast speed@good performance,however not extremely fast or perfect performance. I found hash chain MF achieves little progress though cost much more time. So I'm not sure if I should pay more attention on parsing --- fp.log is a interesting test file I think
    fp.log is a rather unusual example; it may be better to just preprocess such files with things like rep/lzp

    your question is rather moot: both things are important. the only answer i can give is that for large files you need larger dictionaries and a better match finder, while for small files the parser is the more important part


    Several days ago you advice me use hash2-3-4 MF. I didn't understand that time why hash2-3 is needed since there is a hash-4.So now I come to think if hash2-3 are used to find short match in short distance.
    yes, exactly


    ACE uses Huffman , but has the same performance to WinRAR. (Is RAR using a simple PPM coder? I'm not sure).
    ace and rar are programs of the same generation and use very similar compression formats (deflate with some improvements) - that was before they started to add all these filters plus PPMd to RAR

  24. #24
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Fu Siyuan View Post
    1. In your opinion, better parsing and better match finder which will find longer matches, which one is more important on effecency? well I'm aiming at fast speed@good performance,however not extremely fast or perfect performance. I found hash chain MF achieves little progress though cost much more time. So I'm not sure if I should pay more attention on parsing --- fp.log is a interesting test file I think.
    Many people claim that hash chains are not useful with a big dictionary (though I didn't test that myself). I don't use them anyway. Why? Because a chain needs several memory accesses per query, while a hash table needs only one. I prefer hash tables for speed and binary trees/suffix trees etc. for efficiency. Parsing is another facet of LZ codecs, and I think you should invest a lot in it, because LZ codecs tend to leave a huge amount of redundancy. With ROLZ you won't have too much concern, because it encodes "distances" as indexes into an MTF-style queue, which makes distance coding more efficient than in pure LZ (of course, ROLZ has its own disadvantages). I should note that simple heuristics like lazy parsing are not near-optimal parsing. If you aim for maximum compression, you should track the code length of each symbol (= -SUM_i [log2(P_i)]).
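    A sketch of such a hash-table match finder (bucket layout and sizes are illustrative, not BIT's):

    Code:
    /* One bucket per hash of the next 4 bytes, a handful of slots per bucket
       (so a bucket fits within a cache line), newest entry first. */
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define HT_BITS  16
    #define HT_SIZE  (1u << HT_BITS)
    #define HT_SLOTS 8                     /* 8 * 4 bytes = 32-byte bucket */

    static uint32_t ht[HT_SIZE][HT_SLOTS]; /* candidate positions, newest in slot 0 */

    static uint32_t hash4(const uint8_t *p) {
        uint32_t v;
        memcpy(&v, p, 4);
        return (v * 2654435761u) >> (32 - HT_BITS);
    }

    static void ht_insert(const uint8_t *buf, uint32_t p) {
        uint32_t *bucket = ht[hash4(buf + p)];
        memmove(bucket + 1, bucket, (HT_SLOTS - 1) * sizeof *bucket);
        bucket[0] = p + 1;                 /* store p+1 so that 0 means "empty" */
    }

    static uint32_t ht_find(const uint8_t *buf, uint32_t p, uint32_t n,
                            uint32_t *best_dist) {
        uint32_t *bucket = ht[hash4(buf + p)], best_len = 0;
        for (int s = 0; s < HT_SLOTS && bucket[s]; s++) {
            uint32_t cand = bucket[s] - 1, len = 0;
            while (p + len < n && buf[cand + len] == buf[p + len]) len++;
            if (len > best_len) { best_len = len; *best_dist = p - cand; }
        }
        return best_len;
    }

    int main(void) {
        const char *s = "abcdefgh abcdefgh abcdefgh";
        const uint8_t *buf = (const uint8_t *)s;
        uint32_t n = (uint32_t)strlen(s), dist = 0;
        for (uint32_t p = 0; p + 4 <= n; p++) {
            uint32_t len = ht_find(buf, p, n, &dist);
            if (len >= 4) printf("pos %u: match len=%u dist=%u\n", p, len, dist);
            ht_insert(buf, p);
        }
        return 0;
    }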

    Quote Originally Posted by Fu Siyuan View Post
    2. In my under developing compressor, I initially set the min_match to 4. However, now I doubt if it will lead to a latency lose in performance. So what's your opinion?
    Leaving the minimum match length larger simply means faster but worse compression, because EXE-like files usually have shorter matches. If you want to handle EXE-like files, you have 2 choices:
    1- Set the minimum match length back to 2
    2- Implement a stronger literal coder which also models up to order-4

    Quote Originally Posted by Fu Siyuan View Post
    Several days ago you advice me use hash2-3-4 MF. I didn't understand that time why hash2-3 is needed since there is a hash-4.So now I come to think if hash2-3 are used to find short match in short distance.But I'm not sure.
    It seems you have forgotten my post about why multi-way hashing is important: http://encode.ru/forum/showpost.php?p=8316&postcount=92

    Quote Originally Posted by Fu Siyuan View Post
    3. If a good parsing achieved, is a more than order-0 coder needed? Now I'm using order-1 RC. Order-0 RC obviously is faster than Order-1 expecially on random data based on my observation.
    Good parsing actually relies on the modeling itself, so they strongly affect each other. I think you should stop thinking of modeling only as n-gram models (order-n). IMO, in an LZ codec you have 2 choices for literal coding:
    1- Implement a strong n-gram model which mixes order-0, order-1 and order-2.
    2- Implement a single masked context model (or simply order-1, which can be enough if you don't want to run automated optimization), also conditioned on some small context like a "what did you do before?" flag.

    Quote Originally Posted by Fu Siyuan View Post
    What's more, ACE uses Huffman , but has the same performance to WinRAR. (Is RAR using a simple PPM coder? I'm not sure). So these are my founds, I'm not sure what to do next.
    I don't know ACE's internals, but I know RAR's. RAR uses block-wise LZH or PPM after a decision step (= data analysis). RAR also uses filters and its own VM.
    BIT Archiver homepage: www.osmanturan.com

  25. #25
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts


    Well, thanks for everyone's replies! I don't reply quickly because I'm thinking your advice over; each time I need to spend a while to understand it.

    osmanturan:Seems you have forgot my post which is about the reason why multi way hashing is important: http://encode.ru/forum/showpost.php?p=8316&postcount=92

    Actually, I haven't. It's just that my program always had min_match=4, so at that time I doubted why hash2/hash3 were necessary.

    Today I made some modifications to my program. I set min_match=2 with hash2/hash3 added, but the result is very bad: there are only several hundred length-2 matches per MB, and compression even got worse (I'm using precise price evaluation). Then I eliminated the length-2 matches, setting min_match=3 with hash3. I got a LITTLE improvement (maybe less than 1%) on most binary files, but sometimes slightly worse results on other kinds of data (most of my own test set is TARs of mixed files).

    I didn't use a hash table because I had no knowledge of HT match finders before reading BALZ and your posts several times. I think a hash table may cost a lot of memory with a long search cycle, while a hash chain always costs 4*(WND_SIZE+HashHead) bytes.
    However, recently I found that a long cycle makes such an MF pointless - it loses its speed advantage. I've considered studying and trying an HT MF at some point. However, there is so much I want to do: my own archiver, a data analyzer, filters (I'm considering improving the simple RGB delta toward something RAR-RGB-like). The earlier work on a binary tree and optimal parsing is still stranded.
    I just don't know what to focus on.

    Now my program has a similar compression ratio and speed to 7z fast mode. However, on some binary files my program is nearly 6% worse. The parsing is now very much like 7z's optimumfast, so I'm confused about where such a large difference comes from. Anyway, it's much better than my initial csc3.

    One more question: random data is always a problem, only 1 MB/s on that kind of data. As I said, I've written a detector that counts each symbol's frequency to decide whether a block is bad data. But do all detectors or filters scan the whole data one or two extra times? Is there a faster way?


    Or maybe I should upload my newest program with full sources to better show what I've done.
    Last edited by Fu Siyuan; 3rd August 2009 at 17:52.

  26. #26
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Fu Siyuan View Post
    Well, Thanks for everyone's reply! I don't reply rapidly because I'm thinking over about your advices. In fact, everytime I would spend some time to understand about your advices.
    Then you should ask about whatever was unclear.

    Quote Originally Posted by Fu Siyuan View Post
    Actually, I havn't. I just always thought min_match=4 (my program was) so I doubt why hash2-hash3 are necessary ,at that time.
    Hash2/3 can still be useful in such circumstances, because a hash2/3 finder can be a hash table which holds a single position at a time. In short, you will always have the most recent match; in other words, you get the shortest distance most of the time.

    Quote Originally Posted by Fu Siyuan View Post
    Today, I made some modification on my program. I set min_match=2 with hash2-hash3 added. However, the result is very bad. There are only several hundreds of 2-length matches per MB. And the performance even got worse (I'm using precise price evaluation). Then I eliminated the 2-length match , setting min_match=3 with hash-3. I got a LITTLE (maybe less than 1%) improve on most binary file but some times a little little worse on other kinds of data (most of my own testset are TARs with mixed files).
    On binary data (x86, images, audio, etc.) you need short matches. But if you get worse compression on text, that basically means your parsing strategy is improper. You should focus on it to eliminate this effect.

    Quote Originally Posted by Fu Siyuan View Post
    I didn't use HashTable because I have no knowledge of HT before I read BALZ and your posts several times. I think HashTable may cost so match memory with long search cycle. Hash Chain always costs 4*(WND_SIZE+HashHead) MB.
    Actually hash tables are both more memory and more cache friendly, because you need only one memory access at a time. Also, you can limit each hash table entry to the cache line size (= 64 bytes) to be even more cache friendly. As to memory consumption, you can limit the hash table size very easily. IMO, hash tables are also easier to implement. However, if you want a good compression ratio you still need a longer search depth, as usual.
    Quote Originally Posted by Fu Siyuan View Post
    However,recently I found long-cycle makes such MF lose sense--- it would lose is speed.
    Well...It's obvious already, I think
    Quote Originally Posted by Fu Siyuan View Post
    I considered to study and try HT MF some times.
    Note that an HT MF won't help compression ratio; it's actually a good balance between speed and ratio IMO. If you aim for higher compression, you need a higher search depth in the HT or a more precise match finder.
    Quote Originally Posted by Fu Siyuan View Post
    However, I have so much want to do. I want to try to write my own archiver, data analyzer, filter (I'm considering to improve the simple RGBdelta to RAR-RGB like ). The previous work binary-tree and optimal parsing are still stranded.
    I just don't know which to focus on.
    All of these things are related to each other; you can't really decide which part should be finished first. You should lay out a route and implement that plan.

    Quote Originally Posted by Fu Siyuan View Post
    Now my program have the similar compression ratio and speed compared to 7z-fast mode. However, on some binary files, my program gets nearly 6% worse. The parsing now is very very 7z - optimumfast like. I'm confusing how the so much differences comes out. Anyway, it's much better than my initial csc3.
    It seems you use a pure order-1 context for literal coding. That's good for text, but usually not for binary data. You should discard the low bits of the previous symbol. Something like:
    Code:
    // instead of
    literalContext = previousSymbol;
    // use below code
    literalContext = (previousSymbol & 0xF0)>>4; // discard 4 low bits
    Quote Originally Posted by Fu Siyuan View Post
    one more question: Random Data always problem, only 1M/s on such kind of data. I said I've wrote a detector by count each symbols freq, to distinguish if such block data is bad data. But do the all detectors or filters scan the whole data for one or two extra time? Is there a faster way?
    Ideal compression should have the "online" property, i.e. you apply everything required sequentially in a single pass. Of course, there is no strict law about that; you can use several passes, or run multiple models in parallel in a single pass (with compression itself being another pass in this case). I think the main benefit of the "online" property is streamable compression/decompression, which is ideal for networking.
    BIT Archiver homepage: www.osmanturan.com

  27. #27
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    osman:
    Seems you use pure order-1 context for literal coding. It's good for text, but not for binary data in most time. You should discard low bits of previous symbol. Something like:

    // instead of
    literalContext = previousSymbol;
    // use below code
    literalContext = (previousSymbol & 0xF0)>>4; // discard 4 low bits
    Yeah! I tried it. Significant improvement on binary files, but worse on text (inevitable since the low 4 bits are discarded?). Is it because pure order-1 adapts too slowly on binary files? I may try a limited Markov context tomorrow. It seems I'm completely influenced by LZMA.

  28. #28
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Actually LZMA uses the same trick too. It uses the 3 high bits of the previous symbol as the context by default, and this can be controlled with a switch (I don't remember which one right now; you can have a look at the 7-Zip help). Why don't you detect these circumstances? I mean, you can use two order-1 models:
    1- A pure order-1 model
    2- A masked order-1 model
    Then compute the total code length over these models, compare which one is smaller, and use whichever is better (a rough sketch of this comparison follows below).

    Actually you can mask out specific bits. For that you need a good context mask, and I don't think you can find a good one without running an automated optimization or a brute-force search.

    BTW, could you try this mask on pure text (with Latin alphabet not Chinese of course )?

    MASK = 0xDF;
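    A rough offline version of that comparison (pure sequential counting with a +1 estimator, so it only approximates what adaptive models would give):

    Code:
    /* Estimate the total code length of a buffer under order-1 models with
       different context masks (0xFF = pure order-1, 0xF0 = high nibble only,
       0xDF = folds ASCII case). */
    #include <math.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static double masked_order1_bits(const uint8_t *buf, size_t n, uint8_t mask) {
        static uint32_t freq[256][256];
        static uint32_t total[256];
        memset(freq, 0, sizeof freq);
        memset(total, 0, sizeof total);
        double bits = 0.0;
        for (size_t i = 1; i < n; i++) {
            uint8_t ctx = buf[i - 1] & mask;
            uint8_t sym = buf[i];
            double p = (freq[ctx][sym] + 1.0) / (total[ctx] + 256.0);  /* Laplace estimate */
            bits += -log2(p);
            freq[ctx][sym]++;
            total[ctx]++;
        }
        return bits;
    }

    int main(void) {
        /* toy input; in practice read the whole file into buf */
        const char *s = "The quick brown fox jumps over the lazy dog. THE QUICK BROWN FOX.";
        const uint8_t *buf = (const uint8_t *)s;
        size_t n = strlen(s);
        const uint8_t masks[] = {0xFF, 0xF0, 0xDF};
        for (int i = 0; i < 3; i++)
            printf("mask 0x%02X: %.1f bits total\n",
                   (unsigned)masks[i], masked_order1_bits(buf, n, masks[i]));
        return 0;
    }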

    Edit:
    Quote Originally Posted by Fu Siyuan View Post
    ...Is it because pure order-1 has too slow adaptation on binary files? I may have a try on limited-markov tomorrow.
    Actually the reason is that the low bits are noisy. This is true for almost all binary files.
    Last edited by osmanturan; 3rd August 2009 at 20:08.
    BIT Archiver homepage: www.osmanturan.com

  29. #29
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    Then compute total code length over these model and compare which one is smaller. Then you can use whatever is good.
    Yes, that must help. But I prefer a simple coder; I pay more attention to the LZ part than to the model & coder in order to get extremely fast decompression. I guess UHARC may have a more complicated statistical model, because it spends more than 100 s on ENWIK9 decompression. Some may feel that is still much better than a symmetric algorithm, but I think decompression faster than disk I/O makes people stop noticing "decompression" at all, so I treat decompression time as very important.

    I also have a question about WinRAR/ACE/LZX, which use Huffman. Huffman doesn't have context, right? So why do they get similar or sometimes better results than LZARI? Is it because of a very sophisticated LZ layer? Does it mean that if I have an extremely strong LZ coder, I don't have to use context?

  30. #30
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Fu Siyuan View Post
    Yes. That must be help. But I prefer a simple coder , I paid more attention on LZ than Model&Coder in order to get a extremely fast decompression speed. I guess UHARC may have a more complicated statistic model because it spend more than 100s on ENWIK9 decompression. Some may have their own ideas that it is much better than symmetric algo. But I think a speed faster than disk-IO will makes people to lose the conception of "decompression". So I treat decompression time very important.
    AFAIK, there is no arithmetic-coding-based LZ compressor which claims to be faster than disk I/O. You may want to stick with Huffman or some other lightweight scheme.

    Quote Originally Posted by Fu Siyuan View Post
    I also have a question about WinRAR/ACE/LZX which use Huffman. Huffman doesn't have context ,right? However why it gets similar or sometimes better than LZARI? Is it because the very sophistic LZ layer? Does it mean if I have a extremely strong LZ coder, I don't have to use context?
    Well... RAR (in LZH mode) and LZX use Huffman coding; I'm not sure about ACE. Any statistical source coder can have context; I think the difference lies in the tricks. You should disable all filters when testing RAR. Also keep in mind that LZX has near-optimal parsing with cost approximation. What I mean is that arithmetic coding is not in itself superior for compression: if you don't have a good set of probabilities, it doesn't matter which source coder you choose. In other words, the more important part is producing accurate probabilities (= modeling).
    BIT Archiver homepage: www.osmanturan.com

