
Thread: lzp question

  1. #1
     Member (Nov 2011, Iran; 3 posts)

    lzp question

    hi everyone,
    I'm new to this area and have been reading some papers, but I have some questions that I hope you can help with.
    I'm reading the LZP paper (by Charles Bloom). In the implementation section for LZP1 and LZP2 it says the hash is calculated as
    H = ((C >> 11) ^ C) & 0xFFF
    1. Using 3 bytes as context (the fixed context for LZP1, as the paper says), we have pow(2,24) states, but H has only pow(2,12) states, so we have collisions here. Isn't it a problem to have collisions?
    2. Would we achieve a better compression ratio if we just defined a 16 MB (pow(2,24)-entry) linear array (assuming there is no problem with allocating 16 MB of memory)?
    3. Where did this H formula come from? Why shift right by 11 and then XOR with C (why not shift by 12, for example)?
    Thanks in advance.

    (BTW, today is my birthday)

  2. #2
     Member (Jun 2009, Kraków, Poland; 1,471 posts)
    Happy Birthday!

    People hash things either with the intention of handling collisions later, or because they don't care about collisions. Hashing maps a sparse key space into a denser one, so it's a useful tool for gathering high-order statistics.

    The hash function you posted looks very bad, collision-wise, but it's probably heavily optimized for speed.

    Good hashes achieve the avalanche effect: changing one bit of the input changes many bits of the output (on average, of course). The hash posted above doesn't have that property.

    Sanmayce posted his own hash benchmark suite some time ago; you can explore it: http://encode.ru/threads/1160-Fastes...-hash-function

  3. #3
     Member (May 2008, England; 325 posts)
    Happy Birthday!

  4. #4
     Member (Nov 2011, Iran; 3 posts)
    thx Piotr for your reply.
    "Hash collisions are not avoided. This was found to hurt compression by less than 0.05 bpb, and greatly improved speed and memory usage." ... from the original LZP paper by Charles Bloom.
    So it reduces the compression ratio but improves speed and memory usage. But the question is: how did he calculate that 0.05? And isn't that a symbol ranking algorithm?

  5. #5
     Member (Jun 2009, Kraków, Poland; 1,471 posts)
    I don't know which variation of LZP it is. There's at least one thing that can differ: match lengths. For example, in my TarsaLZP matches have length at most 1, while usually match lengths are unbounded.

    How did he compute that 0.05 bpb difference? Probably he tested against a data structure that either doesn't use hashing (or any other technique that could lead to collisions) or resolves collisions within its hash table.

