Results 1 to 11 of 11

Thread: optimal parsing

  1. #1
    Member
    Join Date
    Jul 2008
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    optimal parsing

    Hello everyone . Just thought I'd say this forum seems to be full of neat compressors, and I feel awkward asking such a basic question...

    I've got a fully functioning lz compressor with greedy parsing. I'm trying to upgrade it to optimal parsing, which has led me to lazy parsing. All the explanations of lazy parsing seem to explain it as this:

    1. start from the beginning of the file
    2. find the longest match at the current position
    3. If the match overlaps with a longer match, ignore it and proceed to the next position
    4. do this until you reach the end of the file

    But this seems far from optimal to me. If there's chains of matches only overlapping by one byte, a lot of matches will discarded. The next idea would be to just shorten the shorter of the 2 overlapping matches, but this also is sub-optimal for certain chains of matches.

    Is this the correct way of interpreting lazy parsing? It seems too simple to get it confused. Maybe I was mistaken thinking it was optimal?

    Anyways, thanks for all the other insightful posts you guys have made

  2. #2
    Member
    Join Date
    Oct 2007
    Location
    Germany, Hamburg
    Posts
    408
    Thanks
    0
    Thanked 5 Times in 5 Posts
    Isn?t the word "lazy" saying all? - Optimal parsing is optimal parsing :-D but I cannot say the characteristics of them to you. I would be interested too.

  3. #3
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    372
    Thanks
    26
    Thanked 22 Times in 15 Posts
    Me is interested too.

    A wiki would be nice for such things!

    TT

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    1. "Lazy" usually means "delayed"

    2. "Optimal" means that you're trying to find the most compressible
    sequence of matches/literals, while its possible to encode each
    symbol as literal or as a part of several matches (not only length, but
    offset can be different too)

    3. Google is your friend: http://www.cbloom.com/algs/dictionary.html

  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Well, the so called ?Lazy Matching? or ?Lazy Evaluation? is a heuristic for more optimal LZ-coding. Obviously, if at each step we will grab the longest match, we may loose even longer match within the reach of a current match. Such thing called ?Greedy Parsing?. With the original ?Lazy Matching?, we delay the coding by one-step to see, is even longer match followed by the current match? If so, we drop a literal and this new even longer match. In addition to that, we may do a ?Lazy Matching? with a few bytes lookahead. This means that we delay the coding decision for a few, not just one step/byte.
    Examples:
    Code:
      Greedy Parsing:
      Len := GetMatchLen(CurPos);
      // ..
      if (Len >= MINMATCH)
        Encode(Len);
      // ..
       
      Lazy Matching:
      Len = GetMatchLen(CurPos);
      if (Len >= MINMATCH) {
        NextLen = GetMatchLen(CurPos + 1);
        if (NextLen > Len)
          Len = 0; // Drop current match
      }
      // ..
      if (Len >= MINMATCH)
        Encode(Len);
      // ..
       
      Lazy Matching with a few-byte lookahead:
      Len = GetMatchLen(CurPos);
      if (Len >= MINMATCH) {
        for (i = 1; i <= LOOKAHEAD; i++) {
          NextLen = GetMatchLen(CurPos + i);
          if (NextLen >= (Len + i)) {
            Len = 0; // Drop current match
            break;
          }
        }
      }
      if (Len >= MINMATCH)
        Encode(Len);
      // ..
    The ?Optimal Parsing? is more complicated. We getting ALL matches at EACH position. After, we try to build so called ?Optimal Path? or optimal sequence of literals and matches. Here are many variants on how we examine the cost or price of each choice, how we build/optimize path, etc.



  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    lazy evaluation is other thing

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    lazy evaluation is other thing
    AFAIK, it's a synonym...

  8. #8
    Member
    Join Date
    Jul 2008
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Simon Berger wrote:
    Isn?t the word "lazy" saying all? - Optimal parsing is optimal parsing :-D but I cannot say the characteristics of them to you. I would be interested too.
    Whenever I looked up optimal parsing I ran into lazy parsing, so I began to think they meant the same thing

    Shelwien wrote:
    3. Google is your friend: http://www.cbloom.com/algs/dictionary.html
    Thanks! I came across his name a lot, but skipped over it because I thought I'd already read it. This has helped clear things up.

    encode wrote:
    The ‘Optimal Parsing’ is more complicated. We getting ALL matches at EACH position. After, we try to build so called ‘Optimal Path’ or optimal sequence of literals and matches. Here are many variants on how we examine the cost or price of each choice, how we build/optimize path, etc.
    My first attempt at optimal parsing was to find the longest possible matches and try each one of them at different lengths, but as you can probably guess that would take years to fully complete . Looks like lazy parsing is just optimal parsing without the brute force approach, hence "lazy".

  9. #9
    Member
    Join Date
    May 2008
    Location
    France
    Posts
    48
    Thanks
    1
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Bulat Ziganshin View Post
    lazy evaluation is other thing
    Quote Originally Posted by encode View Post
    AFAIK, it's a synonym...
    Not for a Haskell programmer

  10. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Ilia, use google

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Read the Deflate specs.:
    Quote Originally Posted by algorithm.txt
    deflate() also defers the selection of matches with a lazy evaluation
    mechanism. After a match of length N has been found, deflate() searches for
    a longer match at the next input byte. If a longer match is found, the
    previous match is truncated to a length of one (thus producing a single
    literal byte) and the process of lazy evaluation begins again. Otherwise,
    the original match is kept, and the next match search is attempted only N
    steps later.

Similar Threads

  1. parsing methods
    By Piotr Tarsa in forum Forum Archive
    Replies: 18
    Last Post: 9th August 2007, 06:45

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •