Results 1 to 18 of 18

Thread: question to a LZSS-like compressor

  1. #1
    Member
    Join Date
    Jan 2013
    Location
    on the moon
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    question to a LZSS-like compressor

    Hello experts,

    I have a decompressor on hand which performs like a LZ77 or LZSS sliding window decompressor. I have used to google to find the inventor or the appropriate compressor for it but with little success so far.

    Here is the algorithm for the decompressor (the main loop):

    while there is data in the compressed buffer do:
    fetch a byte X from the compresses buffer
    if X<=0x0F, then copy the next X+1 bytes directly to the output buffer
    else, read the next byte Y, calculate the distance=(X and 0x0F) << 8 + Y, length= (X >> 4)+1, copy (distance, length) to the output buffer

    regards

  2. #2
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    may be lzrw1?

  3. #3
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    lzrw1 uses a 16 bit word to distinguish the next 16 symbols as either 1 byte literals or 2 byte match offset (12 bits) + length (4 bits, range 1..16). lzrw1-a changed the match length to the range 3..18.

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    the same is here, except that it uses only 1 byte for encoding literal run

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    No, lzrw1 uses packed bit flags to distinguish literals from matches. So does LZSS, such as is used in NTFS disk compression, and also lz4. The one described is a pure byte-oriented LZ77 algorithm with no separate flag bytes. Newer compressors of this type like snappy work the same way but use larger windows and longer match codes.

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    lz4 doesn't use bit flags

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I guess my description in LTCB was for an older version.

  8. #8
    Member
    Join Date
    Jan 2013
    Location
    on the moon
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thank you for your valuable answers. I see, there is no direct version of the compressor/decompressor pair available, nor frequently used. In order to get the proper compressor I do have some modifications on existing code. Where do you think I should start from: byte-oriented LZ77, LZSS, or LZRW1? Good would be a code without simple brute force search. Maybe you have a reference code sample?
    The compressor has to take care of the length of the un-coded literals, since the block length of un-coded literals must stay within 1 and 17 (4 Bit). So when there are more than 17 un-coded liters determined, the compressor has to insert an extra byte, indicating a new block of un-coded literals. The sliding window, ok 12 Bits, and the max matched copy length must not exceed 17 (4 Bit), same as LZ variant.

    thx





    Thx

  9. #9
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Actually, the algorithm you described is just a few lines of code. Here is something from the top of my head:
    Code:
    int p=0; // Buffer position
    
    while (1)
    {
      int x=getc(in); // Control byte
      if (x==EOF)
        break;
    
      if (x>=16) // Match - [Len,Offset] pair
      {
        int l=(x>>4)-1; // Len (-MIN_MATCH, since we use unrolled loop for MIN_MATCH)
        int s=p-((((x&15)<<8)|getc(in))+1); // Offset (Distance of a match)
        // Copy bytes (MIN_MATCH=2)
        buf[p++]=buf[s++];
        buf[p++]=buf[s++];
        while (l--)
          buf[p++]=buf[s++];
      }
      else // Literal run
      {
        buf[p++]=getc(in);
        while (x--)
          buf[p++]=getc(in);
      }
    
      if (p==BUF_SIZE) // Buffer full, flush it
      {
        fwrite(buf, 1, BUF_SIZE, out);
        p=0;
      }
    
    }
    
    if (p) // Flush buffer
      fwrite(buf, 1, p, out);

  10. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    lzrw1 seems to be the closest one

  11. #11
    Member
    Join Date
    Jan 2013
    Location
    on the moon
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode View Post
    Actually, the algorithm you described is just a few lines of code. Here is something from the top of my head:
    Code:
    int p=0; // Buffer position
    
    while (1)
    {
      int x=getc(in); // Control byte
      if (x==EOF)
        break;
    
      if (x>=16) // Match - [Len,Offset] pair
      {
        int l=(x>>4)-1; // Len (-MIN_MATCH, since we use unrolled loop for MIN_MATCH)
        int s=p-((((x&15)<<8)|getc(in))+1); // Offset (Distance of a match)
        // Copy bytes (MIN_MATCH=2)
        buf[p++]=buf[s++];
        buf[p++]=buf[s++];
        while (l--)
          buf[p++]=buf[s++];
      }
      else // Literal run
      {
        buf[p++]=getc(in);
        while (x--)
          buf[p++]=getc(in);
      }
    
      if (p==BUF_SIZE) // Buffer full, flush it
      {
        fwrite(buf, 1, BUF_SIZE, out);
        p=0;
      }
    
    }
    
    if (p) // Flush buffer
      fwrite(buf, 1, p, out);
    .. right, the decompressor is simple, thx for the c-code anyway, but I am more interested in the compressor source code.

  12. #12
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    The compressor could be written in lots of different ways giving different compression ratios that would be compatible with the decompresser. Do you have any compressed samples?

  13. #13
    Member
    Join Date
    Jan 2013
    Location
    on the moon
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Attached you will find a compressed sample. Thx! It is a binary file, just added the txt extension to comply with rules.
    Attached Files Attached Files

  14. #14
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Now all you need to do is decompress it, then write the compressor so that it produces an identical file.

  15. #15
    Member
    Join Date
    Jan 2013
    Location
    on the moon
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts
    .. as a final conclusion..

    Matt, I?d never thought about your suggestion

    Below you will find the modified ?spaghetti.code? of lzrw1. The hash indexing I did not tough, but the original compressor I am targeting on is about 10% better than hash based. The problem here is when the last copy literal is part of the copy item. So the hash algorithm will pay for one byte which could be saved.
    For me the results are fine and fit to the SystemSoft (Insyde MobilePRO) BIOS space (what is the reason I started for). But for the encode forum, I think this could be done better. Maybe you could improve.

    Thx for suggesting lzrw1!

    the "compressor"

     while (TRUE)
    {UBYTE *p,*s; UWORD control_bits=0,len,index; ULONG offset;

    begin_unrolled_loop:if (p_src>(p_src_post-ITEMMAX))
    {
    if (control_bits)
    // if literals in the pipe, copy before matched copy indexing
    {*p_dst++=control_bits-1;fast_copy(c_literals[0],p_dst,control_bits);p_dst+=control_bits;control_bits=0;}

    // last samples to avoid overflow
    len=p_src_post-p_src; while (len--) c_literals[control_bits++]=p_src++;
    *p_dst++=control_bits-1;fast_copy(c_literals[0],p_dst,control_bits);p_dst+=control_bits;control_bits=0;
    break;
    }
    index=((40543*((((p_src[0]<<4)^p_src[1])<<4)^p_src[2]))>>4) & 0xFFF;
    p=hash[index]; hash[index]=s=p_src; offset=s-p;
    if (offset>4095 || p<p_src_first || offset==0 || PS || PS || PS)
    {c_literals[control_bits++]=p_src++;}
    else
    {
    if (control_bits)
    // if literals in the pipe, copy before matched copy indexing
    {*p_dst++=control_bits-1;fast_copy(c_literals[0],p_dst,control_bits);p_dst+=control_bits;control_bits=0;}
    {PS || PS || PS || PS || PS || PS || PS ||
    PS || PS || PS || PS || PS || PS || s++; len=s-p_src-1;
    *p_dst++=((len-1)<<4)+((offset&0xF00)>>8); *p_dst++=offset&0xFF; p_src+=len;}
    }
    if (control_bits<ITEMMAX) goto begin_unrolled_loop;
    // literals block full, store an proceed
    {*p_dst++=control_bits-1;fast_copy(c_literals[0],p_dst,control_bits);p_dst+=control_bits;}
    }
    *p_dst_len=p_dst-p_dst_first;
    return;


    the "decompressor"

    {UBYTE *p_src=p_src_first, *p_dst=p_dst_first, b0, len,
    *p_src_post=p_src_first+src_len;
    while (p_src!=p_src_post)
    {
    b0=*p_src++;
    if (b0<=0x0F)
    {len=b0+1; while (len--) *p_dst++=*p_src++;}
    else
    {UWORD offset; UBYTE *p;
    offset=((b0&0x0F)<<8)+*p_src++; len=1+(b0>>4);p=p_dst-offset;
    while (len--) *p_dst++=*p++;}
    }
    *p_dst_len=p_dst-p_dst_first;
    }

  16. #16
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Don't forget to define the macros like PS. Original code here: http://www.ross.net/compression/lzrw1.html

  17. #17
    Member
    Join Date
    Jan 2017
    Location
    germany
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi there and forgive me this necro,
    (and if you think it would be better to start a new thread instead of digging in here, please move this thread or close it
    and I will start a new thread)


    but Im on an adventure since 2012, trying to find the right compression/decompression routine for a SystemSoft (Insyde MobilePRO) BIOS by Insyde.
    Its version 4.20.10 and can be found on various laptops around the world, mostly on laptops form the year 2005.

    I re-started a collection of information and what I have discovered so far on the MDL forum :
    (please also forgive me re-dericting to another tech forum, if thats not allowed)
    https://forums.mydigitallife.info/th...18#post1305618

    Im still comparing various BIOS files that I found and still want to unpack this BIOS, but
    it seems I reached the end of the road (once again).

    Now and then I discover "new" hints, like this thread here, which restores some hope.
    Even if the guy who mentioned his MobilePRO BIOS (seppi0815) is no longer active here.

    In short :

    -I have a Acer Aspire 9500 Laptop with a Insyde MobilePRO BIOS 4.20.10 that probably uses some derivative of LZMA/LZH/LZ77
    There is so much clear text when looking at it with a hexeditor, that probably only parts are compressed...but Im not sure...as Im not a coder.
    (I only want to change the splashscreen for now, which is usually called "IMAG001" and probably an RLE compressed PCX graphic format)

    -If you want to download my BIOS and arent interested in browsing through my thread on the other forum, you can find it here :
    ftp://ftp.support.acer-euro.com/../A...ire_9500/bios/


    -There are additional files that came with my BIOS, like ROM location table information or a PKLite packed flasher that I could learn from too
    as its probably initializing the uncompression routine.

    -I tried many, many (DOS) unpackers on the BIOS file already, from LZMA/LZH/LHARC and LRZIP to UHARC and hexediting but it all more or less failed, probably beause of a "changed/strange file header"
    (Im aware some of these "evil" compression methods dont even have "magic bytes" aka. a "known file header" that you can search for. But again, Im not a coder)

    -I was able to find a nearly identical BIOS file from another laptop manufacturer,
    with probably the very same compression, that I cant unpack either.

    -I was able to find a nearly identical BIOS file, with probably the very same compression, that also has 4 times the RLE PCX file header,
    but these graphics cant b viewed in Photoshop as they are probably still compressed in some way

    -I was also able to find a nearly identical BIOS file, which does not use any compression and can be edited.
    Like the RLE compressed PCX files with photoshop or the option ROMs with other tools.
    Comparing this uncompressed file to my compressed file will be the next step.


    In all BIOS files you can find the same words, the exact same BootBlock Version at the end of the file etc. etc. etc.

    I guess this should be enough for now and I appreciate any reply and/or info regarding this matter.

    Highest regards,
    Mario
    Last edited by itsmemario; 4th January 2017 at 01:10.

  18. #18
    Member
    Join Date
    Jan 2017
    Location
    germany
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I found a BIOS of an "Fujitsu Siemens Amilo L1300".

    The example compression by seppi0815 does nearly exactly start like it in a hex editor. 0C 55 AA but without the 0C.

    Now it would be nice to know which laptop he was using in 2013...*sigh*

    I guess this is another dead end ^^

Similar Threads

  1. Replies: 30
    Last Post: 2nd October 2018, 14:52
  2. Replies: 109
    Last Post: 29th August 2016, 20:40
  3. LZSS v0.01 is here!
    By encode in forum Data Compression
    Replies: 67
    Last Post: 28th March 2012, 10:10
  4. lzp question
    By sourena in forum Data Compression
    Replies: 4
    Last Post: 5th February 2012, 18:24
  5. LZSS with a large dictionary
    By encode in forum Data Compression
    Replies: 31
    Last Post: 31st July 2008, 21:15

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •