Thread: Identify compression (pre 1996)

  1. #1
    Member | Join Date: Oct 2015 | Location: N/A | Posts: 4 | Thanks: 2 | Thanked 0 Times in 0 Posts

    Hi again,

    I'm trying to figure out which compression this file uses. The compressed .txt file is 976 bytes; uncompressed, it should be 1931 bytes.

    The file is from 1996, so it presumably uses an older compression algorithm.

    The file has the number 3 (0x03) associated with it (I'm not sure whether this is related to the compression).

    [Image: GLOBESCN.TXT.png]

    If nobody recognizes the exact compression, can someone tell me anything about how it works? Is it RLE-based?

    Thanks

  2. #2
    schnaader (Programmer) | Join Date: May 2008 | Location: Hessen, Germany | Posts: 539 | Thanks: 192 | Thanked 174 Times in 81 Posts
    This looks like an LZSS variant. The file is divided into blocks; each block begins with a flag byte whose 8 bits signal whether the corresponding item is uncompressed (bit == 1) or compressed (bit == 0). The simplest case can be seen in the first 9 bytes:

    Code:
    FF 0D 0A 72 61 77 20 54 48
    Here, the flag byte is 0xFF = 11111111b, so each of the following 8 bytes is uncompressed and can simply be copied during decompression. Another fully uncompressed block follows right after that.
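A quick way to confirm this reading of the first block is to decode it by hand. In this minimal Python sketch the flag byte is 0xFF, so all eight following bytes are treated as literals; they turn out to be plain ASCII text (a CR/LF pair followed by "raw TH").

```python
# The first 9 bytes quoted above: one flag byte followed by 8 data bytes
block = bytes.fromhex("FF 0D 0A 72 61 77 20 54 48")

flags, data = block[0], block[1:]
# 0xFF: every flag bit is 1, so all eight items are literal bytes
assert flags == 0xFF
literals = data[:8]
print(literals)  # b'\r\nraw TH'
```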

    But then, there is:

    Code:
    FC 01 00 01 00 73 70 72 20 47 4C
    The flag byte is 0xFC = 11111100b. Reading the bits from least significant to most significant, the first two items are compressed (the two "01 00" pairs) and the remaining six are uncompressed bytes. In most LZSS variants a compressed item takes 2 bytes and encodes a length/offset pair that points to a previously decompressed string in the sliding window (usually 4 KB in size; that almost certainly doesn't matter here, as there are only about 2 KB of decompressed data).
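The flag handling can be sketched as a small tokenizer. This is a minimal Python sketch, assuming (as the sample block suggests) that flag bits are consumed from least significant to most significant, with 1 meaning a literal byte and 0 meaning a 2-byte pair. `split_block` is a hypothetical helper name, and it deliberately leaves the pairs uninterpreted.

```python
def split_block(buf: bytes, pos: int = 0):
    """Split one flag-byte group into literal bytes and 2-byte pairs.

    Assumes flag bits are read LSB first: 1 = literal byte, 0 = 2-byte pair.
    Returns (items, new_pos); items are ('lit', byte) or ('pair', b0, b1).
    """
    flags = buf[pos]
    pos += 1
    items = []
    for bit in range(8):
        if pos >= len(buf):
            break  # the last group in a file may be only partially filled
        if (flags >> bit) & 1:
            items.append(("lit", buf[pos]))
            pos += 1
        else:
            items.append(("pair", buf[pos], buf[pos + 1]))
            pos += 2
    return items, pos


block = bytes.fromhex("FC 01 00 01 00 73 70 72 20 47 4C")
items, end = split_block(block)
print(items[:2])                          # [('pair', 1, 0), ('pair', 1, 0)]
print(bytes(b for _, b in items[2:]))     # b'spr GL'
```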

    So once you find out how the compressed 2 byte pairs are encoded (could be 4 bit run length, 12 bit offset), you'll be able to decompress this.
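Once the pair encoding is known, the rest of the decompressor is a standard LZSS loop. Below is a minimal Python sketch, assuming the flag layout discussed in this post (LSB first, 1 = literal) and treating the pair decoder as a pluggable function, since its bit layout is exactly what is still unknown. `guess_4_12` is hypothetical: one possible reading of the "4-bit length, 12-bit offset" idea, not a confirmed format. The sample input is synthetic and not taken from GLOBESCN.TXT.

```python
from typing import Callable, Tuple

# Hypothetical decoder signature: (b0, b1) -> (distance back from output end, length)
PairDecoder = Callable[[int, int], Tuple[int, int]]


def lzss_decompress(src: bytes, decode_pair: PairDecoder, out_size: int) -> bytes:
    """Generic LZSS loop: each flag byte covers up to 8 items, read LSB first;
    a 1 bit means a literal byte, a 0 bit means a 2-byte compressed pair."""
    out = bytearray()
    pos = 0
    while pos < len(src) and len(out) < out_size:
        flags = src[pos]
        pos += 1
        for bit in range(8):
            if pos >= len(src) or len(out) >= out_size:
                break
            if (flags >> bit) & 1:
                out.append(src[pos])
                pos += 1
            else:
                distance, length = decode_pair(src[pos], src[pos + 1])
                pos += 2
                start = len(out) - distance
                if distance < 1 or start < 0:
                    raise ValueError("match points outside the decoded data")
                # Byte-by-byte copy so overlapping matches (distance < length)
                # correctly repeat bytes written earlier in this same copy.
                for i in range(length):
                    out.append(out[start + i])
    return bytes(out[:out_size])


def guess_4_12(b0: int, b1: int) -> Tuple[int, int]:
    """HYPOTHETICAL layout: high nibble of b1 = length - 3,
    low nibble of b1 and all of b0 = distance - 1 (12 bits)."""
    length = (b1 >> 4) + 3
    distance = (((b1 & 0x0F) << 8) | b0) + 1
    return distance, length


# Synthetic stream: literals "abc", then a copy of distance 3, length 6
sample = bytes.fromhex("07 61 62 63 02 30")
print(lzss_decompress(sample, guess_4_12, 9))  # b'abcabcabc'
```

For the real file you would call something like `lzss_decompress(data, some_decoder, 1931)` and swap in candidate pair decoders until the output reads as text. Note that some LZSS variants (for example Haruhiko Okumura's widely copied LZSS.C) store an absolute position in a 4 KB ring buffer pre-filled with spaces rather than a distance, in which case the copy step would need to change as well.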

    Some pointers on algorithm variants and implementations that could help here: Bohemia Interactive Wiki: Compressed LZSS File Format.

    EDIT: An even better description, including pseudocode, is here: LZSS - XentaxWiki

  3. The Following User Says Thank You to schnaader For This Useful Post:

    McGee (18th March 2016)

  4. #3
    Matt Mahoney (Expert) | Join Date: May 2008 | Location: Melbourne, Florida, USA | Posts: 3,255 | Thanks: 306 | Thanked 778 Times in 485 Posts
    It could be LZRW1 or LZRW1-A, which were written in 1991. Both use a 12-bit offset and a 4-bit length, but LZRW1-A uses lengths 3..18 instead of 1..16. http://www.ross.net/compression/lzrw1a.html
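To illustrate the difference in length ranges, here is a minimal sketch that splits a 2-byte copy item into a 12-bit offset and a 4-bit length field, then maps the field to a length both ways. The nibble layout in the hypothetical `split_pair` is an assumption for illustration only; the real bit order should be checked against the LZRW1/LZRW1-A sources.

```python
def split_pair(b0: int, b1: int):
    """Split a 2-byte copy item into (12-bit offset, 4-bit length field).
    ASSUMED layout: length field in the high nibble of b0,
    offset = low nibble of b0 (bits 8-11) followed by b1 (bits 0-7)."""
    length_field = b0 >> 4
    offset = ((b0 & 0x0F) << 8) | b1
    return offset, length_field


def lzrw1_length(field: int) -> int:
    # LZRW1: length range 1..16, as described above
    return field + 1


def lzrw1a_length(field: int) -> int:
    # LZRW1-A: length range 3..18
    return field + 3


offset, field = split_pair(0xA1, 0x23)
print(offset, lzrw1_length(field), lzrw1a_length(field))  # 291 11 13
```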

    NTFS compression uses a variant in which fewer bits are used for the offset near the beginning of a block, leaving more bits to encode the match length. The only technical details I can find are in Russian, in an obsolete character set. http://mattmahoney.net/dc/text.html#6368
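For reference, the position-dependent split in NTFS's LZNT1 format can be sketched as follows. This is modeled on open-source LZNT1 decoders (such as the one in ntfs-3g) as far as I know them, not on an official specification: the offset field starts at 4 bits and gains one bit each time the position within the 4 KB chunk doubles past 16, and the length field gets whatever remains of the 16-bit token. The function names are hypothetical.

```python
def lznt1_split(pos_in_chunk: int):
    """Return (offset_bits, length_bits) for an LZNT1 copy token.

    pos_in_chunk = bytes already decompressed in the current 4 KB chunk
    (at least 1 whenever a copy token can appear).
    """
    offset_bits = 4
    i = pos_in_chunk - 1
    while i >= 0x10:
        offset_bits += 1
        i >>= 1
    return offset_bits, 16 - offset_bits


def lznt1_decode_token(token: int, pos_in_chunk: int):
    """Decode a 16-bit copy token into (distance, length)."""
    _, length_bits = lznt1_split(pos_in_chunk)
    distance = (token >> length_bits) + 1
    length = (token & ((1 << length_bits) - 1)) + 3
    return distance, length


print(lznt1_split(1), lznt1_split(17), lznt1_split(4096))  # (4, 12) (5, 11) (12, 4)
```

Early in a chunk only short distances are possible anyway, so spending 12 bits on the length there costs nothing; by the end of the chunk the split matches the familiar 12-bit offset, 4-bit length layout.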

  5. #4
    Member | Join Date: Feb 2013 | Location: Internet | Posts: 4 | Thanks: 0 | Thanked 3 Times in 3 Posts
    Quote (originally posted by schnaader):
    So once you find out how the compressed 2 byte pairs are encoded (could be 4 bit run length, 12 bit offset), you'll be able to decompress this.
    It looks like match lengths are encoded in the top 5 bits of the second byte of each 2-byte pair, and the minimum match length is 2.
    If you extract the top 5 bits of each pair, add 2, and sum all the match lengths together with the literal count, you get the expected 1931 bytes of output.
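This size check is easy to script. A minimal Python sketch, assuming the flag layout discussed earlier in the thread (LSB first, 1 = literal) and the length reading described above; it ignores the remaining 11 bits of each pair, whose meaning (offset or window position) is still unknown. `expected_size` is a hypothetical name.

```python
def expected_size(src: bytes) -> int:
    """Sum literal count and match lengths, reading each match length as
    (top 5 bits of the pair's second byte) + 2, i.e. 2..33."""
    total = 0
    pos = 0
    while pos < len(src):
        flags = src[pos]
        pos += 1
        for bit in range(8):
            if pos >= len(src):
                break
            if (flags >> bit) & 1:
                total += 1                        # literal byte
                pos += 1
            else:
                total += (src[pos + 1] >> 3) + 2  # match length
                pos += 2
    return total


print(expected_size(bytes.fromhex("FC 01 00 01 00 73 70 72 20 47 4C")))  # 10
```

On the sample block quoted earlier it counts two 2-byte matches plus six literals, i.e. 10 bytes. If this reading is right, running it on the whole attached file should give 1931.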

  6. The Following User Says Thank You to zombie28 For This Useful Post:

    McGee (29th March 2016)

  7. #5
    Member | Join Date: Oct 2015 | Location: N/A | Posts: 4 | Thanks: 2 | Thanked 0 Times in 0 Posts
    Massive thanks everyone!

    It seems the file uses a variant of LZSS. With your detailed explanations and references I wrote a simple decompressor, but couldn't get the correct values for the length/offset. zombie28's suggestion sounds very promising, and I'll give it a go tomorrow.

    I always find it hard to identify a compression method; do you have any tips? I use tools such as file and binwalk, but neither usually returns anything useful. I guess it's a case of reading up on compression methods and gaining plenty of experience.

  8. #6
    Member | Join Date: Feb 2014 | Location: Belgium | Posts: 2 | Thanks: 0 | Thanked 1 Time in 1 Post
    QuickBMS (http://aluigi.altervista.org/quickbms.htm) has a script to brute-force lots of compression algorithms.

  9. The Following User Says Thank You to pzo For This Useful Post:

    schnaader (18th March 2016)
