Page 2 of 2 FirstFirst 12
Results 31 to 48 of 48

Thread: Cracking an old MS-DOS game's compressed file.

  1. #31
    Member
    Join Date
    Jul 2018
    Location
    Netherlands
    Posts
    37
    Thanks
    22
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by mpais View Post
    Done, see attachment. Also, the endianness is little-endian, not big.
    Okay. I will have a look soon. Little endian? The rest of the program uses big endians. (EDIT: correction it does not) While this is the DOS version it was originally for the Apple IIgs. I will have to check. Thanks for pointing this out.

    EDIT:
    Indeed, little endian. And the decompressor works wonderfully.
    Last edited by Peter Swinkels; 22nd July 2018 at 12:42.

  2. #32
    Member
    Join Date
    Jul 2018
    Location
    Netherlands
    Posts
    37
    Thanks
    22
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by mpais View Post
    Done, see attachment. Also, the endianness is little-endian, not big.
    @mpais:
    Hi, I need a little help with the code you posted:

    Code:
    int dump( int d ) {
      char b[4096]={0};
      int n=4095;
      while (d>256 && n>=0){
        b[n]=lzw[d][1];
        n--;
        d=lzw[d][0];
      }
      b[n]=d;
      while (n<4096){
        putc(b[n], g);
        n++;
      }
      return d;
    }
    -What type of values can the "d" parameter have when it is sent to the "dump" procedure?
    -And when it is returned? What is the "b" array?
    -What do 4095 and 4096 refer to?

    Thanks.

  3. #33
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > What type of values can the "d" parameter have when it is sent to the "dump" procedure?

    Same as explained before, d is LZW symbol.
    d=0..255 is a normal data byte
    d=258..4095 is an extended LZW symbol, which is constructed by attaching a byte to a previous symbol

    > And when it is returned?

    First symbol of a byte string corresponding to LZW symbol d is returned

    > What is the "b" array?

    That array is used to unpack symbol d to byte string (it has to be done backwards because of symbol creation algorithm).

    > What do 4095 and 4096 refer to?

    4096=1<<12 is the limit of LZW alphabet in this algorithm, because it doesn't use more than 12 bits for codes.
    You can only store numbers 0..4095 in 12 bits.
    The algorithm also constructs a new symbol by attaching a single byte to already existing symbol,
    so code 0x102 unpacks to 2 bytes, 0x103 to 3 bytes, 0xFFF to 3839 bytes,
    but mpais reserved full 4096 bytes for unpacked symbol, so in dump(), 4096 is the size of b array
    and 4095 is the last index in it.

  4. The Following User Says Thank You to Shelwien For This Useful Post:

    Peter Swinkels (22nd July 2018)

  5. #34
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    > The algorithm also constructs a new symbol by attaching a single byte to already existing symbol,
    so code 0x102 unpacks to 2 bytes, 0x103 to 3 bytes, 0xFFF to 3839 bytes,

    You're talking about the most extreme case with a single byte run in entire file? In LZW you don't have to attach symbol to longest existing dictionary entry, so maximum entry length doesn't have to grow linearly.

  6. #35
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    I was talking about buffer size for unpacking a symbol.

  7. #36
    Member
    Join Date
    Jul 2018
    Location
    Netherlands
    Posts
    37
    Thanks
    22
    Thanked 0 Times in 0 Posts
    Hi,

    During the last month I wrote a vb.net translation for the C decompression code kindly provided by other forum users, and have tried to write a compression algorithm to go with it. With partial success, small chunks of data compress and decompress fine, anything else fails. The compressor can be downloaded here: https://drive.google.com/open?id=11v...xBdry2KAuzw3lf. It's part of a small vb.net project that contains a few other things as well. All relevant compression code is contained in a single module. A ready to use/test executable (LZW.EXE) can be found in the ".\bin" folder.

    Further details concerning the decompressor and what it's for can be found in this thread's previous posts.

    So my question is:
    Can any one help me get the compression to work together with decompression code?

    I didn't provide much details, so if anyone reading this has questions: just ask and I'll do my best to answer.

    Thanks.

  8. #37
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Hi

    Sorry, but I don't have time atm to write a compressor for you, and your VB code doesn't help, because I don't use VB.

    But making a compressor in this case should be reasonably simple - just match following string with LZW symbols, keep longest match.
    It should be also possible to reproduce the same compression algorithm that was originally used - it should help with debug too,
    because if your output is different, you can check what you do at this step.

    As to your problem with "small chunks of data", I think the likely issue is bitsize mismatch.
    _After_ adding a new symbol, you should check next symbol index, and if its >=(1<<bitsize), increase bitsize.
    Of course, there'd be a decoding error soon enough, if you don't follow decoding logic and switch bitsize differently.
    1. after encoding a symbol, a new symbol should be always created
    2. when symbol index reaches (1<<bitsize), increase bitsize, but only if symbol index is <4096.

  9. The Following User Says Thank You to Shelwien For This Useful Post:

    Peter Swinkels (24th August 2018)

  10. #38
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    I don't use VB, so I didn't check your code, but here's a C++ version with compression support. It creates a 10 byte dummy header, like on IEA files.
    Even with working compression code, how do you know the software will accept your modified files? For all we know, the header contains some sort of checksum that you don't know how to calculate.
    Attached Files Attached Files

  11. The Following 2 Users Say Thank You to mpais For This Useful Post:

    Peter Swinkels (28th August 2018),Shelwien (23rd August 2018)

  12. #39
    Member
    Join Date
    Jul 2018
    Location
    Netherlands
    Posts
    37
    Thanks
    22
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    Hi

    Sorry, but I don't have time atm to write a compressor for you, and your VB code doesn't help, because I don't use VB.

    But making a compressor in this case should be reasonably simple - just match following string with LZW symbols, keep longest match.
    It should be also possible to reproduce the same compression algorithm that was originally used - it should help with debug too,
    because if your output is different, you can check what you do at this step.

    As to your problem with "small chunks of data", I think the likely issue is bitsize mismatch.
    _After_ adding a new symbol, you should check next symbol index, and if its >=(1<<bitsize), increase bitsize.
    Of course, there'd be a decoding error soon enough, if you don't follow decoding logic and switch bitsize differently.
    1. after encoding a symbol, a new symbol should be always created
    2. when symbol index reaches (1<<bitsize), increase bitsize, but only if symbol index is <4096.
    Hi, I can understand that and I was just about to start converting my project to C when I read mpais' reply. Yeah, making a compressor shouldn't be that difficult and your suggestions make sense. Well, I'm going to have a look at mpais' compressor. Thanks anyway.

  13. #40
    Member
    Join Date
    Jul 2018
    Location
    Netherlands
    Posts
    37
    Thanks
    22
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by mpais View Post
    I don't use VB, so I didn't check your code, but here's a C++ version with compression support. It creates a 10 byte dummy header, like on IEA files.
    Even with working compression code, how do you know the software will accept your modified files? For all we know, the header contains some sort of checksum that you don't know how to calculate.
    Hi mpais, that's quite alright. My C++ isn't great but I should be able to figure out how your code works. As for the 10 byte header: I have absolutely no idea as to why it is even there. I purposefully mangled it and ran the installer just to see what would happen. As near as I can tell it does nothing at all, as long as there are 10 bytes with no purpose things are fine. Otherwise not. Probably has to to with data alignment in memory or perhaps it's there to confuse would be software crackers. It could possibly be a timestamp that is otherwise ignored. I dunno. Anyway thanks for the compressor! I'm going to have look right now.

  14. #41
    Member
    Join Date
    Jul 2018
    Location
    Netherlands
    Posts
    37
    Thanks
    22
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by mpais View Post
    I don't use VB, so I didn't check your code, but here's a C++ version with compression support. It creates a 10 byte dummy header, like on IEA files.
    Even with working compression code, how do you know the software will accept your modified files? For all we know, the header contains some sort of checksum that you don't know how to calculate.
    First of, thanks for the compressor. A few questions:

    1. What does the "table" array do?
    2. Why is it set to a size of 9221 (hashsize), what is the reason for this particular value?

  15. #42
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    I'm using a hash table to speed up the dictionary searches. The strange constant (9221) is a prime number that is over 2x as large as the maximum number of dictionary entries (4096), it is used to ensure the step size in probing is always coprime to the table size (see here) and that the load factor stays below 50%.

    I guess you won't need that in VB.NET, but it was a simple enough little optimization. The hash function is probably no good (made it up on the spot), but I didn't want to confuse you with any more weird constants, and it still seems fast enough (about 28MB/s compressing enwik8 here). The rest of the code should be pretty self-explanatory, LZW is a very simple algorithm.

  16. The Following User Says Thank You to mpais For This Useful Post:

    Peter Swinkels (27th August 2018)

  17. #43
    Member
    Join Date
    Jul 2018
    Location
    Netherlands
    Posts
    37
    Thanks
    22
    Thanked 0 Times in 0 Posts
    Hi MPais,

    Thank you for the explanation. I have managed to extract the compression code from the code you provided and reduced it to its bare minimum (e.g., removed error handling, something that I will reimplement eventually. Also the ten byte header stuff has been removed as it is not relevant to the compression/decompression stuff.)

    This is what I have so far:
    Code:
    #include "stdafx.h"
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    
    
    #define EOF_CODE     0x101
    #define HASH_SIZE   0x2405
    #define RESET_CODE   0x100
    #define SYMBOL_BASE  0x102
    
    
    struct LZWEntry {
    	int16_t Prefix;
    	int16_t Suffix;
    };
    
    
    uint8_t Buffer[0x1000];
    int32_t DictionaryIndex = 0x0;
    LZWEntry DictionaryO[0x1000];
    int16_t Table[HASH_SIZE];
    
    
    void AddEntry(const int32_t Prefix, const int32_t Suffix, const int32_t Offset = -1) {
    	if (Prefix == -1 || Prefix >= DictionaryIndex || DictionaryIndex >= 0x1000 || Offset >= 0x0) return;
    	DictionaryO[DictionaryIndex].Prefix = Prefix;
    	DictionaryO[DictionaryIndex].Suffix = Suffix;
    	Table[-Offset - 0x1] = DictionaryIndex;
    	if (DictionaryIndex < 0x1000) DictionaryIndex += 0x1;
    }
    int32_t FindEntry(const int32_t Prefix, const int32_t Suffix) {
    	int32_t i = (Prefix & 0x3FF) + (Suffix << 0x4) + ((Suffix & 0xF) << 0x8);
    	int32_t Offset = (i > 0x0) ? HASH_SIZE - i : 0x1;
    	while (true) {
    		if (Table[i] < 0x0) { //FREE_SLOT?
    			return -i - 0x1;
    		} else if (DictionaryO[Table[i]].Prefix == Prefix && DictionaryO[Table[i]].Suffix == Suffix) {//DESIRED_ENTRY?
    			return Table[i];
    		}
    		i -= Offset;
    		if (i < 0x0) i += HASH_SIZE;
    
    
    		//return ...; // DEFAULT_RETURN_VALUE?
    	}
    }
    void Reset() {
    	memset(&DictionaryO, -1, sizeof(DictionaryO));
    	memset(&Table, -1, sizeof(Table));
    	for (int32_t i = 0x0; i < RESET_CODE; i++) {
    		Table[-FindEntry(-1, i) - 1] = (int16_t)i;
    		DictionaryO[i].Suffix = i;
    	}
    	DictionaryIndex = SYMBOL_BASE;
    }
    
    
    void WriteCode(FILE *Output, int32_t *Buffer, int32_t *BitsUsed, int32_t BitsPerCode, int32_t Code) {
    	(*Buffer) = (*Buffer) | (Code << (*BitsUsed));
    	(*BitsUsed) += BitsPerCode;
    	while ((*BitsUsed) > 0x7) {
    		putc((*Buffer) & 0xFF, Output);
    		(*Buffer) = (*Buffer) >> 0x8;
    		(*BitsUsed) -= 0x8;
    	}
    }
    
    
    int main(int argc, char** argv) {
    	int32_t BitsPerCode = 0x9;
    	int32_t BitsUsed = 0x0;
    	int32_t Buffer = 0x0;
    	int32_t Code = 0x0;
    	int32_t FoundIndex = -1;
    	FILE *Input = NULL;
    	FILE *Output = NULL;
    	int32_t Parent = -1;
    		
    	printf("Compressing to file: %s\n", argv[2]);
    	
    	Reset();
    
    
    	fopen_s(&Input, argv[1], "rb");
    	fopen_s(&Output, argv[2], "wb");
    	fseek(Input, 0, SEEK_SET);
    		
    	WriteCode(Output, &Buffer, &BitsUsed, BitsPerCode, RESET_CODE);
    	while (true) {
    		Code = getc(Input);
    		if (Code < 0x0) break;
    
    
    		FoundIndex = FindEntry(Parent, Code);
    		if (FoundIndex < 0x0) { // NO_ENTRY
    			WriteCode(Output, &Buffer, &BitsUsed, BitsPerCode, Parent);
    			if (DictionaryIndex >= 0x1000) {
    				WriteCode(Output, &Buffer, &BitsUsed, BitsPerCode, RESET_CODE);
    				Reset();
    				BitsPerCode = 0x9;
    			} else {
    				AddEntry(Parent, Code, FoundIndex);
    				if (DictionaryIndex > (0x1 << BitsPerCode)) BitsPerCode += 0x1;
    			}
    			Parent = Code;
    		} else {
    			Parent = FoundIndex;
    		}
    	}
    	if (Parent >= 0x0) WriteCode(Output, &Buffer, &BitsUsed, BitsPerCode, Parent);
    	WriteCode(Output, &Buffer, &BitsUsed, BitsPerCode, EOF_CODE);
    	if (BitsUsed > 0x0) putc(Buffer & 0xFF, Output); // FLUSH_BUFFER
    	fclose(Input);
    	fclose(Output);
    
    
    	return 0x0;
    }
    I also made some minor adjustments that should make converting this to vb.net easier. As far I can tell the above code produces valid compressed data which I can decompress without issues. If I understand you correctly the the Table related code can be removed, right?

    EDIT:
    I see the "ReturnValue" function can exit without explicitly returning a value. What should this value be?
    Last edited by Peter Swinkels; 27th August 2018 at 14:59. Reason: Added a question.

  18. #44
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Peter Swinkels View Post
    I also made some minor adjustments that should make converting this to vb.net easier. As far I can tell the above code produces valid compressed data which I can decompress without issues. If I understand you correctly the the Table related code can be removed, right?
    I'm not familiar with VB.NET, but supposing you have some sort of generic dictionary class designed for fast retrieval, then you can probably just use that. Internally it will probably handle the hashing itself.

    Quote Originally Posted by Peter Swinkels View Post
    I see the "ReturnValue" function can exit without explicitly returning a value. What should this value be?
    Do you mean the FindEntry() function? That function will always return a value, the loop either ends when the entry we requested is found, or when a free position in the dictionary is found for that entry.

  19. #45
    Member
    Join Date
    Jul 2018
    Location
    Netherlands
    Posts
    37
    Thanks
    22
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by mpais View Post
    I'm not familiar with VB.NET, but supposing you have some sort of generic dictionary class designed for fast retrieval, then you can probably just use that. Internally it will probably handle the hashing itself.


    Do you mean the FindEntry() function? That function will always return a value, the loop either ends when the entry we requested is found, or when a free position in the dictionary is found for that entry.
    My mistake, I checked: it does always return a value.



    Yes, Vb.net uses the .Net framework which provides a generic dictionary. Here is an example in C# which also uses the .Net framework but much more similar to C++ in syntax:

    Code:
    using System;
    using System.Collections.Generic;
    
    
    namespace ConsoleApp1
    {
       class Program
       {
          static void Main(string[] args)
          {
             Dictionary<string, string> Example = new Dictionary<string, string>();
    
    
             Example.Add("1", "one");
             Example.Add("2", "two");
             Example.Add("3", "three");
    
    
             Console.WriteLine(Example["2"]); //Output "two".
          }
       }
    }

  20. #46
    Member
    Join Date
    Jul 2018
    Location
    Netherlands
    Posts
    37
    Thanks
    22
    Thanked 0 Times in 0 Posts
    Hi,

    I got the compressor and decompressor working: https://drive.google.com/open?id=1-k...JU0nxuR3GGd4Uk.

    Thanks everyone.

  21. #47
    Member
    Join Date
    Jul 2018
    Location
    Netherlands
    Posts
    37
    Thanks
    22
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by mpais View Post
    Done, see attachment. Also, the endianness is little-endian, not big.
    mpais, do you still have the C++ code for decompressing *.pea files?

  22. #48
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    The version I posted above (attachment lzw.zip) does decompression also, both for PEA and IEA files.

  23. The Following User Says Thank You to mpais For This Useful Post:

    Peter Swinkels (30th August 2018)

Page 2 of 2 FirstFirst 12

Similar Threads

  1. Help on identifying DOS file encryption/packer
    By theruler in forum Data Compression
    Replies: 4
    Last Post: 15th June 2017, 17:53
  2. Finding custom lzss on arcade game .dat file
    By finalscream in forum Data Compression
    Replies: 3
    Last Post: 11th June 2017, 03:40
  3. Help on an old dos PAK file
    By theruler in forum Data Compression
    Replies: 2
    Last Post: 23rd January 2017, 11:02
  4. Help on compressed file from OLD DOS game
    By theruler in forum Data Compression
    Replies: 3
    Last Post: 16th August 2015, 12:18
  5. Replies: 6
    Last Post: 24th April 2012, 13:50

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •