
Thread: identification/reverse engineer of possible lz compression

  1. #1
    Member
    Join Date
    Feb 2014
    Location
    Denmark
    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts

    identification/reverse engineer of possible lz compression

    I am doing a translation project for the PSP version of a game released by Prototype (a Japanese company), but I am having trouble with some GIM files (image files).
    The actual problem is not with the GIM format itself but with a compression that has been applied to the GIM files. Before getting to that, I will clarify a few things.
    Some of the GIMs work, but sometimes a GIM file turns up that neither Puyo Tools nor GimConv (software that converts GIM to PNG) can handle. The GIMs that don't work look a little different.
    I know it's a GIM file because it starts with MIG.00.1PSP, though to be exact it is laid out a little differently, like this:

    [integer equal to 16, signature?] [integer equal to 131792] [MIG.00.1PSP, but with a 00 byte placed after each byte]


    like this:


    10 00 00 00 D0 02 02 00 4D 00 49 00 47 00 2E 00 30 00 30 00
    2E 00 31 00 50 00 53 00 50 00 00 00
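    As a quick check of the layout described above, the bytes from the dump can be unpacked in Python. This is only a sketch: it assumes both integers are little-endian unsigned 32-bit values, and the names raw, magic, size, values and keys are made up for illustration.

```python
import struct

# The bytes shown in the hex dump above.
raw = bytes.fromhex(
    "10 00 00 00 D0 02 02 00 4D 00 49 00 47 00 2E 00 30 00 30 00 "
    "2E 00 31 00 50 00 53 00 50 00 00 00"
)

# Assumption: two little-endian unsigned 32-bit integers at the start.
magic, size = struct.unpack_from("<II", raw, 0)

# The rest is read as byte pairs: even offsets are values, odd offsets are keys.
pairs = raw[8:]
values = pairs[0::2]
keys = pairs[1::2]

print(magic, size)   # 16 131792
print(values)        # b'MIG.00.1PSP\x00'
print(set(keys))     # {0}
```

    Read this way, the interleaved 00 bytes are simply the second byte of each pair, and they are all zero for the signature.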



    Each of the compressed GIM files starts with these two integer values (although an image I have with a smaller resolution has a different second integer). I have already tried removing these two integers, and I have also tried replacing M(00)I(00)G(00).(00)0(00)0(00).(00)1(00)P(00)S(00)P(00) with simply MIG.00.1PSP, but that just made GimConv report: wrong chunk data.


    Also I have tried analyzing the file with signsrch and TrID to look for hints of some sort, but signsrch finds nothing and TrID only finds: "100 .0% (.) LTAC compressed audio (v1.61) (1001/2)"


    Here is the file, called black.gim, which I picked to keep the expected output simple:





    Here is a random CG:


    Here is a random GIM file for reference, showing what an uncompressed version should look like. Notice that the first 4 bytes of the second line indicate the file size minus 16.
    Another thing is the int after MIG.00.1PSP, which I have found from different sources to be the version number. Therefore, all the compressed files should probably have that int there too.
    Update: I believe this is some kind of LZ compression, but I haven't figured out which one yet. I have tried lz01, lz00, lz10, lz11, CXLZ and lzss. It seems to me that it begins with a 10 byte like LZ does, and that MIG.00.1PSP ends up separated by 00 bytes because the compression relies on value/key pairs, where I believe a key of 0 means the value should be sent directly to the output. If you are confident that it is one of the compressions I have tried, please say so too, as the fault could very well lie with the tools I used to try them. GZIP and Deflate were tried using .NET's System.IO.Compression in C#, and the others were tried using Puyo Tools.


    Steps of my current algorithm, which stops working once it gets past the version number:
    Start at byte 9.


    1. Get next 2 bytes, we'll call the first byte value and the second byte key.


    2. Is Key equal to zero?


    - write value to decompressed output
    - go to 1).


    3. Is Key greater than zero?


    - Take the 16-bit integer formed by the bytes [Value, Key - 1] and remember it as lookupValue.
    - Look at the byte pair at offset 8 + (lookupValue - 1) * 2.
    - While the byte pair found there has a key greater than zero, apply 3) to it as well.
    - Once a pair matching 2) has been found, write its value to the output; each lookup along the way extends the output by that same value. In other words, 00 00 01 01 outputs 00 00 00.


    4. Go to 1. if end of file hasn't been reached.


    However, this only works up to the end of the version number. When I tried my algorithm on black.gim, the output was much larger than the expected file size minus 16. Still, I hope this pseudocode, despite being the wrong solution, gives some ideas.

    I'd be really happy if anybody could give me some input or a lead.
    Last edited by patr0805; 27th February 2014 at 03:17.

  2. #2
    Member
    Join Date
    Feb 2014
    Location
    Denmark
    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hmm, it seems like I have it almost figured out. Basically, a key equal to zero outputs the value to the decompressed data. If the key is higher than zero, take the short value of [Value, Key - 1]. This short plus 8, times two, gives the byte it has to write out twice to the decompressed data. In other words, 00 00 08 01 would output 00 00 00. The only problem with this is the 0F 01 in my black.gim example at line 3. It would point to 15, which would be position (15 + 8) * 2 = byte 46, which should be 02 00 in line two. This is however incorrect, since I expect it to place zeroes there, not output 02 02 to the decompressed data.

    In short in the black.gim example I have found that:
    0C 01 should output 00 00
    0D 01 should output 00 00 00
    0F 01 should output 00 00 (or more zeroes)


    Any suggestions?
    Last edited by patr0805; 27th February 2014 at 07:49.

  3. #3
    Member
    Join Date
    Feb 2014
    Location
    Denmark
    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I finally reverse engineered this!
    Here is the complete answer for everyone who may encounter compressed GIM files with similar compression. Basically, the file starts like this: [magic number 10 00 00 00] [integer with the uncompressed size of the file]. After this the compressed data begins. In terms of decompression, the compression basically works like this:

    -> Take the next 2 bytes.
    -> Is the second byte equal to zero?

    • Write first byte to decompressed output.

    -> Is the second byte higher than 0?

    • This is a pointer whose job is to reuse bytes that appeared earlier. The position it points to equals the unsigned short value of [first byte, second byte minus 1] * 2 + 8. When a pointer reads at the position it points to, it reads the next 4 bytes (two pairs), not just the next 2 bytes. If the bytes at the pointed location are 02 00 0C 01, then the decompressed output would be 02 ??, where ?? is the first byte of the result of what 0C 01 points to. Going the other way, if we pointed to 0C 01 02 00, then 0C 01 would be replaced by the result of whatever it points to. Let's say it points to 08 00 00 00: the result of that pointer is then 08 00 00 00, which replaces 0C 01 to give 08 00 00 00 02 00, which finally outputs 08 00 02 to the decompressed output. *Notice that a pointer in the second pair cannot be replaced by four bytes, but only by the first two bytes of what would normally have been its result. If the second pair points to the first pair, it is simply given the result of the first pair.
    • Examples from the image in the first post (black.gim):
    • In the first image, 0C 01 points to 00 00 0C 01, which outputs 00 00.
    • In the first image, 0D 01 points to 0C 01 0C 01, which outputs 00 00 00. <- Notice how only the first two bytes at a pointed position have the right to extend the result of what pointed to them by two bytes.
    • In the first image, 0F 01 points to 02 00 0D 01, which outputs 02 00.

    -> Do this until no bytes remain.
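
    The steps above can be sketched in Python as follows. This is one reading of the description, not a verified implementation: it has not been run against a real compressed GIM, the function name decompress is made up, and it assumes a pointer always refers to a pair earlier in the stream. A pointer's result is taken to be the full result of the pair it points to plus the first output byte of the pair after it, which reproduces the three black.gim examples.

```python
import struct

def decompress(data: bytes) -> bytes:
    """Decode the pair stream described above (sketch, hypothetical name)."""
    # Assumption: little-endian header, magic 0x10 followed by the
    # uncompressed size. The size is read but not enforced here.
    magic, size = struct.unpack_from("<II", data, 0)
    if magic != 0x10:
        raise ValueError("unexpected magic number")
    body = data[8:]

    results = []          # results[i] = bytes produced by pair i
    out = bytearray()
    for i in range(len(body) // 2):
        value, key = body[2 * i], body[2 * i + 1]
        if key == 0:
            # Literal: the first byte goes straight to the output.
            chunk = bytes([value])
        else:
            # Pointer: pair index j, i.e. byte offset j * 2 + 8 in the file.
            j = ((key - 1) << 8) | value
            if j >= i:
                raise ValueError("pointer does not refer to an earlier pair")
            first = results[j]
            if j + 1 == i:
                # The second pair is this pointer itself, so it starts
                # with the same byte as the first pair's result.
                extra = first[0]
            else:
                extra = results[j + 1][0]
            chunk = first + bytes([extra])
        results.append(chunk)
        out += chunk
    return bytes(out)

# Tiny made-up stream: 'A', 'B', then a pointer (00 01) to pair 0.
sample = struct.pack("<II", 0x10, 4) + bytes([0x41, 0, 0x42, 0, 0x00, 0x01])
print(decompress(sample))  # b'ABAB'
```

    The declared size from the header can be compared with the length of the returned data as a sanity check. Keeping every pair's result in a list avoids following chains of pointers recursively, at the cost of some memory.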
    Last edited by patr0805; 3rd March 2014 at 01:00.

  4. #4
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    854
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Just out of curiosity:

    What happens if the byte is 1? Or was that never used?

  5. #5
    Member
    Join Date
    Feb 2014
    Location
    Denmark
    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts
    That was a typo, sorry (just corrected it, thanks). I meant to say that if the second byte is > 0, then it's a pointer following the logic in my last reply.

    *I just added some examples to the pointer description, which should make it more understandable.
    Last edited by patr0805; 3rd March 2014 at 00:53.
