
Thread: LZX recompression - proof of concept

  1. #1 schnaader

    In Surfer's ISO thread, I speculated about LZX recompression. After a bit of playing with FEPInstall.exe from the ISO and CabArc, I came up with a little proof of concept:

    Code:
    FEPInstall.exe      19.577.576 bytes
    FEPInstall\*.*      31.211.856 bytes, 147 files           // Extracted using 7-Zip
      FEPInstall.7z_store.srep.7z_max: 18.574.538 bytes       // Demonstrating we can really save bytes if we have access to the decompressed data
    FEPInstall.cab      19.282.819 bytes                      // compressed using "CabArc -r -m LZX:21 n FEPInstall.cab FEPInstall\*.*"
    FEPInstall.cab.srep 19.282.919 bytes                      // SREP can neither compress this file ...
    FEPInstall.exe.srep 19.576.110 bytes                      // ... nor that one
    FEP_concat          38.860.395 bytes                      // Concatenation of FEPInstall.exe and FEPInstall.cab
    FEP_concat.srep     27.347.677 bytes                      // Only 40% larger than FEPInstall.exe.srep, so we have some matches
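    This one isn't very good: the +40% from SREP means that we cannot reconstruct the whole original file (if we could, the concatenation would be practically no larger than FEPInstall.exe.srep, i.e. close to +0%). Additionally, we only save around 5% (about 1 MB) on the decompressed data, while the roughly 7.8 MB that SREP could not match would have to be stored as reconstruction data, so we'd end up with a larger file. But it does demonstrate that CabArc's output matches large parts of the original, even though we used 7-Zip to decompress.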
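    The arithmetic behind this can be written out as a small Python sketch. The sizes are copied from the listing above; treating SREP's extra output for the concatenation as the cost of the reconstruction data is an assumption, so the net figure is only a rough estimate.

```python
# Sizes in bytes, copied from the FEPInstall listing.
EXE = 19_577_576            # FEPInstall.exe
EXE_SREP = 19_576_110       # SREP output for FEPInstall.exe alone
CONCAT_SREP = 27_347_677    # SREP output for FEPInstall.exe + FEPInstall.cab
DECOMP_PACKED = 18_574_538  # extracted files: 7z store -> SREP -> 7z max


def unmatched_bytes(concat_srep: int, first_srep: int) -> int:
    """Bytes SREP had to add for the second file, i.e. the part of it
    that could not be expressed as matches against the first file."""
    return concat_srep - first_srep


def net_gain(original: int, decomp_packed: int, diff_cost: int) -> int:
    """Rough net saving of storing the decompressed data plus
    reconstruction data instead of the original (assumption: the
    reconstruction data costs about `diff_cost` bytes).
    A negative result means the output gets larger."""
    return (original - decomp_packed) - diff_cost


diff = unmatched_bytes(CONCAT_SREP, EXE_SREP)
print(f"unmatched: {diff:,} bytes (+{diff / EXE_SREP:.0%} over FEPInstall.exe.srep)")
print(f"saving on decompressed data: {(EXE - DECOMP_PACKED) / EXE:.1%}")
print(f"rough net gain: {net_gain(EXE, DECOMP_PACKED, diff):,} bytes")
```

    The same two functions can be fed the epploc numbers to see where the balance tips the other way.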

    Recursion could help a bit, though: there are many LZX-compressed files inside FEPInstall, and decompressing all of them yields 67.5 MB of data in 5,588 files (!), which SREP plus 7-Zip Maximum compress to 15 MB. That is about 25% smaller than FEPInstall.exe; still not the 40%+ we'd need, but closer.
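    The recursive unpacking can be sketched as a worklist in Python. The sketch only relies on CAB files starting with the signature "MSCF"; the `extract` hook is hypothetical and stands for whatever actually unpacks a CAB (for example a wrapper around 7-Zip). It is an illustration, not how the numbers above were produced.

```python
from collections import deque
from pathlib import Path
from typing import Callable, List

CAB_MAGIC = b"MSCF"  # Microsoft Cabinet files start with these four bytes


def is_cab(path: Path) -> bool:
    with path.open("rb") as f:
        return f.read(4) == CAB_MAGIC


def expand_recursively(root: Path, extract: Callable[[Path], Path]) -> List[Path]:
    """Walk `root`, unpack every CAB found (including CABs inside CABs)
    and return all non-CAB files that end up on disk.

    `extract` is a hypothetical hook: it must unpack the given CAB and
    return the directory it unpacked into."""
    queue = deque([root])
    leaves: List[Path] = []
    while queue:
        directory = queue.popleft()
        # sorted() materialises the listing before any extraction happens
        for path in sorted(directory.rglob("*")):
            if not path.is_file():
                continue
            if is_cab(path):
                queue.append(extract(path))  # scan the unpacked contents later
            else:
                leaves.append(path)
    return leaves
```

    Keeping the extractor as a parameter separates the "find nested containers" logic from the licensing question of which LZX tool may be used.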

    But here's a better one, a CAB file from inside FEPInstall.exe:

    Code:
    epploc.cab              31.397 bytes
    epploc\*.*             126.656 bytes                      // this time we can use CabArc.exe for this step
      epploc.7z_store.7z_normal: 22.772 bytes
      epploc.7z_store.paq8o8_3:  19.799 bytes
      epploc.7z_store.paq8o8_4:  18.851 bytes                 // and we can save up to 40% when using the decompressed data
    epploc.cabarc           24.445 bytes                      // "CabArc -r -m LZX:21 n epploc.cabarc epploc\*.*"
    epploc.cabarc.srep      24.489 bytes
    epploc.cab.srep         31.441 bytes
    epploc_concat           55.842 bytes
    epploc_concat.srep      31.494 bytes                      // Only 53 bytes larger than epploc.cab.srep, so we can reconstruct almost the whole original CAB
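    What SREP measures in this listing (how much of one file can be copied from another in long runs) can be approximated by a toy Python matcher. This is not SREP's algorithm, just a naive index of every k-byte window of the reference; the default k = 32 is an arbitrary choice, and the memory use makes it suitable only for small files like epploc.cab.

```python
def covered_bytes(reference: bytes, target: bytes, k: int = 32) -> int:
    """Count how many bytes of `target` can be copied from `reference`
    in runs of at least `k` bytes (toy long-range matcher)."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)  # keep first occurrence
    covered = 0
    pos = 0
    while pos + k <= len(target):
        start = index.get(target[pos:pos + k])
        if start is None:
            pos += 1  # literal byte, no match starts here
            continue
        length = k
        # extend the match as far as both buffers agree
        while (start + length < len(reference) and pos + length < len(target)
               and reference[start + length] == target[pos + length]):
            length += 1
        covered += length
        pos += length
    return covered
```

    With the original epploc.cab as `reference` and CabArc's output as `target`, a ratio `covered_bytes(reference, target) / len(target)` close to 1.0 would correspond to the small SREP difference in the listing.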
    Of course this is very far from being a complete LZX recompressor, but it shows the potential. Theoretically, it should also be possible to process CHM files (which also use LZX) this way, but I have had no success with those so far.

    The real challenge is to put all this together and to handle the license situation: I'm quite sure CabArc can't be used without Microsoft's permission, and I don't know of any free LZX compression tools or open source libraries (there's a specification of CAB/LZX that could be useful). Also, without CabArc, reproducing the original encoder's behaviour (and thus the original stream) is very uncertain.
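    One way to deal with that uncertainty is to verify by trial: decompress, re-encode with each candidate setting, and accept only a bit-exact match, which is similar in spirit to how Precomp handles Deflate streams. Python has no LZX encoder, so this sketch uses zlib as a stand-in; an LZX version would need an encoder whose parameters can be enumerated the same way.

```python
import zlib
from typing import Optional


def find_reproducing_level(stream: bytes) -> Optional[int]:
    """Return a zlib level that recompresses `stream` bit-exactly,
    or None if no level does (zlib stands in for an LZX encoder)."""
    try:
        raw = zlib.decompress(stream)
    except zlib.error:
        return None  # not a zlib stream at all
    for level in range(10):  # zlib levels 0..9
        if zlib.compress(raw, level) == stream:
            return level  # storing `raw` + `level` is enough to rebuild `stream`
    return None  # would need a diff or must be stored as-is
```

    Only streams for which a level is found can be stored decompressed; everything else has to be kept as-is or patched with a diff, which is exactly where an unknown encoder hurts.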
    Last edited by schnaader; 22nd March 2011 at 07:12.

  2. #2 Shelwien

  3. #3
    I've run the package through Precomp and yes, there are PNG streams (only 14). But I needed to unpack all the MSI (some were skipped because they contain no CAB), CAB, MSU and CHM files (the CHM files contain JPG, PNG and GIF images). After doing this I reached 61,198,671 bytes; after Precomp 0.41 it was 65,676,286 bytes, and 7-Zip with a 64 MB dictionary then got it down to 12,796,695 bytes (around 33% smaller than the original). There is a future...
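    Deciding which files to unpack can be done by their leading signature bytes rather than their extensions. Here is a minimal Python sketch for the formats mentioned in this thread. Note that the OLE signature marks MSI files but also other compound documents (e.g. old .doc files), so a hit only means "try to unpack this".

```python
from pathlib import Path
from typing import Optional

# Leading signature bytes for formats mentioned in this thread.
MAGIC = [
    (b"MSCF", "cab"),                              # Microsoft Cabinet
    (b"ITSF", "chm"),                              # Compiled HTML Help
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole"),  # OLE compound file (e.g. MSI)
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"FWS", "swf"),                               # uncompressed SWF
    (b"CWS", "swf"),                               # zlib-compressed SWF
    (b"ZWS", "swf"),                               # LZMA-compressed SWF
]


def sniff(header: bytes) -> Optional[str]:
    """Return the container/image type for the first bytes of a file."""
    for magic, kind in MAGIC:
        if header.startswith(magic):
            return kind
    return None


def sniff_file(path: Path) -> Optional[str]:
    with path.open("rb") as f:
        return sniff(f.read(8))  # 8 bytes cover the longest signature
```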

  4. #4
    Off topic: I've been playing with a CHM file that contains SWF, PNG and GIF files, and it seems that leaving everything inside it uncompressed reduces the size of the generated CHM file when it is built with Keytools, which then applies LZX-18 to the whole file.

    SWF in CHM 40,805,741 bytes

    original 43,844,993 bytes
    optimized 40,641,369 bytes

    Furthermore:
    1- Converting JPEG images to uncompressed PNG files reduces the CHM size.
    2- Huffman-optimized JPEGs (optimization only, no other changes) can reduce the CHM size, but progressive JPEGs do not.
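    For the first point, an "uncompressed PNG" can be produced by writing the image data with Deflate stored blocks (zlib level 0), so that the CHM's LZX sees the raw pixels. A minimal Python sketch, assuming the JPEG has already been decoded to 8-bit RGB pixels (decoding needs an image library and is left out); `stored_png` is a hypothetical helper, not a Keytools feature.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    # PNG chunk layout: length, type, data, CRC-32 over type + data
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def stored_png(width: int, height: int, rgb: bytes) -> bytes:
    """Encode 8-bit RGB pixels as a PNG whose IDAT uses only stored
    (uncompressed) Deflate blocks, leaving compression to the container."""
    if len(rgb) != width * height * 3:
        raise ValueError("expected width * height * 3 bytes of RGB data")
    stride = width * 3
    # each scanline is prefixed with filter type 0 (None)
    raw = b"".join(b"\x00" + rgb[y * stride:(y + 1) * stride] for y in range(height))
    # width, height, bit depth 8, colour type 2 (truecolour), no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw, 0))  # level 0 = stored blocks
            + _chunk(b"IEND", b""))
```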
    Last edited by maadjordan; 2nd April 2011 at 11:52. Reason: more findings
