
Thread: Compression of partial BASE64 encoded files?

  1. #1
    Member
    Join Date
    Jul 2013
    Location
    Germany
    Posts
    24
    Thanks
    4
    Thanked 4 Times in 4 Posts

    I would like to compress and archive thousands of e-mails in .eml format every month. Most of the data inside the .eml files is BASE64-encoded.

    Is there a strong, stable archiver that can decode and re-encode BASE64 on the fly?

    I would prefer strong, possibly asymmetric compression, such as 7-Zip with a large dictionary, and I need secure encryption. Compression may be very slow, and decompression may be slow as well.

    I tried precomp on each file to decode the BASE64. Generally it worked, but I had some corruption with the closed-source precomp version. I have not tried the newer open-source versions yet.
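
    For anyone who wants to check how much of an .eml file really is BASE64 before picking a tool, here is a minimal Python sketch using only the standard library. The function name base64_share and the sample message are made up for illustration. This is not how precomp or any archiver works internally; it only counts the decoded bytes found in BASE64-encoded MIME parts.

```python
import base64
from email import message_from_bytes, policy


def base64_share(raw_eml: bytes):
    """Return (decoded bytes found in BASE64 parts, total size of the raw message)."""
    msg = message_from_bytes(raw_eml, policy=policy.default)
    decoded_total = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if cte == "base64":
            payload = part.get_payload(decode=True) or b""
            decoded_total += len(payload)
    return decoded_total, len(raw_eml)


# Hypothetical single-part message with a 100-byte BASE64 attachment body.
sample = (
    b"From: a@example.com\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.encodebytes(b"x" * 100).replace(b"\n", b"\r\n")
)

decoded_size, raw_size = base64_share(sample)
print(decoded_size, raw_size)
```

    If the decoded total is close to three quarters of the raw size, nearly the whole file is BASE64 and a preprocessing step is likely to pay off.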

  2. #2
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    539
    Thanks
    192
    Thanked 174 Times in 81 Posts
    Quote Originally Posted by Sebastian W
    I tried precomp on each file to decode the BASE64. Generally it worked, but I had some corruption with the closed-source precomp version. I have not tried the newer open-source versions yet.
    That was a wise decision. The open-source versions haven't changed anything in the BASE64 behaviour yet, so trying them would indeed make no sense at the moment.

    It would be nice if you could upload/send me files where corruption occurs so I can analyse it, create a bug and fix it.

    To address the actual question: I don't know of any other compressors that handle BASE64. IIRC the paq8px branch handles some BASE64 variant, but not .eml.

    EDIT: It was implemented in paq8pxd; this post seems to be the first where it's mentioned. Later, it also found its way into paq8px, see this post.
    http://schnaader.info
    Damn kids. They're all alike.

  3. #3
    Member
    Thanks for the answer. I will test PAQ and precomp at the end of this month.

    Precomp crashed while preprocessing a 1.2 GB TAR archive of e-mails, and it crashed on about one in 300 e-mails when they were preprocessed individually.

  4. #4
    Programmer schnaader's Avatar
    Quote Originally Posted by Sebastian W
    Precomp crashed preprocessing a 1.2 GB TAR-archive of E-Mails and crashed about once in 300 solo-preprocessed E-Mails.
    Oh, if it crashed, that's different. In your first post you said corruption, which made me think of a file that could not be restored identically. A crash is most likely related to the packJPG issue that was fixed in the development version. In this case, I'd suggest you try the latest development version on your files. In the attachment to this post, you'll find a Windows binary (please use the first ZIP, not the experimental "_O3" variant).

    It may also help to use the parameter "-t-3" (disable MP3 compression) to avoid running into one of the MP3 slowdown issues, especially if your e-mails don't contain any MP3s.

  5. #5
    Member
    When using precomp on 25,725 eml files (5,700 MB), I found no crashes, but four files came back with changed checksums and file sizes.

    I removed the e-mail headers so I can post the files here.
    I tried version 0.4.4 on all the eml files and version 0.4.5 only on the files with wrong checksums.
    Do not decode and run the Microsoft Office documents in the mail attachments. They might do bad things to your computer, like installing "Locky".
    Virus scanners do not like the pcf files that precomp creates from these mails and must be disabled temporarily if you want to use precomp on these files.

    I included the original eml files (headers removed by me) and the decompressed version of each eml file in the attached zip file.

    Version: 0.4.4 + 0.4.5
    Precompression parameters: -cn
    Restore parameters: -r
    Original files (headers removed): *.eml
    Files after precompression and restoring: *_.eml
    Precomp file: not included, virus scanners do not like it

    When I last tried precomp on eml files, a few years ago, I got some crashes. After removing the responsible files (eml files with attached PDF files), I got a corrupted archive because some of the re-encoded Base64 material had a bigger file size than before.
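
    A check like the one described above can be scripted. The following is a minimal Python sketch, assuming the naming used in this post (original name.eml, restored copy name_.eml in the same folder). The function names are hypothetical, and the script does not call precomp itself; it only compares files that already exist.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large mails do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(folder: Path):
    """Return (original, restored, size_original, size_restored) for each differing pair."""
    mismatches = []
    for original in sorted(folder.glob("*.eml")):
        # Assumption: a stem ending in "_" marks a restored copy, not an original.
        if original.stem.endswith("_"):
            continue
        restored = original.with_name(original.stem + "_.eml")
        if not restored.exists():
            continue  # nothing to compare against
        if sha256_of(original) != sha256_of(restored):
            mismatches.append((
                original.name,
                restored.name,
                original.stat().st_size,
                restored.stat().st_size,
            ))
    return mismatches
```

    Calling find_mismatches(Path("mails")) on a folder that holds both sets of files would list only the pairs that did not come back identical, together with both file sizes.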


  7. #6
    Programmer schnaader's Avatar
    Ah, these are recursive Base64 streams (Base64 containing Base64) - the implementation has a bug there and uses the wrong line lengths when restoring the streams. Created issue #36 for this. Thanks for spotting this!
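
    To illustrate why the line length matters at every nesting level, here is a small Python sketch. It is not precomp's actual code: the helper names, the CRLF line endings and the widths 76 and 60 are assumptions chosen for the example. It wraps a payload in two Base64 layers with different line lengths and then restores it twice, once with the correct width per level and once reusing the outer width for the inner level.

```python
import base64


def encode_wrapped(data: bytes, line_len: int) -> bytes:
    """Base64-encode data and wrap it into CRLF-terminated lines of line_len characters."""
    flat = base64.b64encode(data)
    lines = [flat[i:i + line_len] for i in range(0, len(flat), line_len)]
    return b"\r\n".join(lines) + b"\r\n"


def detect_line_len(stream: bytes) -> int:
    """Take the length of the first line as the wrap width of the stream."""
    return len(stream.split(b"\r\n", 1)[0])


def decode_wrapped(stream: bytes) -> bytes:
    """Strip the line breaks and decode the Base64 text."""
    return base64.b64decode(stream.replace(b"\r\n", b""))


payload = bytes(range(256)) * 4
inner = encode_wrapped(payload, 76)   # inner stream, 76 characters per line
outer = encode_wrapped(inner, 60)     # outer stream, 60 characters per line

# Correct restore: every level is re-encoded with its own detected line length.
inner_len = detect_line_len(decode_wrapped(outer))
outer_len = detect_line_len(outer)
restored_ok = encode_wrapped(encode_wrapped(payload, inner_len), outer_len)

# Faulty restore: the inner level wrongly reuses the outer level's line length.
restored_bad = encode_wrapped(encode_wrapped(payload, outer_len), outer_len)

print(restored_ok == outer, restored_bad == outer)
print(len(outer), len(restored_bad))
```

    With the wrong inner width the line breaks land in different places, so the restored file differs from the original in both content and size, which matches the changed checksums and file sizes reported above. Real mails can also have a shorter last line or other line endings, which this sketch ignores.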

  8. #7
    Programmer schnaader's Avatar
    The latest commit fixes the Base64 recursion bug; all 4 files are now restored correctly. A compiled version is attached.
