Thread: compressing a really small 1k .COM file

  1. #1

    Okay, this is fairly useless, but I'm vaguely curious and find it hilarious, so just showing you my results and wondering what you think.

    P.S. Original file w/ srcs is here.

    820 befi5.paq8px
    853 befi5.paq8o8z
    884 befi1.paq8o8z
    907 befi1.paq8px
    923 befi.com.lzma
    930 befi.gz
    945 befi.lzh
    967 befi.esp
    984 befi-ppmd.7z
    1,016 befi-624s.com
    1,017 befi.zip
    1,023 befi-apack.com
    1,024 befi-original.com
    1,025 befi-lzma.7z
    1,102 befi.com.bz2
    1,123 befi-bz2.7z
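
At this size the fixed container overhead matters a lot, which is why several entries in the list come out larger than the 1,024-byte original. A minimal Python sketch of the same kind of comparison follows. It uses only the standard-library `zlib`, `bz2`, and `lzma` modules as stand-ins (not the tools listed above), and `compressed_sizes` is a hypothetical helper name, so the numbers will not match the thread's results.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return output sizes in bytes for a few stdlib compressors.

    These are stand-ins for the tools in the list, not the same programs.
    """
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "xz": len(lzma.compress(data, preset=9)),
    }


if __name__ == "__main__":
    import sys

    # Usage: python sizes.py <file>   (pass whatever small file you want to test)
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            sizes = compressed_sizes(f.read())
        # Print smallest first, in the same "size name" layout as the list above.
        for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
            print(f"{size:>6,} {name}")
```

Even an empty input produces a non-empty output in each of these formats; that fixed cost is what a 1k file has to win back before compression pays off.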

  2. #2
    Programmer schnaader
    Nice work, though I'd encourage you to try optimizing the COM file further instead of compressing it. Although Befunge is quite complex and "simple LFN support" sounds nice, there's always room for optimization. From this year's Hugi Size Coding Competition (http://www.frontiernet.net/~fys/hugi/hcompo.htm), I learned that there are some really good people and techniques for optimizing assembler code. Unoptimized code from this competition is about 1000 bytes, or 512 bytes with some trivial optimizations. In the end, the winner got it down to 120 bytes, and it still creates and displays a particular set of random mazes, although the code looks like a big pile of junk.

    By the way, rename the file to "b" to get a slightly better result. Of course, this could be improved further by just removing the PAQ header at the beginning of the file (20 bytes including the filename).

    Code:
    813 b.4.paq8px
    EDIT: Had a quick look at your asm code, main things I'd have a look at for further optimizations:

    • The PRNG of course - it's quite large for an assembler PRNG and uses some magic variables that could perhaps be changed to point to some parts of the code that provide good substitutions.
    • dee - I'm not sure what it's used for yet, but it's quite likely that you can generate the string in less than 12 bytes.
    • The "cmp al,.. j(n)e" parts - some of these are quite repetitive and could be redundant.
    • A quite common optimization is using pusha/popa (1 byte commands) instead of multiple push/pop statements if it doesn't hurt that all registers get on the stack.
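
The byte arithmetic behind the pusha/popa suggestion can be sketched in Python. The opcode sizes are the standard 16-bit encodings (push/pop of a 16-bit register is one byte each, pusha and popa are one byte each, and a 66h operand-size prefix adds one byte per instruction). The function names are hypothetical, and nothing here is measured from befi's actual source.

```python
# Encoded sizes in bytes for 16-bit code (PUSHA/POPA need a 186 or later).
PUSH_REG = 1       # opcodes 50h..57h
POP_REG = 1        # opcodes 58h..5Fh
PUSHA = 1          # opcode 60h
POPA = 1           # opcode 61h
OPSIZE_PREFIX = 1  # 66h, needed per instruction for 32-bit operands


def individual_cost(n_regs: int, use_32bit: bool = False) -> int:
    """Bytes for saving and restoring n_regs registers one at a time."""
    per_reg = PUSH_REG + POP_REG
    if use_32bit:
        per_reg += 2 * OPSIZE_PREFIX  # one prefix on the push, one on the pop
    return n_regs * per_reg


def pusha_cost(use_32bit: bool = False) -> int:
    """Bytes for one pusha/popa (or pushad/popad) pair."""
    cost = PUSHA + POPA
    if use_32bit:
        cost += 2 * OPSIZE_PREFIX
    return cost


def bytes_saved(n_regs: int, use_32bit: bool = False) -> int:
    """Positive when the pusha/popa pair is the smaller encoding."""
    return individual_cost(n_regs, use_32bit) - pusha_cost(use_32bit)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, "registers:", bytes_saved(n), "bytes saved")
```

The pair breaks even at one register and wins from two upwards, as long as the routine does not need to return a value in a register that popa would overwrite and the extra stack space (16 bytes for the eight 16-bit registers) is acceptable.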
    Last edited by schnaader; 27th November 2009 at 02:06.

  3. #3
    Administrator Shelwien
    http://ctxmodel.net/files/mix_test/mix_test_vA1.rar

    09h2x3\o1rc.exe c befi.com befi.ari -> 820 bytes
    09h2x3\o1rc.exe d befi.ari befi.unp

    Also, there's a 5-byte header (filesize + rc pad byte),
    so it's 815 actually.

    It's from the previous thread about simple CM models
    for executable compression.
    http://encode.dreamhosters.com/showp...3&postcount=14
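
For readers unfamiliar with what a simple order-1 CM model does, here is a rough Python sketch. It is not o1rc's actual model; it assumes plain adaptive frequency counts per previous byte, all starting at 1, and the function name `order1_cost_bits` is made up for illustration. It reports the ideal code length an arithmetic or range coder would approach, before any header bytes.

```python
import math


def order1_cost_bits(data: bytes) -> float:
    """Ideal code length of `data` under an adaptive order-1 byte model.

    Each context (the previous byte, 0 at the start) keeps its own symbol
    counts, initialised to 1 so no symbol ever has zero probability.
    """
    counts = {}  # context byte -> list of 256 symbol counts
    totals = {}  # context byte -> sum of that context's counts
    bits = 0.0
    ctx = 0
    for sym in data:
        c = counts.setdefault(ctx, [1] * 256)
        t = totals.setdefault(ctx, 256)
        # Cost of coding `sym` with the probability the model assigns right now.
        bits -= math.log2(c[sym] / t)
        # Update the model only after coding, as a decoder would have to.
        c[sym] += 1
        totals[ctx] = t + 1
        ctx = sym
    return bits


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            data = f.read()
        print(math.ceil(order1_cost_bits(data) / 8), "bytes (estimate, no header)")
```

With counts this naive, a 1k file has little time to learn its statistics, so real CM coders tuned for executables use faster-adapting counters and mix several contexts.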

  4. #4
    Originally posted by schnaader:
    Nice work, though I'd encourage you to try optimizing the COM file further instead of compressing it
    Easier said than done! The original TASM version was 1280 bytes; I added and removed some very minor stuff and got it to 1213 when I converted to FASM. The real work started from there. (Heck, I wasn't even sure it was possible.) I wanted to see if I could match aPACK's 1024 result (two clusters of a floppy). Of course, manual tweaking eventually hurt compression (obviously). But I did it! What's funny to me is that now aPACK only saves one byte and 7-Zip adds one.

    - although Befunge is quite complex and "simple LFN support" sounds nice,
    Note that this is Befunge-93, so the language itself has no file support, and all I needed was to open the .b93 script from the command line. Mainly useful for tab completion under Windows or DOSEMU, etc.

    there's always room for optimizations. ... (Hugi rocks!)
    Yes, I know they are super smart, but I was working alone. I even briefly mentioned it on alt.lang.asm, but I didn't get much interest, so oh well.

    By the way, rename the file to "b" to get a slightly better result. Of course, this could be improved further by just removing the PAQ header at the beginning of the file (20 bytes including the filename).
    Yeah, halfway through I remembered that the filename itself takes space. Heh. But it feels like cheating to rename. And obviously PAQ's RAM usage isn't ideal for 1k files. But it's the best, I can't ignore it!

    EDIT: Had a quick look at your asm code, main things I'd have a look at for further optimizations:
    • The PRNG of course - it's quite large for an assembler PRNG and uses some magic variables that could perhaps be changed to point to some parts of the code that provide good substitutions.
    Yeah, it's overkill when only used for '?', but I blindly figured it was better for random than other methods, so I didn't mess with it much.
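
Since the generator only has to pick one of four directions for '?', a much smaller one may be enough. Below is a hedged Python sketch of a 16-bit linear congruential generator. The constants are illustrative (chosen to satisfy the usual full-period conditions for a power-of-two modulus), not taken from befi's source, and the function names are hypothetical.

```python
MULTIPLIER = 25173  # illustrative; MULTIPLIER - 1 is a multiple of 4
INCREMENT = 13849   # illustrative; odd, as a full period requires
MASK = 0xFFFF       # keep the state in a single 16-bit register


def next_state(state: int) -> int:
    """One LCG step: a multiply and an add, truncated to 16 bits."""
    return (state * MULTIPLIER + INCREMENT) & MASK


def random_direction(state: int):
    """Advance the generator and return (new_state, direction in 0..3)."""
    state = next_state(state)
    # Take the top two bits: the low bits of a power-of-two LCG cycle quickly.
    return state, state >> 14


if __name__ == "__main__":
    s = 1  # any seed works; a timer tick would be a typical choice
    for _ in range(8):
        s, d = random_direction(s)
        print(d, end=" ")
    print()
```

The quality is far below that of a larger generator, so whether the trade is worth it depends on how much the randomness of '?' matters to the scripts being run.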

    • dee - I'm not sure what it's used for yet, but it's quite likely that you can generate the string in less than 12 bytes.
    Maybe, but I didn't think of anything offhand. (12 bytes isn't that much room for extra instructions!)

    • The "cmp al,.. j(n)e" parts - some of these are quite repetitive and could be redundant.
    A lot of that couldn't easily be removed without some adjustments for the jmp [di+table1] etc. parts. In particular, the ten "dw lpush" entries for the 0..9 digits are horribly redundant, but I can't remove them without adding other adjustments.
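
The trade-off between ten identical table entries and one range check can be sketched in Python. This is only an illustration of the idea: the handler names and the tiny command set are made up, the real program dispatches through `jmp [di+table1]`, and in assembly the range check costs bytes of its own that have to be weighed against the 20 bytes of `dw` entries.

```python
def run_commands(program: str) -> list:
    """Execute a straight-line string of commands on a stack (no 2-D grid)."""
    stack = []

    def add():
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)

    def mul():
        b, a = stack.pop(), stack.pop()
        stack.append(a * b)

    def dup():
        stack.append(stack[-1])

    # Only the non-digit commands need a table entry.
    table = {"+": add, "*": mul, ":": dup}

    for ch in program:
        if "0" <= ch <= "9":
            # One range check replaces ten identical "push digit" entries.
            stack.append(ord(ch) - ord("0"))
        elif ch in table:
            table[ch]()
        # Anything else is ignored in this sketch.
    return stack


if __name__ == "__main__":
    print(run_commands("12+"))
```

Whether this helps in the real binary depends on how the table is indexed; if the index is the raw character, removing the digit entries also means re-basing every entry after them, which is the kind of adjustment mentioned above.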

    • A quite common optimization is using pusha/popa (1 byte commands) instead of multiple push/pop statements if it doesn't hurt that all registers get on the stack.
    Good idea. Honestly, I almost dislike the idea that it uses 386 anything, esp. since it's 99% 16-bit (which is smaller code, usually, esp. due to db 66h overrides). But it's not easy to avoid without some rewriting, because a 32-bit stack is needed. More annoying are the bugs I don't know how to fix, but oh well. It's just for fun, so I don't care.
