
Thread: TinyCM - A simple CM compressor

  1. #1
    Member
    Join Date
    Aug 2011
    Location
    Canada
    Posts
    113
    Thanks
    9
    Thanked 22 Times in 15 Posts

    I've just developed a simple CM compressor that mixes o1, o2, o3 and o6 predictions with three two-input SSE mixers. Its performance is mediocre (this is my first try at a full CM, so that's to be expected), and if there's enough interest I'll release the source code. It compresses at around 1 MB/s and reduces enwik8 to 25,913,605 bytes. I've attached the 32-bit executable.
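
    For readers new to context mixing, the mixing step can be sketched as follows. This is a minimal linear two-input mixer, not TinyCM's code: the post does not say how its mixers work internally, PAQ-style compressors usually mix in the logistic domain instead, and the names Mixer2, mixer2_init, mixer2_predict and mixer2_update are made up for illustration.

```c
/* Hypothetical two-input mixer: blends two bit predictions p0 and p1
   (each the probability that the next bit is 1) with one adaptive weight. */
typedef struct {
    double w;      /* weight given to p0; p1 gets (1 - w) */
    double p0, p1; /* inputs of the last prediction, kept for the update */
    double p;      /* last mixed prediction */
} Mixer2;

void mixer2_init(Mixer2 *m) {
    m->w = 0.5;    /* start by trusting both models equally */
    m->p0 = m->p1 = m->p = 0.5;
}

/* Returns the mixed probability that the next bit is 1. */
double mixer2_predict(Mixer2 *m, double p0, double p1) {
    m->p0 = p0;
    m->p1 = p1;
    m->p = m->w * p0 + (1.0 - m->w) * p1;
    return m->p;
}

/* Once the real bit is known, nudge the weight toward the model that
   was closer (one gradient step on the squared prediction error). */
void mixer2_update(Mixer2 *m, int bit, double rate) {
    double err = (double)bit - m->p;
    m->w += rate * err * (m->p0 - m->p1);
    if (m->w < 0.0) m->w = 0.0;  /* keep the blend a convex combination */
    if (m->w > 1.0) m->w = 1.0;
}
```

    In a real CM compressor the two inputs would come from context models (for example order-1 and order-2), and the mixed probability would drive the arithmetic coder.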


    Note: the most recent version is attached below. It may differ from the description above, which describes version 0.1.
    Attached Files
    Last edited by david_werecat; 13th October 2012 at 19:48.

  2. #2
    Member RichSelian
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Why don't you combine TinyCM and TinyLZP? I think it would do better on both speed and compression ratio if you ran LZP first and then compressed the match lengths and literals with CM.
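
    To make the suggestion concrete, here is a minimal sketch of the LZP front end: a hash of the previous three bytes predicts where the same context was last seen, and the output is a stream of (match length, literal) symbols that a CM stage could then model. This is a generic illustration, not TinyLZP's actual format; the function names, the 16-bit hash, the order-3 context and the 255-byte length cap are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define LZP_HASH_BITS  16
#define LZP_TABLE_SIZE (1u << LZP_HASH_BITS)
#define LZP_MAX_LEN    255

/* Hash of the three bytes that precede position i (requires i >= 3). */
static uint32_t lzp_hash(const uint8_t *buf, size_t i) {
    uint32_t c = (uint32_t)buf[i - 3] | ((uint32_t)buf[i - 2] << 8)
               | ((uint32_t)buf[i - 1] << 16);
    return (c * 2654435761u) >> (32 - LZP_HASH_BITS);
}

/* Encodes in[0..n) into out, which must hold 2*n + 3 bytes.
   Returns the number of bytes written. */
size_t lzp_encode(const uint8_t *in, size_t n, uint8_t *out) {
    static size_t table[LZP_TABLE_SIZE];
    size_t i = 0, o = 0;
    for (size_t k = 0; k < LZP_TABLE_SIZE; k++) table[k] = 0;
    while (i < n && i < 3) out[o++] = in[i++];   /* no context yet: raw bytes */
    while (i < n) {
        uint32_t h = lzp_hash(in, i);
        size_t cand = table[h];                  /* 0 means "no prediction" */
        size_t len = 0;
        if (cand != 0)
            while (len < LZP_MAX_LEN && i + len < n &&
                   in[cand + len] == in[i + len])
                len++;
        table[h] = i;
        out[o++] = (uint8_t)len;                 /* match length symbol */
        i += len;
        if (i < n) out[o++] = in[i++];           /* literal that ended the match */
    }
    return o;
}

/* Decodes n_out original bytes from in. Returns the bytes consumed. */
size_t lzp_decode(const uint8_t *in, size_t n_out, uint8_t *out) {
    static size_t table[LZP_TABLE_SIZE];
    size_t i = 0, p = 0;
    for (size_t k = 0; k < LZP_TABLE_SIZE; k++) table[k] = 0;
    while (i < n_out && i < 3) out[i++] = in[p++];
    while (i < n_out) {
        uint32_t h = lzp_hash(out, i);
        size_t cand = table[h];
        size_t len = in[p++];
        table[h] = i;
        for (size_t k = 0; k < len; k++)         /* byte-wise, so overlap is fine */
            out[i + k] = out[cand + k];
        i += len;
        if (i < n_out) out[i++] = in[p++];
    }
    return p;
}
```

    The encoder and decoder update the table at the same positions, so the decoder can rebuild every prediction without any pointers being transmitted.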

  3. #3
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Better speed but slightly worse compression than pure CM, most likely. I use that technique in paq9a.

  4. #4
    Member david_werecat
    I may work on merging them later, but for now I'll keep them separate, since each is meant as an experiment with a single algorithm.

  5. #5
    Member david_werecat
    I've cleaned up the source code enough to be able to release it. Several of the included models are not used in the current configuration.
    Attached Files

  6. #6
    Expert
    Matt Mahoney
    Some results on enwik9 on a 2 GHz T3200 under 32 bit Vista:
    Code:
    C:\res>timer tinycm 9 enwik9 enwik9.tc9
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    Compressed 463129088 into 221773537.
    Done.
    
    Kernel Time  =    12.682 = 00:00:12.682 =   0%
    User Time    =  1329.877 = 00:22:09.877 =  90%
    Process Time =  1342.560 = 00:22:22.560 =  91%
    Global Time  =  1461.823 = 00:24:21.823 = 100%
    
    C:\res>timer tinycm d enwik9.tc9 enwik9.out
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    Decompressed 221773534 into 463129088.
    Done.
    
    Kernel Time  =    15.880 = 00:00:15.880 =   1%
    User Time    =  1314.667 = 00:21:54.667 =  91%
    Process Time =  1330.548 = 00:22:10.548 =  92%
    Global Time  =  1435.428 = 00:23:55.428 = 100%
    The actual compression was 1,000,000,000 -> 221,773,542 which verified OK.
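
    One arithmetic observation about the sizes printed above: 463,129,088 is exactly 1,000,000,000 mod 2^29, which is the value you would get if the byte count passed through a 32-bit counter of bits. Whether that is what TinyCM 0.1 actually does is an assumption, not something the thread confirms, and it does not explain the few bytes of difference between the reported and actual compressed sizes. The helper name below is hypothetical.

```c
#include <stdint.h>

/* Hypothetical: the byte count a program would report if it tracked
   the size as a number of bits in a 32-bit unsigned integer. */
uint64_t bytes_via_32bit_bit_counter(uint64_t true_bytes) {
    uint32_t bits = (uint32_t)(true_bytes * 8u); /* truncates modulo 2^32 */
    return bits / 8u;                            /* so bytes wrap modulo 2^29 */
}
```

    With this helper, an input of 1,000,000,000 bytes comes back as 463,129,088, while sizes below 2^29 bytes (such as the compressed output) pass through unchanged.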

    The supplied .exe files required MSVCR110.dll, which I didn't have, so I recompiled with g++ 4.7.1 using: gcc -O3 -march=native -s *.c -I.

    http://mattmahoney.net/dc/text.html#2217

  7. #7
    Member david_werecat
    Thanks for testing it. I'll check into that size bug.

  8. #8
    Expert
    Matt Mahoney
    Also, I noticed that the "level" option doesn't seem to do anything except get stored in the first byte of the archive. Is it supposed to be 0..9?

  9. #9
    Expert
    Matt Mahoney
    Decompression failed on dickens and reymont in the Silesia corpus. The decompressed file was 1 byte too small, so I didn't include it in the benchmark. Other files were OK. http://mattmahoney.net/dc/silesia.html

    I included tinylzp which verified OK.

  10. #10
    Member david_werecat
    Thanks once again for testing it. I've finished with version 0.2 which includes the following fixes and improvements:

    • Added compression levels
    • Added file buffering
    • Fixed errors in compression/decompression
    • Fixed large file sizes


    Also, to fix the MSVCR110.dll error the MSVC 2012 runtime needs to be installed. It can be found in any of these locations:
    Attached Files

  11. #11
    Tester
    Stephan Busch
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    872
    Thanks
    457
    Thanked 175 Times in 85 Posts
    My build always exits with the message: An error occured while compressing. Done.
    I compiled it as Matt described, and the command line was along the lines of: Tinycm 9 3d.tar 3d.tcm

  12. #12
    Member david_werecat
    Option 9 takes too much memory for a 32-bit build; use option 8 instead.

  13. #13
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    372
    Thanks
    26
    Thanked 22 Times in 15 Posts
    Thanks for the sample and the very clean code. I wish zpaq were as modular.

  14. #14
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    "tinycm 9" doesn't work, and the output size is always displayed as 0 with my 32-bit tinycc build here (a very limited test with the 421,888-byte tcc.exe).

  15. #15
    Member david_werecat
    Option 9 takes 2 GB of memory, which is too much for a 32-bit build. As for the output size always being printed as zero, I tried compiling with tcc 0.9.25 on Windows and it prints the proper output sizes for me. Which OS and which version of tcc are you using?
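
    A minimal sketch of how a build could reject an oversized model up front and say why, instead of failing with a generic error. The enum and function names are hypothetical and this is not TinyCM's allocation code; the SIZE_MAX / 2 threshold is an assumption, chosen so that a 2 GB request is refused on a 32-bit build while smaller levels still go through.

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum {
    ALLOC_OK,
    ALLOC_TOO_BIG_FOR_BUILD, /* request cannot fit this build's address space */
    ALLOC_FAILED             /* request was plausible but malloc returned NULL */
} AllocStatus;

/* Tries to allocate `bytes` for the model. On success *out points to the
   block; otherwise *out is NULL and the status tells the caller why. */
AllocStatus alloc_model(uint64_t bytes, void **out) {
    *out = NULL;
    /* On a 32-bit build SIZE_MAX / 2 is 2^31 - 1, so 2 GB is rejected here. */
    if (bytes > (uint64_t)(SIZE_MAX / 2))
        return ALLOC_TOO_BIG_FOR_BUILD;
    *out = malloc((size_t)bytes);
    return *out ? ALLOC_OK : ALLOC_FAILED;
}
```

    A caller could use the distinct status to print a hint such as "level 9 needs a 64-bit build" or to retry with a lower level.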

  16. #16
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    Quote: Originally posted by david_werecat
    Option 9 takes 2 GB of memory, which is too much for a 32-bit build. As for the output size always being printed as zero, I tried compiling with tcc 0.9.25 on Windows and it prints the proper output sizes for me. Which OS and which version of tcc are you using?
    My own build from git (but it is a bit old):
    http://roy.orz.hm/soft/tinycc-win32.zip

    I may try newer tcc build later.

  17. #17
    Member david_werecat
    I tried compiling my last uploaded code with your tcc build and it's working fine. I'm not sure what's causing the problem.

  18. #18
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    Quote: Originally posted by david_werecat
    I tried compiling my last uploaded code with your tcc build and it's working fine. I'm not sure what's causing the problem.
    Code:
    21:52 F:\tinycc-win32>tinycm
    usage: tinycm [level] [infile] [outfile]       To Compress
    usage: tinycm    d    [infile] [outfile]       To Decompress
    note : level should be between 0 and 9
    
    21:52 F:\tinycc-win32>tinycm 0 tcc.exe tcc.cm0
    Compressed 421888 into 0.
    Done.
    
    21:52 F:\tinycc-win32>tinycm 1 tcc.exe tcc.cm1
    Compressed 421888 into 0.
    Done.
    
    21:53 F:\tinycc-win32>tinycm 2 tcc.exe tcc.cm2
    Compressed 421888 into 0.
    Done.
    
    21:53 F:\tinycc-win32>tinycm 3 tcc.exe tcc.cm3
    Compressed 421888 into 0.
    Done.
    
    21:53 F:\tinycc-win32>tinycm 4 tcc.exe tcc.cm4
    Compressed 421888 into 0.
    Done.
    
    21:53 F:\tinycc-win32>tinycm 5 tcc.exe tcc.cm5
    Compressed 421888 into 0.
    Done.
    
    21:53 F:\tinycc-win32>tinycm 6 tcc.exe tcc.cm6
    Compressed 421888 into 0.
    Done.
    
    21:53 F:\tinycc-win32>tinycm 7 tcc.exe tcc.cm7
    Compressed 421888 into 0.
    Done.
    
    21:53 F:\tinycc-win32>tinycm 8 tcc.exe tcc.cm8
    Compressed 421888 into 0.
    Done.
    
    21:54 F:\tinycc-win32>tinycm 9 tcc.exe tcc.cm9
    An error occured while compressing.
    Done.
    
    21:54 F:\tinycc-win32>tinycm d tcc.cm0 tcc0.exe
    Decompressed 171809 into 0.
    Done.
    
    21:55 F:\tinycc-win32>tinycm d tcc.cm1 tcc1.exe
    Decompressed 167340 into 0.
    Done.
    
    21:55 F:\tinycc-win32>tinycm d tcc.cm2 tcc2.exe
    Decompressed 162650 into 0.
    Done.
    
    21:55 F:\tinycc-win32>tinycm d tcc.cm3 tcc3.exe
    Decompressed 159925 into 0.
    Done.
    
    21:55 F:\tinycc-win32>tinycm d tcc.cm4 tcc4.exe
    Decompressed 157427 into 0.
    Done.
    
    21:55 F:\tinycc-win32>tinycm d tcc.cm5 tcc5.exe
    Decompressed 155870 into 0.
    Done.
    
    21:55 F:\tinycc-win32>tinycm d tcc.cm6 tcc6.exe
    Decompressed 153780 into 0.
    Done.
    
    21:55 F:\tinycc-win32>tinycm d tcc.cm7 tcc7.exe
    Decompressed 152819 into 0.
    Done.
    
    21:55 F:\tinycc-win32>tinycm d tcc.cm8 tcc8.exe
    Decompressed 150912 into 0.
    Done.
    
    21:56 F:\tinycc-win32>\app_related\md5sum.exe tcc?.exe
    c7aa409b3a23ef085a9015563b553ea3 *tcc.exe
    c7aa409b3a23ef085a9015563b553ea3 *tcc0.exe
    c7aa409b3a23ef085a9015563b553ea3 *tcc1.exe
    c7aa409b3a23ef085a9015563b553ea3 *tcc2.exe
    c7aa409b3a23ef085a9015563b553ea3 *tcc3.exe
    c7aa409b3a23ef085a9015563b553ea3 *tcc4.exe
    c7aa409b3a23ef085a9015563b553ea3 *tcc5.exe
    c7aa409b3a23ef085a9015563b553ea3 *tcc6.exe
    c7aa409b3a23ef085a9015563b553ea3 *tcc7.exe
    c7aa409b3a23ef085a9015563b553ea3 *tcc8.exe

  19. #19
    Member david_werecat
    It almost looks like it's trying to print 32-bit integers instead of 64-bit integers. That would be a problem with msvcrt.dll incorrectly handling the %lld printf specifier. To be sure, I've attached a build made with your tcc (tcc -I"%cd%\.." ..\TinyCM.c). It works on my laptop, but could you test this build as well? If it doesn't work, then it's most likely an error with msvcrt.dll.
    Attached Files

  20. #20
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    Quote: Originally posted by david_werecat
    It almost looks like it's trying to print 32-bit integers instead of 64-bit integers. That would be a problem with msvcrt.dll incorrectly handling the %lld printf specifier. To be sure, I've attached a build made with your tcc (tcc -I"%cd%\.." ..\TinyCM.c). It works on my laptop, but could you test this build as well? If it doesn't work, then it's most likely an error with msvcrt.dll.
    Yeah, it doesn't work.
    This reminds me to change "lld" to "I64d" for that.
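
    A sketch of one way to sidestep the problem entirely: convert the 64-bit sizes to text by hand and print them with %s, so the output no longer depends on which length modifiers the C runtime understands. The names u64_to_dec and report_sizes are hypothetical. The log above, where the first number is right and the second is 0, is what one would expect if each specifier consumed only 32 bits of the arguments, though the thread does not confirm that.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes the decimal form of v into buf and returns buf.
   21 bytes is enough for the 20 digits of UINT64_MAX plus the terminator. */
char *u64_to_dec(uint64_t v, char buf[21]) {
    char tmp[21];
    int n = 0;
    do {                              /* collect digits, least significant first */
        tmp[n++] = (char)('0' + (int)(v % 10));
        v /= 10;
    } while (v != 0);
    for (int i = 0; i < n; i++)       /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}

/* Prints a line in the same shape as the tool's status message. */
void report_sizes(const char *verb, uint64_t in_size, uint64_t out_size) {
    char a[21], b[21];
    printf("%s %s into %s.\n", verb, u64_to_dec(in_size, a), u64_to_dec(out_size, b));
}
```

    Calling report_sizes("Compressed", 421888, 150912) would print the sizes correctly on any runtime, at the cost of a few extra lines of code. Switching the format string to "I64d", as mentioned above, is the shorter fix when only the Microsoft runtime is targeted.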
