
Thread: comprox-0.1

  1. #1
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts

    comprox-0.1

    Finally I can access encode.ru and googlecode.com after China's National Day.

    I have decided to drop comprox_ba and rename comprox_sa to comprox. comprox is now becoming a practical compressor that provides a good compression ratio and speed. enwik8 should compress to less than 29MB in half a minute (on my 1.6GHz x2 CPU), and decompression takes a few seconds.

    http://comprox.googlecode.com/files/comprox-0.1.tar.gz (source and executables)

  2. #2
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    Good release !

    New release is in testing with M.O.C. benchmark !

    At the moment I have verified problems in compression/decompression with big files (crash ... iso files - compr. mode 0). Compression/decompression verification failed!
    multithread activated (2 cores)
    Nice !
    Last edited by Nania Francesco; 9th October 2011 at 11:07.

  3. #3
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Quote Originally Posted by Nania Francesco View Post
    New release is in testing with M.O.C. benchmark !

    At the moment I have verified problems in compression/decompression with big files (crash ... iso files - compr. mode 0). Compression/decompression verification failed!
    multithread activated (2 cores)
    Nice !
    The bug happens when the compressed file is larger than the original file (the code does not malloc enough space).
    Fixed: http://comprox.googlecode.com/files/...x-0.1.1.tar.gz
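    A minimal sketch of the general kind of fix described here: size the output buffer from a worst-case bound instead of from the input size. This is not comprox's actual code. The function names and the expansion formula (input plus one eighth plus a 64-byte header allowance) are assumptions chosen for illustration; a real encoder needs a bound derived from its own format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical worst-case output size. ASSUMPTION: the encoder never expands
 * its input by more than 1/8 plus a 64-byte header. Returns 0 on overflow. */
static size_t worst_case_size(size_t input_size) {
    size_t extra = input_size / 8 + 64;
    if (input_size > SIZE_MAX - extra) {
        return 0; /* the bound does not fit in size_t */
    }
    return input_size + extra;
}

/* Allocate an output buffer that is large enough even when the "compressed"
 * data ends up larger than the original. Returns NULL on failure. */
static unsigned char *alloc_output(size_t input_size, size_t *capacity) {
    size_t cap = worst_case_size(input_size);
    assert(capacity != NULL);
    if (cap == 0) {
        return NULL;
    }
    unsigned char *buf = malloc(cap);
    if (buf != NULL) {
        *capacity = cap;
    }
    return buf;
}
```

    The caller frees the buffer with free() once the compressed block has been written.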

  4. #4
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    MOC benchmark completed !
    verify compression/decompression OK!
    very good results ! ... in the next update of MOC !

  5. #5
    Member Vacon's Avatar
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    the above download link gives a "404" for me. Interestingly, it works if I navigate via:
    http://code.google.com/p/comprox/ -> downloads -> comprox-0.1.1.tar.gz (which leads to the same link as given above...)
    o_O

    Best regards!

  6. #6
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Compression is much improved. http://mattmahoney.net/dc/text.html#2505

    The above direct link works for me. Too bad old versions and comprox_ba are gone.

    I wondered what happened. I read about China's National day in the U.S. I should have figured they would raise the Great Firewall.

  7. #7
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    C::B compile comprox with memory error

    I cannot compile comprox.c with Code::Blocks. What must I do in order to compile it with GCC? Which compiler did you use?

  8. #8
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts

  9. #9
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  10. #10
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Thanks for the benchmark.
    Version 0.7.0 should fix the problem. It also supports 2 threads for compression. Memory usage is reduced to about 5x in total.

    https://comprox.googlecode.com/files...x-0.7.0.tar.gz

  11. #11
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Test results http://mattmahoney.net/dc/text.html#2123
    Nice improvement again. But unfortunately in 32 bit Windows it crashes when compiled with "gcc -O3 *.c -lpthread". This should work because I have pthreads_win32 installed. So I tested in 64 bit Ubuntu Linux (compiled likewise because Makefile didn't link -lpthread) and was able to run up to e700 with 4 GB memory.

  12. #12
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Test results http://mattmahoney.net/dc/text.html#2123
    Nice improvement again. But unfortunately in 32 bit Windows it crashes when compiled with "gcc -O3 *.c -lpthread". This should work because I have pthreads_win32 installed. So I tested in 64 bit Ubuntu Linux (compiled likewise because Makefile didn't link -lpthread) and was able to run up to e700 with 4 GB memory.
    Use "gcc -Wl,--stack,4000000" for mingw32 under Windows. The default stack size is too small.

  13. #13
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    That worked for enwik8. For enwik9 I had to compile with -Wl,--stack,8000000. I have updated results at http://mattmahoney.net/dc/text.html#2123 to show results in both Windows and Linux to compare with previous tests. Compression ratio and speed are both improved. I also tested on the Silesia corpus at http://mattmahoney.net/dc/silesia.html which shows better compression for most files, but especially mozilla and samba. I didn't check whether you are using an E8E9 filter.
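    For readers unfamiliar with the term, here is a minimal sketch of an E8E9 filter: it rewrites the relative 32-bit operands of x86 CALL (0xE8) and JMP (0xE9) instructions as absolute targets, so that repeated calls to the same function produce identical byte sequences. It says nothing about what comprox does. The function name is hypothetical, and this simplified version transforms every 0xE8/0xE9 byte unconditionally, whereas practical filters usually add plausibility checks on the operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-place E8/E9 transform. encode != 0 converts relative operands to
 * absolute ones; encode == 0 reverses it. Both directions scan identically
 * (opcode bytes are never modified and operands are skipped), so the
 * inverse visits exactly the same positions as the forward pass. */
static void e8e9_filter(unsigned char *buf, size_t n, int encode) {
    size_t i = 0;
    assert(buf != NULL || n == 0);
    while (n >= 5 && i <= n - 5) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {
            /* Read the little-endian 32-bit operand after the opcode. */
            uint32_t v = (uint32_t)buf[i + 1]
                       | (uint32_t)buf[i + 2] << 8
                       | (uint32_t)buf[i + 3] << 16
                       | (uint32_t)buf[i + 4] << 24;
            /* Offset of the next instruction within the buffer. */
            uint32_t next = (uint32_t)(i + 5);
            v = encode ? v + next : v - next; /* wraps modulo 2^32 */
            buf[i + 1] = (unsigned char)(v & 0xFF);
            buf[i + 2] = (unsigned char)((v >> 8) & 0xFF);
            buf[i + 3] = (unsigned char)((v >> 16) & 0xFF);
            buf[i + 4] = (unsigned char)((v >> 24) & 0xFF);
            i += 5; /* skip opcode and operand */
        } else {
            i++;
        }
    }
}
```

    The filter runs before compression with encode set to 1 and after decompression with encode set to 0.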

  14. #14
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    872
    Thanks
    457
    Thanked 175 Times in 85 Posts
    Can someone please post a compiled version? I cannot compile it using my MinGW and GCC. The previous version was easier to compile.

  15. #15
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Compiled with g++ 4.6.1 for 32 bit Windows. Command is:

    gcc -O3 *.c -lpthread -Wl,--stack,8000000 -o comprox
    upx comprox.exe

    You will probably need pthreadGC2.dll from http://sources.redhat.com/pthreads-win32/ to run.

  16. #16
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    872
    Thanks
    457
    Thanked 175 Times in 85 Posts
    Thank you very much for compiling, Matt.

    This version crashes on the mobile & office testsets - no matter which memory setting I try. The other sets compress with e150 instead of e180 of previous version.

  17. #17
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Quote Originally Posted by Stephan Busch View Post
    thank you very much for compiling Matt.

    This version crashes on the mobile & office testsets - no matter which memory setting I try. The other sets compress with e150 instead of e180 of previous version.
    Thanks for testing!
    Version 0.8.0 is out, with a better compression ratio, and the bug should be fixed.

    http://code.google.com/p/comprox

  18. #18
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I tested in Windows and Linux. Nice improvement in compression. http://mattmahoney.net/dc/text.html#2083

    I also attached a 32 bit Windows compile. I used Shelwien's dllmerge so you should not need pthreadGC2.dll to run it. Compiled like this with MinGW 4.6.1:

    gcc -O3 -msse2 -lpthread -Wl,--stack,8000000 *.c
    dllmerge a.exe pthreadGC2.dll comprox.exe
    upx comprox.exe

  19. #19
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I tested in Windows and Linux. Nice improvement in compression. http://mattmahoney.net/dc/text.html#2083

    I also attached a 32 bit Windows compile. I used Shelwien's dllmerge so you should not need pthreadGC2.dll to run it. Compiled like this with MinGW 4.6.1:

    gcc -O3 -msse2 -lpthread -Wl,--stack,8000000 *.c
    dllmerge a.exe pthreadGC2.dll comprox.exe
    upx comprox.exe
    Very good Job!
    For WCC challenge:
    I tested the program and I'm sorry to say that it crashes again and again. The most stable configuration is "e256", but I get an error on this file!
    Attached Files
    • File Type: zip 3.zip (1.05 MB, 127 views)

  20. #20
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    On silesia benchmark, comprox e64 (and other block sizes) crashed on webster during compression in both Windows and Linux. All other files were OK.

  21. #21
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    872
    Thanks
    457
    Thanked 175 Times in 85 Posts
    Thanks Rich for the new version. Thank you Matt for compiling it. I will test it asap.

  22. #22
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Quote Originally Posted by Nania Francesco View Post
    Very good Job!
    For WCC challenge:
    I tested the program and I'm sorry to see that the program crashes again and again. The most stable configuration is "e256" but I report error on this file!
    Thank you for the sample file, a bugfix version is here: http://comprox.googlecode.com/files/...bugfix1.tar.gz

  23. #23
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    I attached a 32 bit Windows compile of Comprox bugfix used in WCC Challenge.

  24. #24
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    comprox 0.8.0-bugfix1 now works on Silesia corpus, but compression is worse than 0.7.0. http://mattmahoney.net/dc/silesia.html
    Numbers were slightly different than 0.8.0 on enwik8/9 so I updated LTCB too, but just for Windows. http://mattmahoney.net/dc/text.html#2083

  25. #25
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Quote Originally Posted by Matt Mahoney View Post
    comprox 0.8.0-bugfix1 now works on Silesia corpus, but compression is worse than 0.7.0. http://mattmahoney.net/dc/silesia.html
    Numbers were slightly different than 0.8.0 on enwik8/9 so I updated LTCB too, but just for Windows. http://mattmahoney.net/dc/text.html#2083
    I have run into some problems with LZ77 matching.
    Since I use a hash chain, I must fix *match_minlen* and hash every substring S[i, i + match_minlen - 1]. But I found that for some files a smaller *minlen* is better, and for others a bigger one is better. In version 0.8.0 I set this variable from the block size (a bigger match_minlen for larger blocks), but that does not seem to work on the Silesia corpus.
    Is there a way to decide which minlen should be used, or to use a dynamic minlen with a hash chain?

    Sorry for my English, I hope you can understand what I said...
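    To make the constraint concrete, here is a minimal sketch of a hash-chain match finder. It is not the comprox source. The struct, the function names, the 12-bit table and the FNV-1a hash are all assumptions for illustration. The point is that min_len is baked into the hash key, so a chain built for one min_len cannot find shorter matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS 12
#define HASH_SIZE (1u << HASH_BITS)
#define NO_POS    UINT32_MAX

typedef struct {
    uint32_t  head[HASH_SIZE]; /* most recent position in each bucket */
    uint32_t *prev;            /* prev[i]: earlier position in the same bucket
                                  (one 4-byte entry per input byte) */
    size_t    min_len;         /* number of bytes hashed at each position */
} HashChain;

/* FNV-1a over the first len bytes, reduced to HASH_BITS bits. */
static uint32_t hash_bytes(const unsigned char *p, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t k = 0; k < len; k++) {
        h = (h ^ p[k]) * 16777619u;
    }
    return h >> (32 - HASH_BITS);
}

static void hc_init(HashChain *hc, uint32_t *prev, size_t min_len) {
    assert(min_len > 0);
    for (size_t k = 0; k < HASH_SIZE; k++) {
        hc->head[k] = NO_POS;
    }
    hc->prev = prev;
    hc->min_len = min_len;
}

/* Search earlier positions for the longest match with s[i..n), then insert
 * position i into its chain. Returns the match length, or 0 if no match of
 * at least min_len bytes was found within max_steps chain links. */
static size_t hc_find_and_insert(HashChain *hc, const unsigned char *s,
                                 size_t n, size_t i, int max_steps,
                                 size_t *match_pos) {
    size_t best = 0;
    if (i + hc->min_len > n) {
        return 0; /* too close to the end to hash min_len bytes */
    }
    uint32_t h = hash_bytes(s + i, hc->min_len);
    uint32_t cand = hc->head[h];
    while (cand != NO_POS && max_steps-- > 0) {
        size_t len = 0;
        while (i + len < n && s[cand + len] == s[i + len]) {
            len++;
        }
        if (len >= hc->min_len && len > best) {
            best = len;
            *match_pos = cand;
        }
        cand = hc->prev[cand];
    }
    hc->prev[i] = hc->head[h];
    hc->head[h] = (uint32_t)i;
    return best;
}
```

    With the input "xyzQxyz", a finder initialised with min_len 3 reports the 3-byte match at position 4, while one initialised with min_len 4 reports nothing there.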

  26. #26
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    You could try having one or more separate hash tables for a smaller match_minlen.

  27. #27
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Quote Originally Posted by Fu Siyuan View Post
    You could try having one or more separate hash tables for a smaller match_minlen.
    In fact there is one (searching the last 256 positions for a shorter match). But I cannot afford one more chain, since a chain costs 4N memory. Maybe I will use a hash table instead?

  28. #28
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    A hash table is faster (cache sensitive) and its size is more customizable; the many search cycles that a hash chain offers help little and are not worth the cost.
    Bulat had strongly recommended it.
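    A minimal sketch of the idea under discussion: a small direct-mapped hash table that remembers only the most recent position for each 3-byte hash. Its memory is fixed by the table size and does not grow with the input, unlike a chain. The names, the 12-bit size, the 3-byte key and the multiplicative hash are assumptions for illustration, not code from comprox or any other compressor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHORT_BITS 12
#define SHORT_SIZE (1u << SHORT_BITS)
#define SHORT_LEN  3          /* bytes hashed for the short-match table */
#define EMPTY      UINT32_MAX

typedef struct {
    uint32_t slot[SHORT_SIZE]; /* last position seen for each hash value */
} ShortTable;

static void st_init(ShortTable *t) {
    for (size_t k = 0; k < SHORT_SIZE; k++) {
        t->slot[k] = EMPTY;
    }
}

/* Multiplicative hash of 3 bytes, reduced to SHORT_BITS bits. */
static uint32_t st_hash(const unsigned char *p) {
    uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16;
    return (v * 2654435761u) >> (32 - SHORT_BITS);
}

/* Look up the single candidate for s[i..n), verify it byte by byte, and
 * overwrite the slot with i. Returns the match length, or 0 if the slot
 * was empty or the candidate matched fewer than SHORT_LEN bytes. */
static size_t st_find_and_insert(ShortTable *t, const unsigned char *s,
                                 size_t n, size_t i, size_t *match_pos) {
    assert(t != NULL && s != NULL);
    if (i + SHORT_LEN > n) {
        return 0;
    }
    uint32_t h = st_hash(s + i);
    uint32_t cand = t->slot[h];
    t->slot[h] = (uint32_t)i; /* one probe only: newest position wins */
    if (cand == EMPTY) {
        return 0;
    }
    size_t len = 0;
    while (i + len < n && s[cand + len] == s[i + len]) {
        len++;
    }
    if (len < SHORT_LEN) {
        return 0; /* hash collision or stale entry */
    }
    *match_pos = cand;
    return len;
}
```

    Each lookup costs one probe and one verification, which is the trade-off against walking a long chain: fewer candidates examined, but predictable time and memory.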

  29. #29
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    RichSelian:
    Did you consider MMC ( https://code.google.com/p/mmc/ )? It should be superior in performance to both Hash Chain and Hash Table with big searching windows.

  30. #30
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    RichSelian:
    Did you consider MMC ( https://code.google.com/p/mmc/ )? It should be superior in performance to both Hash Chain and Hash Table with big searching windows.
    MMC shares the same idea as my implementation, but I think mine is better. MMC uses N passes to make chains with the same prefix, while mine uses only 2 passes to make chains whose prefixes have the same hash value. And just like my implementation, it cannot use different minlen values at the same time.
