
Thread: A recruit's compressor and some questions

  1. #1
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts

A recruit's compressor and some questions --- CSC

This thread discusses my CSC compressor and also LZ77. It also records my progress.
==================================================
2009.04.19
I became interested in data compression several months ago. I have experimented a lot since then and have finally worked out a simple file compressor.

The compressor uses LZP and order-1 (freq[context][c]) range coding.
I chose LZP because I thought it would compress fast, but I have several questions:

First, do all LZP-based compressors decompress more slowly than they compress? My compressor's decompression is about 15% slower than compression; how can I avoid this?

Secondly, in my program I initially merged the literals and match lengths into one alphabet. Later I modified it, separated them, and used an order-4 match-flag state machine, but why does it now show worse compression than before on almost all files?

Thirdly, are there any samples showing how to implement an order-2 or order-3 arithmetic coder that is not as complicated as PPM?

I'm looking forward to your advice!

This is my first thread (topic?) in English; sorry for my bad English.


==================================================
This time I made it bitwise. Literal, match length, and match flag use separate simple models. It shows worse compression on text (34.1% --> 34.8% on enwik8) but an obviously better ratio on binary files. Compression speed also improved slightly, while decompression speed improved a lot. The two procedures now seem completely symmetric.
==================================================
2009.05.06: some modifications -- CountrySide Compressor 2
==================================================

CountrySide Compressor 3 (why the name "CountrySide"? it stands for "copycatting" in Chinese)

I've finished all my exams this term, so I have some time these days. Over the last two days I have completely rewritten the LZ part of my program. It achieves better speed than my previous version but is still far from my expectations.

So far I have done the following:

LZ77 with a sliding window of any size (it does not have to be 2^n, and there is no look-ahead buffer)
an fpaq-like range coder with good speed
match_pos and match_len coded in buckets, similar to deflate
a match finder using a 4-byte hash chain (a sketch follows this list)
lazy match evaluation -- looking 2 matches ahead
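For readers who want something concrete, here is a minimal sketch of a hash-chain match finder of the kind listed above, assuming a flat input buffer; the names (HashChain, WND_SIZE, hash4) are illustrative and not taken from the CSC sources.
Code:
// Minimal hash-chain match finder sketch (not the actual CSC code).
#include <cstdint>
#include <cstring>
#include <vector>

const uint32_t HASH_BITS = 16;
const uint32_t WND_SIZE  = 8u << 20;                  // 8 MB window for this sketch

struct HashChain {
    std::vector<uint32_t> head, prev;                 // chain heads / backward links

    HashChain() : head(1u << HASH_BITS, 0xFFFFFFFFu),
                  prev(WND_SIZE,        0xFFFFFFFFu) {}

    static uint32_t hash4(const uint8_t* p) {         // hash of the next 4 bytes
        uint32_t x; memcpy(&x, p, 4);
        return (x * 2654435761u) >> (32 - HASH_BITS);
    }

    void insert(const uint8_t* buf, uint32_t pos) {
        uint32_t h = hash4(buf + pos);
        prev[pos % WND_SIZE] = head[h];
        head[h] = pos;
    }

    // Walk the chain (bounded by max_depth) and return the longest match found.
    // Caller must ensure at least 4 bytes remain at pos.
    uint32_t find(const uint8_t* buf, uint32_t pos, uint32_t end,
                  uint32_t max_depth, uint32_t& best_pos) const {
        uint32_t best_len = 0;
        uint32_t cand = head[hash4(buf + pos)];
        while (cand != 0xFFFFFFFFu && cand < pos &&
               pos - cand < WND_SIZE && max_depth--) {
            uint32_t len = 0;
            while (pos + len < end && buf[cand + len] == buf[pos + len]) len++;
            if (len > best_len) { best_len = len; best_pos = cand; }
            cand = prev[cand % WND_SIZE];
        }
        return best_len;
    }
};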


Memory usage: 7 MB + 5*WND_SIZE (I/O buffers = 1 MB * 2)

This version is still quite clumsy. There are two serious problems that need to be solved. First, the match finder is too slow -- even slower than my previous one. Does anyone have ideas about how to handle incompressible data quickly? The match finder always wastes a lot of time on that kind of data.

Second, I think I need to estimate precisely how many bits each symbol or match will be encoded into.

Finally, I am uploading what I have done so far (with an 8 MB window size).
Type csc3 file1 file2 [x=0..9] to run the program; it detects by itself whether the input is a file compressed by csc3.

==================================================

2009.07.12: Better parsing, especially for binary files. Fixed a serious bug in the hash function. Still HC match finder & flexible parsing & range coding.

==================================================

2009.07.13: Fixed some bugs that were slowing things down. Speed doubles on some files compared to the previous edition. Modes 0..9 all became faster and compress better than the initial CSC3.

==================================================
2009.07.19: CSC3 seems to have no serious bugs now. However, I still don't know why it doesn't work on wikipedia.tar from SqueezeChart. The performance is good, I think, even in practical use. Maybe my next edition, ECS (Experimental Compressor for Study), which combines several things I have studied, will work on that file.

==================================================
2009.08.03: Better parsing. Significant improvement on most files.
==================================================
2009.08.06: Slightly modified parsing. Added a simple 2-channel .wav filter. Redesigned the search-depth rule. It has become more practical now.

==================================================
2009.08.11: Some last modifications lead to CSC3 Final.
1. Search depth is now dynamic, so compression of bad data (WAV) won't be too slow.
2. Command-line format redesigned. Only the m1..m3 modes now, and the dictionary size can be controlled by the user.
3. Added a 1-byte match rule, which leads to big progress on files such as valley.cmb.
4. Fixed a serious bug in lazy parsing that was slowing things down.

I will post the mature version (one I am more or less satisfied with) together with the whole source code in a new thread, after hunting down some bugs.

BT + optimal parsing is a big project. It will be done in XXX (CSC?) 4.
==================================================
2009.09.06: Some modifications based on CSC3 Final --> CSC 3.1
I experimented a lot with a hash table, but it disappointed me a little: HT only benefits long search cycles compared to HC. So in the end I went back to the hash chain. It is now Hash3+Hash5 instead of Hash3+Hash4, which achieved very obvious progress on text such as enwik8 or Chinese text. With a small change to the parsing, performance on other types of data such as binaries doesn't lose much. I'm still very interested in detectors & filters, but I have very little spare time since the new term began.
The new version also reports memory usage very precisely.

==================================================
2009.09.17: More modifications. It is now more impressive than CSC3 Final.
The match finder is Hash3+Hash6 now. I tried many combinations and this is the best so far.

Redesigned the hash function; memory usage is now 6*WNDSIZE+4 for compression and <WNDSIZE+3 for decompression.

Slightly improved parsing.

The biggest improvement was made this morning: I switched the range coder for p=0.5 symbols (direct bits) to bit/byte operations. Compression is only a little faster, but decompression is much faster on some files. The kernel time for decompressing enwik9 is now 29 s! Very competitive with Tornado now.

I keep thinking about filters these days. Actually it isn't only about filters; I'm thinking about a multi-coder for data of different entropy. For example, order-1 would be very slow on bad data, so such data should be stored raw. Also, as Osman said, on binary data the order-1 context should be ctx>>4. What's more, I also plan to separate the range coder from the LZ77 part. I'm not sure, but I hope this would benefit memory-access optimization.

==================================================
2009.09.22: This is the near-final version. The exe auto-detector works now, and some other filters also work. Bad data is stored raw now, so it decompresses very fast. I am checking for bugs these days.

==================================================

2009.09.03: Fixed a memory bug. Fixed a bit-operation bug that only occurs with a large dictionary size.
Attached Files
    Last edited by Fu Siyuan; 23rd September 2009 at 13:44. Reason: 2009.09.23 CSC3.1 v0925 near final

  2. #2
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Firstly, "Welcome to the forum!"

1 - There is no strict rule that decompression must be slower than compression, but it is usually expected, because during decompression the next symbol is always unknown. That said, not all LZP implementations are a good example here, so you can still speed up your decompression.

2 - IMO, the "merged alphabet" is mostly historical. It was mainly used by Huffman back-ended coders. My personal tests with ROLZ showed that a separate flag is better than a merged alphabet. Of course, this depends on the actual implementation. Also, keep in mind that LZP modeling is a bit sensitive: you can easily make it worse or better by playing with its parameters (minimum match length, maximum match length, literal/flag/match-length modeling, etc.). If I were you, I would play with the minimum match length first.

3 - If you want to use order-2 or higher, you can model the same way (using the previous n bytes as context). It seems you are stuck on how to store the frequency table (or prediction table, or whatever you use). For order-2, which could take 256x256x256 = 16 MiB or more, you can still use a direct lookup as in order-1. But for anything higher than order-2, you should use a hash table (the easiest way) or suffix trees.
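A minimal sketch of the direct-lookup idea in point 3, assuming byte-wise coding with 8-bit frequency counts; the names (Order2Model, freq) are illustrative, not from any posted source.
Code:
// Order-2 frequency table via direct lookup: 65536 contexts x 256 symbols
// x 1-byte counts = 16 MiB, matching the figure above.
#include <cstdint>

struct Order2Model {
    uint8_t (*freq)[256];

    Order2Model()  { freq = new uint8_t[256 * 256][256](); }   // zeroed, on the heap
    ~Order2Model() { delete[] freq; }

    uint8_t* counts(uint8_t p1, uint8_t p2) {
        return freq[(p1 << 8) | p2];            // row for context (p1, p2)
    }

    void update(uint8_t p1, uint8_t p2, uint8_t sym) {
        uint8_t* f = counts(p1, p2);
        if (++f[sym] == 255)                    // rescale before 8-bit overflow
            for (int i = 0; i < 256; i++) f[i] = (f[i] + 1) >> 1;
    }
};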
    BIT Archiver homepage: www.osmanturan.com

  3. #3
    Member
    Join Date
    Aug 2008
    Location
    Saint Petersburg, Russia
    Posts
    215
    Thanks
    0
    Thanked 0 Times in 0 Posts
    This is astonishing!
    Code:
    <...>
    Global Time  =    14.094 = 00:00:14.094 = 100%
    <...>
    18.04.2009  23:29        34 119 354 enwik8.csc
enwik8 down to 32.54 MB in fourteen seconds!! That's quite an achievement!

    Great job mate, and welcome aboard!

  4. #4
    Member
    Join Date
    Oct 2007
    Location
    Germany, Hamburg
    Posts
    408
    Thanks
    0
    Thanked 5 Times in 5 Posts
Sometimes, when the compressor and decompressor do almost the same amount of work, the decompressor is slower because it has to write more to the hard drive than it reads (for the compressor it is the opposite: it reads more). Reading is usually much faster than writing.
That's often the reason context mixing is faster at compression.

  5. #5
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
I think disk access is negligible for CPU-intensive processing. Once I ran M1 with an incompatible parameter profile - the processing speed was almost the same, but the input was expanded. I guess it's (1) from Osman's list - I directly used that fact to allow parallelism in some model computations when encoding.

To me it's quite odd to see decompression take longer than compression for LZ-like algorithms, since the decompressor doesn't need to probe for the match length. Maybe you could post some of your coding loops?

The change in compression when using flags comes from the modelling: you use different contexts and code different symbols. One could emulate the behavior of both the flag-based and the merged-alphabet coding via different model updates.
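To make the "different contexts" point concrete, here is a hypothetical sketch of modelling the match flag under a context built from the last four flags plus a few bits of the last literal; all names are illustrative, not taken from the CSC sources, and the division-based counts here are just the simplest possible model.
Code:
// Hypothetical match-flag model: context = 4 previous flags + 3 literal bits.
#include <cstdint>

struct Counts { uint16_t n[2] = {1, 1}; };            // simple 0/1 counters

Counts   flag_model[16 * 8];                          // one model per context
uint32_t flag_hist = 0;
uint8_t  last_lit  = 0;                               // updated wherever literals are coded

// Returns P(flag==1) scaled to 1/4096 for an entropy coder, then updates the model.
uint32_t model_flag(int is_match) {
    unsigned ctx = (flag_hist & 15) * 8 + (last_lit >> 5);
    Counts&  c   = flag_model[ctx];
    uint32_t p   = 4096u * c.n[1] / (c.n[0] + c.n[1]);
    if (p < 1) p = 1; else if (p > 4095) p = 4095;    // keep coder-safe bounds
    if (++c.n[is_match & 1] >= 60000) {               // halve before overflow
        c.n[0] = (c.n[0] + 1) / 2;
        c.n[1] = (c.n[1] + 1) / 2;
    }
    flag_hist = (flag_hist << 1) | (is_match & 1);
    return p;
}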

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
> First, do all LZP-based compressors decompress more slowly than they compress?

It's common for decompression to be somewhat slower with symmetrical models;
the main reason is that data which is known to the compressor has to be
computed at runtime by the decompressor.

> My compressor's decompression is about 15% slower than compression;
> how can I avoid this?

Well, it seems that the encoding and decoding loops are implemented separately...
so, by better programming, I guess?
Also, there are two divisions in your rc decoding and one in encoding,
which certainly has some effect.

> Secondly, in my program I initially merged the literals
> and match lengths into one alphabet.

    It should be better for speed, as you'd only need a single rc call.
    However separate modelling of the flag should provide better results anyway.

> Later I modified it, separated them, and used an order-4
> match-flag state machine, but why does it now show worse
> compression than before on almost all files?

Because the implementation is worse?
There are probably correlations with previous literals and such,
not only with previous match flags.

> Thirdly, are there any samples showing how to implement
> an order-2 or order-3 arithmetic coder that is not as complicated as PPM?

    Well, you can look at something like
    http://ctxmodel.net/files/ST2rc/o2rcA1.rar
    http://shelwien.googlepages.com/o3bfa.rar
    (and all around there)
but I suspect that it won't be as simple as you want
(because of some speed optimization in the first case, and other reasons in the second).
Basically though, an o2 model can be used in a fairly straightforward way
(if the statistics are properly initialized), especially with bitwise
coding. And o3+ might not be much more complicated either (with a hash),
but would already require at least some inheritance...
or a probability mapping, but that's slower.

Anyway, a good idea would be to explain your algorithm, or at least
provide the source, when asking such questions.
But luckily you used a rather bad compiler and options, so I can
post some reference material on your behalf:
    http://shelwien.googlepages.com/CSC2.txt

So, my advice would be:
1. Get a better compiler (preferably IntelC) and learn to use it.
2. File i/o has a major share in the speed of fast compressors,
so using stdio functions imported from msvcrt.dll is not very good.
Also, buffering a lot of input data is generally ok,
but buffering more than 4k of output data is inefficient,
as the win32api write call blocks until it writes all the data
(while Read() might just copy from the system cache).
3. Do something about the damned indexes in your frequency tables...
there are so many actual multiplications by 270 that it's scary.
4. There are too many unnecessary int16 variables... they're
generally slower than normal ints, so they should be replaced.
5. I'd avoid that kind of unnecessary "assertions".
6. A carryless rangecoder with a maxfreq of around 16k and an imprecise
range update is not exactly a good idea - there's certainly some
visible redundancy (and any carryless rc is slower at decoding
than a full rc).
7. Switching to a bitwise rangecoder might be worth considering
here - it'd make encoding and decoding more symmetric, remove
the divisions and a lot of branches, and generally simplify the code.
I guess pulling the full potential out of an alphabetical rangecoder
gets quite tricky.
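For reference, a minimal sketch of the kind of bitwise (binary) rangecoder meant in point 7, encoder side only: it assumes 12-bit probabilities and uses the common LZMA-style carry handling. All names are illustrative, not taken from CSC.
Code:
// Binary range encoder sketch: one multiplication, no division per bit.
#include <cstdint>
#include <cstdio>

struct BitEncoder {
    uint64_t low = 0;
    uint32_t range = 0xFFFFFFFFu;
    uint8_t  cache = 0;                       // pending byte for carry handling
    uint64_t cache_size = 1;
    FILE*    out;

    explicit BitEncoder(FILE* f) : out(f) {}

    void shift_low() {                        // emit one byte, propagating carries
        if (low < 0xFF000000ull || low > 0xFFFFFFFFull) {
            uint8_t carry = (uint8_t)(low >> 32);
            do { fputc((uint8_t)(cache + carry), out); cache = 0xFF; } while (--cache_size);
            cache = (uint8_t)(low >> 24);
        }
        cache_size++;
        low = (low << 8) & 0xFFFFFFFFull;
    }

    // p1 = P(bit==1) in 1/4096 units, must stay in [1, 4095]
    void encode(uint32_t p1, int bit) {
        uint32_t bound = (range >> 12) * p1;
        if (bit) range = bound;
        else     { low += bound; range -= bound; }
        while (range < (1u << 24)) { range <<= 8; shift_low(); }
    }

    void flush() { for (int i = 0; i < 5; i++) shift_low(); }
};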
    Last edited by Shelwien; 19th April 2009 at 01:01.

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  8. #8
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    I am so glad and appreciate that you gave so much advices! I will keep working on it.

    I didn't post the source code because i'm so unskilled in C and I want post it after a better a implementation which at least satisfied myself.

    I am also very thankful to Matt Mahoney that I never thought my program would appear on benchmark list. And it encouraged me so much.

    What's more; I have a new question: I had ever tried a simple bitwise range coder before, but the speed was less than half of the bytewise range coder. Does it normal? I want my compressor be at economic compression ratio and speed so I am considering the alternative between bitwise and bytewise. Balz has a good ratio but I think it's slow.

    Finally, there's problem with the net in my school and I can't get the:
    http://ctxmodel.net/files/ST2rc/o2rcA1.rar
    Could anyone send it into my email: fusiyuanfsy@163.com

  9. #9
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
LZ and the other "transforms" tend to destroy the "good" correlations for byte-wise coding, so the residue needs to be modeled better. That's why LZP + strong CM implementations do worse than expected. So, I would advise you to switch to bit-wise coding, as Shelwien suggested. Keep in mind that everything depends on your implementation. In bit-wise coding you should eliminate as much complexity as you can. And a good compiler can speed up your code by ~30% if you are lucky.
    BIT Archiver homepage: www.osmanturan.com

  10. #10
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
Be sure to use linear counters for probability estimation, which don't require a division. For a basic implementation have a look at fpaq0 and its derivatives, especially fpaq0p.

    http://encode.ru/forum/showthread.ph...hlight=counter
    http://encode.ru/forum/showthread.ph...hlight=counter

    I think there was a thread about counters in the old forum, too. You may want to search for it.

    http://www.cs.fit.edu/~mmahoney/comp...text.html#5586
    http://www.encode.ru/downloads/fpaq0p.zip
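For illustration, a minimal sketch of such a division-free linear counter in the style of fpaq0p (12-bit probability, shift-based update); the rate constant and names are illustrative.
Code:
// Linear (shift-based) bit counter: prediction needs no division, unlike
// frequency pairs n1/(n0+n1).
#include <cstdint>

struct LinearCounter {
    uint16_t p = 2048;                  // P(bit==1) in 1/4096 units, start at 0.5

    void update(int bit) {              // move a fixed fraction (1/32) toward the bit
        if (bit) p += (4096 - p) >> 5;
        else     p -= p >> 5;
    }
};

// Usage sketch: a bitwise order-1 model keeps one counter per (context, tree node),
// e.g. LinearCounter model[256][256]; the coder uses model[ctx][node].p, then update(bit).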

  11. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
> I didn't post the source code because I'm rather unskilled in
> C, and I wanted to post it only after a better implementation
> that at least satisfies me.

I think it's better to post whatever you have, at least if you
want advice. Actually it's kind of a common excuse...
most of the developers here say the same, and never publish the
sources in the end.
And I think there's nothing to be ashamed of, as it's really
hard to manually write a program so that it would be uglier
than what the compiler generates from it.

> What's more, I have a new question: I once tried a
> simple bitwise range coder, but its speed was less than
> half that of the bytewise range coder. Is that normal?

Well, it all depends on the implementation.
Technically a bytewise rangecoder is more advanced,
and a bitwise rc requires at least 8 multiplications
per byte, but a bitwise rc is simpler, so there's
not much of a difference, and it allows performing
8 counter updates per byte vs 1 in a bytewise
coder, which is really helpful for nonstationary data
compression. So, generally bytewise compressors have
a slightly better ratio on text, but much worse on
binaries, compared to bitwise CMs and such.

> I want my compressor to have an economical balance of compression
> ratio and speed, so I am weighing bitwise against bytewise.

    Sure. For example, ppmd uses both.

> Could anyone send it to my email: fusiyuanfsy@163.com

Sent.
Also, there was my script for downloading binary files
as text somewhere; I can find it if that would help.

  12. #12
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
2. File i/o has a major share in the speed of fast compressors,
so using stdio functions imported from msvcrt.dll is not very good.
Also, buffering a lot of input data is generally ok,
but buffering more than 4k of output data is inefficient,
as the win32api write call blocks until it writes all the data
(while Read() might just copy from the system cache).
    Hello

    What do you mean by "not using stdio functions" ?
    fwrite/fread/fopen/fclose should not be used ?

    But what else then ?

  13. #13
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Cyan View Post
    Hello

    What do you mean by "not using stdio functions" ?
    fwrite/fread/fopen/fclose should not be used ?

    But what else then ?
All stdio functions have some overhead. You can speed up your compressor by switching to OS-specific APIs (such as CreateFile, ReadFile, WriteFile, etc.), because you can then organize I/O exactly as your compressor needs. Also, keep in mind that reading/writing a single byte at a time is not a really good idea.
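A minimal Win32 sketch of the block-wise I/O suggested here: one large read buffer, no per-byte calls, and output flushed in small chunks (the 1 MiB / 4 KiB sizes follow figures mentioned elsewhere in this thread and are only an assumption).
Code:
// Block-wise copy skeleton using the Win32 file API; compression would
// replace the pass-through in the middle.
#include <windows.h>
#include <vector>

int copy_blockwise(const char* src, const char* dst) {
    HANDLE in  = CreateFileA(src, GENERIC_READ, FILE_SHARE_READ, NULL,
                             OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    HANDLE out = CreateFileA(dst, GENERIC_WRITE, 0, NULL,
                             CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (in == INVALID_HANDLE_VALUE || out == INVALID_HANDLE_VALUE) return 1;

    std::vector<char> rbuf(1 << 20);                    // 1 MiB input buffer
    DWORD got = 0;
    while (ReadFile(in, rbuf.data(), (DWORD)rbuf.size(), &got, NULL) && got > 0) {
        // ... process rbuf[0..got) here ...
        DWORD pos = 0;
        while (pos < got) {                             // flush in 4 KiB chunks
            DWORD chunk = (got - pos < 4096) ? (got - pos) : 4096;
            DWORD put = 0;
            if (!WriteFile(out, rbuf.data() + pos, chunk, &put, NULL)) return 1;
            pos += put;
        }
    }
    CloseHandle(in);
    CloseHandle(out);
    return 0;
}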
    BIT Archiver homepage: www.osmanturan.com

  14. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > What do you mean by "not using stdio functions" ?

In this case I just meant that using fgetc directly
imported from msvcrt.dll is not the best idea possible.
The implied suggestion was to use a static RTL and the
getc/putc macros instead, at least.

    > fwrite/fread/fopen/fclose should not be used ?
    > But what else then ?

Well, it's kind of obvious, as there are API functions
for that in Windows, which are used by the RTL anyway,
so why not avoid the extra code.
However, for a test, I implemented an i/o interface
using both the win32 and stdio APIs, and there was no
stable speed gain... but then win32 wasn't worse
either (obviously), and the sizes of the input and output
buffers were already optimized there anyway.

Also, there are more advanced i/o APIs - like
memory-mapped files and "overlapped" i/o.
Unfortunately, they're complex, not exactly
portable, and have various restrictions.
But if properly used, both of these should provide
better speed than stdio.

And then, imho the best approach would be to
implement the reading and writing loops as separate
threads, to provide a custom non-blocking i/o API.
It's kind of what "overlapped i/o" does in win32,
but I'm not aware of a portable equivalent... while
there's at least the pthreads library for threads.

The main points are:
1. To avoid unnecessary memory copying.
(With fread() the requested data certainly gets
copied into the program's buffer from wherever it was
initially read to.)
2. To avoid waiting until the i/o call finishes.
(It's usually not very visible with fread(),
as it might just copy the data from the system cache,
which is reasonably fast. But fwrite() with
a large amount of data won't return control to the app
until it's all written.)
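As an illustration of the separate-writer-thread idea above, here is a minimal sketch using C++11 std::thread for brevity (Shelwien mentions pthreads; the class and buffer handover scheme here are purely illustrative): the main thread keeps compressing while a background thread performs the blocking fwrite().
Code:
// Hypothetical non-blocking writer: hand over a filled buffer and keep working.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

class AsyncWriter {
    FILE* f_;
    std::vector<char> pending_;            // buffer currently queued for writing
    std::mutex m_;
    std::condition_variable cv_;
    bool has_work_ = false, quit_ = false;
    std::thread worker_;

    void loop() {
        for (;;) {
            std::vector<char> buf;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [&] { return has_work_ || quit_; });
                if (!has_work_ && quit_) break;
                buf.swap(pending_);
                has_work_ = false;
            }
            cv_.notify_all();              // producer may hand over the next buffer
            if (!buf.empty())
                fwrite(buf.data(), 1, buf.size(), f_);   // the blocking call happens here
        }
    }
public:
    explicit AsyncWriter(FILE* f) : f_(f), worker_(&AsyncWriter::loop, this) {}

    // Hand a filled buffer to the writer thread; blocks only if it is still busy.
    void write(std::vector<char>&& buf) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !has_work_; });
        pending_ = std::move(buf);
        has_work_ = true;
        cv_.notify_all();
    }

    ~AsyncWriter() {
        { std::lock_guard<std::mutex> lk(m_); quit_ = true; }
        cv_.notify_all();
        worker_.join();
    }
};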

  15. #15
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
...But fwrite() with
a large amount of data won't return control to the app
until it's all written.)
What about Windows's delayed-write functionality? It sometimes causes data loss (damn! I've lost too much important data!!!), and it's kind of proof that Windows uses a cache for both reading and writing.
    BIT Archiver homepage: www.osmanturan.com

  16. #16
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
>> until it's all written.

Well, I didn't really mean that it writes a cluster, updates
the references and then returns to check whether you have
more data. It just seems like fwrite() blocks more than fread()...
probably it at least copies the data to a system buffer and issues
some message to process it, which has to be handled by some
other thread, or something. Anyway, in a simple
fread-process-fwrite loop it seems that fwrite() wastes much more
time... probably just because the system can cache more data
for fread while the app works, and it can't do anything similar for writing.

    Also, considering delayed-write features... it seems that there's
    an explicit system copy routine, which is used by Explorer and
    some other file managers, and is significantly faster than any
    simple copy utility using fread/fwrite.

  17. #17
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
In the end, I decided to test it again, and wrote a simple
utility that copies a file using a specified buffer size and
calculates its crc32 to add some load.

    http://ctxmodel.net/files/bcopy_v0.rar

    And here're the results (detailed logs in archive).

    http://ctxmodel.net/files/hddtest.png (with enwik
    http://ctxmodel.net/files/ramtest.png (with enwik9)

But basically, the conclusions are the same as with fpaq0pv4B -
for HDD i/o it's best to have an input buffer size in the [1M,2M] range,
and an output buffer size of 4k (probably the FS cluster size).
I also did a ramdrive benchmark here though, and there
something like 256k seems to be optimal for both fread and fwrite.

    Now, here's a part of HDD percentage log
    (buf size, buf alloc, fopen, fread, fwrite, fclose)
    Code:
          4096: 0.000, 15.792, 72.749, 2.628, 4.785,
         69632: 0.000, 16.755, 70.153, 5.483, 3.197,
        135168: 0.000, 15.973, 52.014, 27.973, 0.001,
        200704: 0.000, 13.963, 41.660, 40.140, 0.863,
        266240: 0.000, 14.152, 40.293, 42.161, 0.001,
        331776: 0.000, 12.538, 25.136, 53.824, 5.521,
        397312: 0.000, 11.983, 13.601, 69.231, 2.395,
        462848: 0.000, 12.412, 21.028, 61.984, 1.612,
        528384: 0.000, 11.787, 10.881, 74.614, 0.001,
        593920: 0.000, 11.864, 12.896, 69.162, 3.307,
        659456: 0.000, 12.415, 14.582, 69.379, 0.783,
        724992: 0.000, 12.803, 9.367, 74.826, 0.001,
        790528: 0.000, 12.727, 7.481, 76.894, 0.001,
        856064: 0.000, 12.152, 12.730, 72.303, 0.001,
        921600: 0.000, 13.148, 10.077, 73.769, 0.001,
        987136: 0.000, 13.056, 11.030, 72.913, 0.001,
       1052672: 0.000, 12.369, 11.252, 73.519, 0.001,
And we can see that the percentages don't add up to 100%,
which means that the crc computations don't overlap with
any i/o.
Also, fopen seems to take quite some time, probably
because of the caching it does.

I think this confirms that too small or too large buffers
can stall the i/o, and also that a proper i/o implementation
can still be faster than a simple fread/fwrite loop, because
there's no reason why i/o can't be overlapped with processing
on a multicore cpu (like the Q6600 where this benchmark was performed).
    Last edited by Shelwien; 22nd April 2009 at 05:51.

  18. #18
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
It seems your CSC2 compressor returns errorlevel=1 even on successful compression.
I tried to integrate CSC2 into our FreeArc ARC.INI, but I always get errorlevel=1.

    Code:
    G:\test\cs2>arc a -mcsc test2 1
    FreeArc 0.50 alpha (Apr 19 2009) started: 0.00 secs
    Compressing 1 file of 7.990.959 bytes: 0.01 secs
      Using csc21
      Memory for compression 40mb, decompression 40mb
    Compressing 7.990.959 bytes with csc21 $$arcdatafile$$.tmp $$arcpackedfile$$.tmp
     CountrySide Compression  v2009.04.19  by ForTheKing
     Algorithm:36M buffer LZP and Order-1 context arithmetic coding
     Input csc2 file1 file2 OR
    
    Input filename to compress or decompress: $$arcdatafile$$.tmp
    Operation:Compress
    Input filename to write out: $$arcpackedfile$$.tmp
     Compressing 7.62MB to 1.52MB.
    Total out 1.52MB!
    Operation complete,time: 0.65 s
    Errorlevel=1
    
      Solid block compression results (0.679 seconds)
        csc21: 7.990.960 bytes in 0.679 seconds
    
      Writing directory: 0.76 secs
      Found 1 directory names: 0.77 secs
      Directory written: 0.77
    Compressed 1 file, 7.990.959 => 7.990.960 bytes. Ratio 100.0%
    Compression time: cpu 0.06 secs, real 0.78 secs. Speed 10.284 kB/s
    In my ARC.INI I have :
    Code:
    [External compressor:csc21]
    cmem = 40
    dmem = 40
    packcmd   = {compressor} $$arcdatafile$$.tmp $$arcpackedfile$$.tmp
    unpackcmd = {compressor} $$arcpackedfile$$.tmp  $$arcdatafile$$.tmp

  19. #19
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    Quote Originally Posted by pat357 View Post
It seems your CSC2 compressor returns errorlevel=1 even on successful compression.
I tried to integrate CSC2 into our FreeArc ARC.INI, but I always get errorlevel=1.

Oh... it seems to be my mistake. I treated EXIT_SUCCESS as 1..

This time I am uploading the source code. There may be serious flaws in the code; please tell me, since I am such an unskilled programmer.

What's more, I wonder how you got the decompiled code? What compiler should I use? Won't VisualC 2008 Express or DevCpp be good enough? Or which compile options should I tweak?
Attached Files

  20. #20
    Member Vacon's Avatar
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

This time I am uploading the source code. There may be serious flaws in the code; please tell me, since I am such an unskilled programmer.
    Thank you for sharing your code!
I think there are potential flaws in anyone's sources (including OpenOffice.org, Firefox, etc.), but the chances of eliminating them are better as a community. Especially when a first attempt brings results as interesting as yours
Just for the record: under which license is your source code released? Is a license.txt included in your package?

    Best regards!

  21. #21
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts


    Thanks!

  22. #22
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts


In the end, I decided to test it again, and wrote a simple
utility that copies a file using a specified buffer size and
calculates its crc32 to add some load.

    http://ctxmodel.net/files/bcopy_v0.rar
Quite an incredible and detailed study you made in such a short time, Shelwien.

  23. #23
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Code:
    enwik8 
     enc=8.141s dec=12.625s // CSC2.exe from initial post
     enc=5.046s dec=8.453s // CSC2.exe from source archive
     enc=4.875s dec=7.625s // compiled with GCC 4.3.3
     enc=4.734s dec=7.047s // compiled with GCC 4.3.3 -fprofile-use
     enc=4.500s dec=7.172s // compiled with IntelC 11.0.074
     enc=4.468s dec=6.859s // compiled with IntelC 11.0.074 /Qprof_use
Attached Files

  24. #24
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
IntelC doesn't seem to be free..

  25. #25
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
> This time I am uploading the source code. There may be serious
> flaws in the code; please tell me, since I am such an unskilled
> programmer.

People are lazy (me too), so I doubt that anybody would really
read your code, even if it's all clean and commented.
Still, here are some more ideas for you:
1. Merge the modules with #include instead of object linking.
It's rather hard these days to write a program that would be
really slow to rebuild completely each time, and it's rather
hard to think of another reason to use multiple .obj files.
Also, compilers generally produce better code from a single source.
2. There's not much sense in keeping your code in plain C these days -
it's actually hard to find a non-C++ compiler. And using some basic
C++ features, like wrapping the rc into a class, would be much
more compiler-friendly than your current code. Another C++ feature
which is very good for speed optimization is "templates".
It lets you create, at compile time, a few instances of your functions,
optimized for specific cases.
3. A random rc hint: you can move the range/=totfreq out of
rcEncode (and maybe out of GetFreq too) - basically, to the place
where this totfreq becomes known. Anyway, the point is that it's bad
to use a division result right away on modern cpus - if you do the
division, then something unrelated, and only then use the division result,
the actual delay from the division might be much smaller.
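A hypothetical sketch of the hint in point 3: issue the division as soon as totfreq is known, do unrelated model work, and only consume the result inside Encode. The names (Rc, SetTotal, Encode) are illustrative, not from the CSC sources.
Code:
// Hoisting the division so its latency overlaps other work.
#include <cstdint>

struct Rc {
    uint32_t low = 0, range = 0xFFFFFFFFu, rdiv = 0;

    // Call this as early as possible, right where totfreq becomes known.
    void SetTotal(uint32_t totfreq) { rdiv = range / totfreq; }

    // By the time Encode runs, the division above has hopefully retired,
    // so its long latency overlaps the symbol/frequency lookup in between.
    void Encode(uint32_t cumfreq, uint32_t freq) {
        low   += cumfreq * rdiv;
        range  = freq * rdiv;
        // renormalization and carry handling omitted in this sketch
    }
};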

> What's more, I wonder how you got the decompiled code?

    http://hex-rays.com/decompiler.shtml

> What compiler should I use? Won't VisualC 2008 Express
> or DevCpp be good enough?

I don't have any experience with DevCpp, but I don't think that
it can show any good results anyway.
And as to VC2008, it's surely greatly improved since VC6, but it's still
mostly usable when you want a small executable, not a fast one.
So my personal opinion is that there's no real optimizing compiler
currently other than IntelC (with all its bugs and quirks) - it's
the only one that properly does automatic vectorization, anyway.
But to be fair, I have to add that gcc recently inherited some
useful features from IntelC (like PGO), and so gcc 4.3+ at least
shows results of the same order, even if 10-20% slower on average.

> Or which compile options should I tweak?

All of them, I guess? There are a lot, and usually they have unpredictable effects
on specific programs, unrelated to their documented meaning.
And there's also some source-related stuff, like alignments and
the "restrict" keyword for pointers.

  26. #26
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
IntelC has a free evaluation.
And after one evaluation period ends, you can usually get another
evaluation license.
That's not counting other possible ways.

  27. #27
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts


In fact, what I am aiming at is a practical compressor. I chose LZP because I thought it was fast, but then I found the decompression speed unsatisfying. Its compression ratio is also hardly satisfying (I have very little experience with context modeling.. I don't like it..). So I'm considering writing a new program.

In my opinion, a compressor like this should not use too complicated a context model, since compression and decompression of CM are symmetrical. Then, to get a better ratio, there should be a better LZ encoder. ROLZ may be a choice and it achieves a good ratio; however, its match-finding procedure seems very rough --- it's very slow. What I am thinking about these days is how to design a sophisticated LZ77 encoder and encode the literals/matches wisely through a bitwise coder. Am I on the right track? It seems to be what LZMA does... I still want to work one out by myself.

  28. #28
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
In practice, ROLZ is speed-bounded like LZP. In terminology, LZP is actually ROLZ with a single offset. If you don't want to deal with a match-finding method (hash tables, trees or whatever), you can use ROLZ with an order-2 match context (without obfuscation of the context itself). With this you effectively get minimum 2-byte matches initially, so you can extend matches for free whenever the match context is seen again. You can have a look at the QUAD/BALZ series; all of them use an order-2 match context in their ROLZ implementation (which is actually a kind of hash table). If you want something faster, switch to pure LZ77; then you get rid of the additional ROLZ context table, which pollutes the cache.
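A minimal sketch of such an order-2 ROLZ offset table: match candidates are recent positions seen under the same 2-byte context, so the encoder only transmits a small slot index plus a length. All names are illustrative, not taken from QUAD, BALZ or CSC.
Code:
// Order-2 ROLZ table sketch; allocate instances on the heap (about 8 MiB).
#include <cstdint>
#include <cstring>

const int ROLZ_SLOTS = 32;                       // positions kept per context

struct RolzTable {
    uint32_t pos[65536][ROLZ_SLOTS];
    uint8_t  head[65536];                        // circular insert point per context
    RolzTable() { memset(pos, 0, sizeof(pos)); memset(head, 0, sizeof(head)); }
};

static inline unsigned ctx2(const uint8_t* d, uint32_t p) {  // requires p >= 2
    return d[p - 2] | (d[p - 1] << 8);
}

void rolz_update(RolzTable& t, const uint8_t* d, uint32_t p) {
    unsigned c = ctx2(d, p);
    t.pos[c][t.head[c]] = p;
    t.head[c] = (uint8_t)((t.head[c] + 1) % ROLZ_SLOTS);
}

// Search only the slots of the current context; return the best length and slot.
int rolz_find(const RolzTable& t, const uint8_t* d, uint32_t p, uint32_t end,
              int& best_slot) {
    unsigned c = ctx2(d, p);
    int best_len = 0;
    best_slot = -1;
    for (int s = 0; s < ROLZ_SLOTS; s++) {
        uint32_t q = t.pos[c][s];
        if (q == 0 || q >= p) continue;          // empty slot or not in the past
        int len = 0;
        while (p + len < end && d[q + len] == d[p + len]) len++;
        if (len > best_len) { best_len = len; best_slot = s; }
    }
    return best_len;
}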
    BIT Archiver homepage: www.osmanturan.com

  29. #29
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
> I chose LZP because I thought it was fast,
> but then I found the decompression speed unsatisfying.
> Its compression ratio is also hardly satisfying

Why? Its decoding speed was 14 MB/s on my machine, and imho it could
get twice as fast if properly written.
Also, you don't have any match-sequence optimization, and any
compression improvement would also improve the speed.
You should also do some profiling and see what share the
various parts have in the decoding time. The RC is probably quite a brake there,
and there are lots of possible optimizations.

> (I have very little experience with context modeling..
> I don't like it..)
> So I'm considering writing a new program.

    Well, I'd suggest writing a few simple order0-2 CM coders.

> In my opinion, a compressor like this should not use too
> complicated a context model, since compression and decompression
> of CM are symmetrical.

CM compressors are symmetrical only because they're usually
much simpler than LZ compressors. Also, nobody said that
compressors always have to process data sequentially...
for example, you can encode the occurrence offsets of each
word in the dictionary, and then fill the gaps with a
simple literal model.
Anyway, you're not the first in this area, so it's weird
to hope for record-breaking results without some patience.

> What I am thinking about these days is how to design a
> sophisticated LZ77 encoder

Then why not design a sophisticated LZP encoder first?
With proper optimization and statistical models?
Btw, you can estimate the redundancy of your data by
dumping your symbols without compression (they can
be packed into bytes if the data is text) and compressing
them with some of the available coders.

> Am I on the right track?

So, what are the stats of the compressor you're aiming for?
Ratio (e.g. on enwik), coding/decoding speed?

> It seems to be what LZMA does...
> I still want to work one out by myself.

Hacking up a simple LZMA codec might be a good experience, btw.
"By myself" won't be too late after that either.

Also, there are actually things that LZMA doesn't have.
I mean special handling of various data types,
like LIPT for text, disasm for code, reordering for tables etc.
But it's not at all simple, I guess

  30. #30
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts

    About Speed of RangeCoder

These days I worked out a bitwise order-1 range coder which is very similar to the one in Balz, but a rewritten implementation. I think the model is simple enough, but the speed is only about 6 MB/s on my Athlon X2 4000+ (enwik8: compression 16.5 s, decompression 18 s). I have set every compiler switch to maximum speed. The speed seems normal compared with fcm1/runcoder1 (Large Text Compression Benchmark); however, I want to know why the coder in LZMA decompresses so fast. I used 7-zip to compress a 100 MB file of random data, and its decompression speed is nearly 9 MB/s. Can I reach this speed? Because I think decompression should be faster than compression for most users.
