
Thread: Kraken compressor

  1. #1
    Member
    Join Date
    Apr 2016
    Location
    Can
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Kraken compressor

    Kraken (part of Oodle from Rad Game Tools) (Closed Source)

    Reportedly near LZMA ratios with decompression speeds more comparable to Zstd.

    From http://cbloomrants.blogspot.ca/2016/...se-kraken.html
    Kraken : 4.05 to 1 compression ratio, 919.8 decode mb/s
    zlib9 : 2.74 to 1 compression ratio, 306.9 decode mb/s
    lzma : 4.37 to 1 compression ratio, 78.8 decode mb/s

    (speed measured on a Core i7-3770 3.4 GHz, x64, single-threaded)

  2. #2
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    872
    Thanks
    457
    Thanked 175 Times in 85 Posts
    Kraken, like LZNA and BitKnit, is one of the RAD Game Tools compressors that cannot be tested by the public:
    no demo executables are available, so we cannot check whether those claims are true.

  3. #3
    Member
    Join Date
    Nov 2015
    Location
    Ślůnsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    While Charles is not one to BS others, I view the announcement as an uninteresting advert: an imprecise tale of what they did, without a hint of *how*.

  4. #4
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by m^3 View Post
    w/out a hint *how*.
    "Kraken achieves its amazing performance from new ideas on how to do LZ compression, and carefully optimized low level routines."
    http://www.radgametools.com/oodlewhatsnew.htm

  5. #5
    Member
    Join Date
    Nov 2015
    Location
    Ślůnsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    Now that makes everything clear, thanks Sportman.

  6. #6
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    At least this time Charles is showing results for some files (Silesia) that the rest of us have access to. Cool.

    If I had to guess, I'd guess that the "new ideas on how to do LZ compression" are from GLZA. I could easily be wrong, but are there other really new ideas on how to do LZ compression, on a par with GLZA's grammar-based dictionary construction (guided by estimates of entropy-coded string lengths) and selective recency modeling?

    I'm guessing that if the "new ideas" originated at RAD, they'd say so... so if they're not from GLZA, they're likely from something else that's shown up recently, and I'd be very interested to know what.

    GLZA decompresses fast for its ratios, for basic algorithmic reasons, but I'm sure it could be considerably faster with the kind of low-level optimizations RAD is using. (It doesn't use SIMD, or any platform-specific ifdefs---it's just C code written to be pretty fast.) There's room for the kind of improvement LZ77-based algorithms have seen---pretty much everything about GLZA could be experimented with and improved, or varied to hit different points on the Pareto frontier. (As Tree did---it was simpler and significantly faster, with slightly worse ratios.)
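    For readers unfamiliar with the grammar-based family Paul W. is referring to: the simplest member of that family is digram replacement (Re-Pair/BPE style). The toy sketch below only illustrates the general idea of building the dictionary as a grammar; GLZA's actual construction (entropy-guided, with recency modeling) is far more sophisticated, and none of these function names come from GLZA itself.

```python
from collections import Counter

def bpe_compress(data: bytes, max_rules: int = 10):
    """Toy Re-Pair/BPE-style grammar builder: repeatedly replace the most
    frequent adjacent pair with a fresh nonterminal symbol."""
    seq = list(data)
    rules = {}                      # new_symbol -> (a, b)
    next_sym = 256                  # symbols >= 256 are nonterminals
    for _ in range(max_rules):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:               # no repeated pair left worth a rule
            break
        rules[next_sym] = (a, b)
        out, i = [], 0
        while i < len(seq):         # greedy left-to-right replacement
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(next_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_sym += 1
    return seq, rules

def bpe_decompress(seq, rules):
    # Expand nonterminals recursively until only literal bytes remain.
    out = []
    def expand(sym):
        if sym < 256:
            out.append(sym)
        else:
            a, b = rules[sym]
            expand(a)
            expand(b)
    for s in seq:
        expand(s)
    return bytes(out)
```

A real grammar-based compressor would then entropy-code the rules and the final sequence; the point here is only that the "dictionary" is a hierarchy of rules rather than a window of raw history.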

  7. #7
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Paul W. View Post
    At least this time Charles is showing results for some files (Silesia) that the rest of us have access to. Cool.

    If I had to guess, I'd guess that the "new ideas on how to do LZ compression" are from GLZA.
    While it is possible the algorithm was (partially) inspired by Tree and some of the small-dictionary tests that knocked Pareto-frontier decompression speeds down by 2x - 3x on the enwik files, the compression ratios shown for Kraken make me think it has significant differences. I would guess it uses a dictionary, but one that is not defined hierarchically. It seems to me that the whole class of (real-world) dictionary-based (vs. sliding-window-based) compression algorithms has hardly been explored.

    In my experience, the most significant factor for achieving very fast decompression with good compression ratios when using an LZxx-style decompressor is cache misses, i.e. having to read data from RAM. If I wanted to create a very fast decompressing variant of LZMA, I would try using a flat dictionary, marking useful strings for entry into the dictionary at their origin instead of using history references. That way the strings are contained in a relatively small dictionary rather than a relatively large history, allowing a significantly larger portion of the referenced data to be available from the caches. With thoughtful (and possibly machine-specific) coding, I would expect results similar to what Kraken has achieved. It will be interesting to see decompression RAM usage on large files. If RAM usage is near/at the Pareto frontier, that would be consistent with what I would try and what I would guess Kraken is.
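    The contrast Kennon describes can be sketched roughly as follows. The token formats here are invented for illustration and are not LZMA's, Tree's, or Kraken's actual formats; the point is only that every 'ref' resolves into one small, repeatedly touched table, while a sliding-window 'match' can reach anywhere in a large, mostly cold history buffer.

```python
def decode_window(tokens):
    """Sliding-window LZ: tokens are ('lit', byte) or ('match', dist, length).
    Matches read from anywhere in the already-decoded output (large, cold)."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == 'lit':
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):      # byte-wise copy handles overlap
                out.append(out[-dist])
    return bytes(out)

def decode_dict(tokens, dictionary):
    """Flat-dictionary LZ: tokens are ('lit', byte) or ('ref', index).
    All matches read from one small string table (small, cache-hot)."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == 'lit':
            out.append(tok[1])
        else:
            out += dictionary[tok[1]]
    return bytes(out)
```

In Python both forms cost the same, of course; the cache argument is about what the equivalent C decoder's loads would hit.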

  8. The Following 2 Users Say Thank You to Kennon Conrad For This Useful Post:

    Paul W. (30th April 2016),SolidComp (26th June 2016)

  9. #8
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    As it has very similar parameters as zstd -22 and lzturbo -39, it's probably LZ+tANS (LZNA and BitKnit use rANS).

  10. #9
    Member
    Join Date
    Dec 2015
    Location
    US
    Posts
    57
    Thanks
    2
    Thanked 112 Times in 36 Posts
    First off: yes, I know the situation is frustrating for you all, but please do realize that RAD is a for-profit company, that we're in the business of selling codecs, and that there are several man-years' worth of full-time work in Oodle. Both Charles and I make an honest effort to give back to the community at least as much as we "take": we blog fairly extensively about our work (both about things we tried that worked well and those that didn't), we give credit to prior work when we're aware of it, and we publish code fairly regularly and royalty-free. On my side, that includes ryg_rans, which includes interleaved ANS, SIMD implementations, and the alias table variants, all of which were, to my knowledge, novel at the time I published it.
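    For context, the core that ryg_rans builds on (before the interleaving, SIMD, and alias-table variants fgiesen mentions) is quite small. A minimal single-stream sketch in the common 32-bit, byte-renormalized formulation; unlike a real implementation, the final state is returned out-of-band here instead of being flushed to the stream:

```python
# Minimal rANS: frequencies must sum to PROB_SCALE; cum[s] is the
# cumulative frequency (interval start) of symbol s.
PROB_BITS = 12
PROB_SCALE = 1 << PROB_BITS
RANS_L = 1 << 23                     # lower bound of the normalized interval

def rans_encode(symbols, freq, cum):
    state, out = RANS_L, []
    for s in reversed(symbols):      # rANS encodes in reverse symbol order
        f = freq[s]
        x_max = ((RANS_L >> PROB_BITS) << 8) * f
        while state >= x_max:        # renormalize: emit low bytes
            out.append(state & 0xFF)
            state >>= 8
        state = ((state // f) << PROB_BITS) + (state % f) + cum[s]
    return state, bytes(reversed(out))   # decoder reads the stream forward

def rans_decode(state, stream, freq, cum, n):
    pos, out = 0, []
    for _ in range(n):
        slot = state & (PROB_SCALE - 1)
        s = max(i for i in range(len(cum)) if cum[i] <= slot)  # linear lookup
        state = freq[s] * (state >> PROB_BITS) + slot - cum[s]
        while state < RANS_L:        # renormalize: refill from the stream
            state = (state << 8) | stream[pos]
            pos += 1
        out.append(s)
    return out
```

Real implementations replace the linear symbol lookup with a table (or an alias table), and tANS folds the whole state transform into table lookups.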

    We have no intention of sitting on this indefinitely. But I don't think we need to apologize for not wanting to spill the beans completely on day 1 either.

    One further note: Kraken was mostly designed and implemented by Charles. I contributed various bits and pieces and helped get it out the door.

    Quote Originally Posted by Paul W. View Post
    I'm guessing that if the "new ideas" originated at RAD, they'd say so... so if they're not from GLZA, they're likely from something else that's shown up recently, and I'd be very interested to know what.
    No GLZA - the paper only came out a couple weeks ago while we were well into finalizing the codec and testing it; I'm flattered you think we have that kind of turnaround time to ship a new codec, but no. At the high level, this is plain LZ77+entropy coding, sorry to disappoint.

    Kraken is following up on a bunch of leads we ran into last year while working on LZNA and BitKnit, several things we learned tuning LZNib for the PS4 (1.6GHz AMD Jaguar cores, 2 instrs/cycle max, ~200 cycle memory access latency, ~24 cycle L2 cache latency; relatively challenging target for compression, though heaven compared to older game consoles), and earlier stuff we've been experimenting with since about 2014. It's basically combining the more successful ideas from BitKnit, LZNib (that one Charles has written about extensively) and LZB (LZ-bytewise, essentially a LZ4 derivative, which Charles has been very explicit about from the beginning).

    I'm not sure what you'd expect us to say beyond "this is our new codec"; "this is our new codec, which we designed, containing ideas by us, and being implemented by ourselves"?

    Quote Originally Posted by Kennon Conrad View Post
    In my experience, the most significant factor for achieving very fast decompression with good compression ratios when using an LZxx style decompressor is cache misses, ie. having to read data from RAM.
    Cache misses are indeed one of the bigger cost factors, but there are other significant bottlenecks as well, and regular LZ matches in particular are not terrible. Their redeeming quality is that almost nothing directly depends on them: an LZ match copy just boils down to a bunch of "store(destination, load(source));". The beauty of out-of-order execution is that such code permits aggressive reordering: even when the load misses all cache levels, it (and the dependent store) can just sit around in the load/store queues for a while waiting for the load to complete, and meanwhile the decoder can just chug along and make forward progress (unless you then try to read the data you just stored, anyway!). Now you don't generally have out-of-order windows anywhere near large enough to absorb a full miss all the way to main memory, but if you're careful about it you do get to overlap a good chunk of it with other useful work, so it's not all wasted.
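    The "store(destination, load(source))" structure can be made concrete with a sketch. Real decoders do this with raw pointers and deliberately over-copy into slack space; this simplified version trims instead, and the chunk size and function name are illustrative, not from any Oodle codec:

```python
CHUNK = 8

def copy_match(out: bytearray, dist: int, length: int) -> None:
    """Copy `length` bytes starting `dist` bytes back in `out`."""
    src = len(out) - dist
    if dist >= CHUNK:
        # Each chunk copy is an independent load + store: nothing in this
        # match reads bytes this match stored, so an out-of-order core can
        # overlap a cache miss here with other decode work.
        end = len(out) + length
        while len(out) < end:
            out += out[src:src + CHUNK]   # load(src) -> store(dst)
            src += CHUNK
        del out[end:]                     # trim the deliberate over-copy
    else:
        # Self-overlapping match (RLE-like): a chunk load would read bytes
        # just stored, so fall back to byte-wise copying.
        for _ in range(length):
            out.append(out[src])
            src += 1
```

The `dist < CHUNK` branch is exactly the "read the data you just stored" hazard mentioned above.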

    Many of our older codecs were heavily (and in quite unfortunate ways) constrained by the need to run decently on the Cell PPUs (PS3) and the Xenon CPU (Xbox 360), both of which are in-order designs with clock rates that look high on paper (3.2GHz!) but high instruction latencies (not even integer adds are single-cycle...) and numerous micro-architectural problems. For example, bit shifts with a variable (as opposed to fixed at compile time) shift distance are microcoded and take at least 12 clock cycles; even basic bit IO is a serious problem on these chips. Missing to main memory typically takes over 500 cycles, and trying to load a value that was recently stored needs to wait until it has been written to the L2 cache, which typically takes 30-50 cycles, and so forth. So several of our older codecs, and in particular LZHLW (which Kraken was meant to replace) were written very much in "damage reduction" mode to try and not be completely terrible on these machines.

    Designing for target machines that are both out-of-order (enabling the kind of overlap I described without needing some crazy combination of prefetches and software pipelining) and not actively hostile to decompression code has been a nice change of pace.
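    To make the bit-IO point concrete: a plain bit reader needs a shift by a data-dependent distance on every read, and that variable-distance shift is exactly the operation that was microcoded (12+ cycles) on those PPC cores. A minimal MSB-first sketch (not any particular codec's bit reader):

```python
class BitReader:
    """MSB-first bit reader over a byte string."""
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0        # next byte to refill from
        self.acc = 0        # bit accumulator
        self.bit_count = 0  # number of valid bits in acc

    def read(self, n: int) -> int:
        while self.bit_count < n:                 # refill a byte at a time
            self.acc = (self.acc << 8) | self.data[self.pos]
            self.pos += 1
            self.bit_count += 8
        self.bit_count -= n
        val = self.acc >> self.bit_count          # data-dependent shift!
        self.acc &= (1 << self.bit_count) - 1
        return val
```

Every `read(n)` performs a shift by `self.bit_count`, a value that depends on the decoded data; on a chip where such shifts cost 12+ cycles, even this inner loop becomes a serialized bottleneck.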

  11. The Following 12 Users Say Thank You to fgiesen For This Useful Post:

    Bulat Ziganshin (29th April 2016),Cyan (30th April 2016),DirtyPunk (29th April 2016),dnd (29th April 2016),encode (2nd May 2016),Intrinsic (3rd May 2016),JamesB (3rd May 2016),Jarek (29th April 2016),jibz (29th April 2016),Kennon Conrad (29th April 2016),Paul W. (29th April 2016),Turtle (30th April 2016)

  12. #10
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Calling Kraken "a revolutionary new data compression algorithm" is more than an exaggeration.
    High-speed decompression is nothing new.
    LzTurbo (modes 29, 39), which has existed for years, also decodes several times faster than zlib with high compression ratios in the lzma range (see TurboBench).
    Zstd is also constantly improving.
    Although it is understandable for a for-profit company, Kraken is a phantom compressor without any public executable, not even a demo for benchmarking.

    I have now included a Kraken template (currently calling memcpy) in TurboBench. This makes it very simple to incorporate Kraken (or other compressors) into TurboBench.
    Only the header-include line, the calls to compress/decompress, and the object files in the makefile need to be changed.
    The "lev" parameter can also be used; compression levels 1...9 are predefined.

    All packages (more than 70!) have been updated to their latest versions.
    Maybe Charles would consider incorporating his compressors into TurboBench, permitting accurate comparison of the best and latest compressors.
    Last edited by dnd; 29th April 2016 at 15:56. Reason: TurboBench

  13. #11
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    ryg: "No GLZA - the paper only came out a couple weeks ago while we were well into finalizing the codec and testing it; I'm flattered you think we have that kind of turnaround time to ship a new codec, but no."

    I'm so impressed with you and Charles---very, very impressed---that I wouldn't have ruled that out. (But that wasn't really what I was thinking. The main ideas in the GLZA paper were in the epic Tree thread here a year or so ago.)

    "At the high level, this is plain LZ77+entropy coding, sorry to disappoint."

    Not disappointed, but very interested to know that, thanks---and thanks very much for your and Charles's many contributions in code and in clear, thoughtful explanations on your blogs.
    Last edited by Paul W.; 29th April 2016 at 16:18.

  14. #12
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Good to hear that, while everybody benefits from better data compression, a few people in the world are actually paid to work on it, with more than priceless satisfaction ... on one side, people are angry if somebody doesn't just give his work away for free; on the other, they don't appreciate things that are given for free ...
    Charles and Fabian do great work, and I really appreciate that they share their ideas and experience, like the interleaved entropy coders or alias rANS.

    However, these compressor announcements are pure advertisements, without any way to confirm them ... but they do suggest possibilities and directions for improvement, and their customers are presumably allowed to perform objective benchmarks before purchasing.

    ps. It is strange that while they say directly that LZNA and BitKnit use rANS, here they only say the enigmatic "new ideas on how to do LZ compression".
    From their blog posts I had the strong impression that Fabian focused on rANS while Charles focused on tANS, which fits the parameters they claim.
    What new ideas could they be referring to? Any speculations?
    Last edited by Jarek; 29th April 2016 at 17:25.

  15. #13
    Member
    Join Date
    Dec 2015
    Location
    US
    Posts
    57
    Thanks
    2
    Thanked 112 Times in 36 Posts
    Quote Originally Posted by Paul W. View Post
    But that wasn't really what I was thinking. The main ideas in the GLZA paper were in the epic Tree thread here a year or so ago.
    Haven't checked any of that out yet! I need to have a look soon.

    Quote Originally Posted by Jarek View Post
    Good to hear that, while everybody benefits from better data compression, a few people in the world are actually paid to work on it, with more than priceless satisfaction
    If working on new compression techniques was all there was to the job, I'd happily do it for free.

    But we're selling a product. The bulk of our time (and mental capacity) goes into things like customer support, maintenance, testing, and documentation.

    And even on the coding side, it's not what you (probably) think. Most of you probably develop using a single compiler, on a single machine, and running a single OS (or maybe dual-booting Linux and Windows). Oodle needs to compile and get good performance with Visual C++ (various versions, but let's skip over that), gcc 4.1 (because some of our targets forked gcc at that point and never updated), more recent gccs, and Clang. Target OSes include Windows, Linux, MacOS X, iOS, Android, and some FreeBSD derivatives, as well as more exotic custom OSes (game consoles again).

    Most of you probably only care about reasonably recent x86 processors in 64-bit mode. Well, currently the majority of our x86 licensees run 32-bit code for decompression (games have only in the past year started to require 64-bit). Bad 32-bit performance isn't OK. And when it is 64-bit x86, it is currently mainly non-mainstream microarchitectures like AMD Jaguar. Of course we also need to support ARM (both 32b and 64b) and PowerPC (also 32b and 64b, and big endian for extra fun) on various wildly different pieces of hardware.

    You're probably used to modifying a source file, compiling it, and running it within a second. On some of the machines we develop for, launching a new executable takes well over 30 seconds.

    I have personally spent more time last year working on seemingly trivial issues like expressing unaligned loads on various architecture/compiler combinations (see also Yann's posts on the same topic) and dealing with Linux libc symbol versioning problems than I have working on say the entropy coder in BitKnit. The latter took something like 5 days, most of it tuning and various failed experiments. The former took more like 3 weeks of my time total.

    Trying out different models and contexts, or even optimizing a decoder, I can spend all day on (and often do). That's the fun part. Actually debugging it, testing it, documenting it, making sure it's robust, that it's not exploitable with malicious data, that all your SDKs and compilers are the right version so customers can actually use your lib, writing a bunch of examples and documentation, dealing with the memory management peculiarities of lots of different target environments... that's work, and often very repetitive and boring work at that, but it's what makes your code actually useful for people. That's the part we wouldn't be doing if we weren't getting paid to do it. It's also the part your customers are paying for. It's no coincidence that the codecs that go the extra mile of packaging everything in a nice library, testing (and fuzzing) everything and sorting out cross-platform issues are precisely the ones that actually get adopted outside the compression community.

    Quote Originally Posted by Jarek View Post
    However, these compressor announcements are pure advertisements, without any way to confirm them ... but they do suggest possibilities and directions for improvement, and their customers are presumably allowed to perform objective benchmarks before purchasing.
    Yes, they are advertisements/PR. Which is why we post them on things like our sales website and personal blogs, and not here, and start them with sentences like "today we are announcing X". (Our customers get a free eval first. They're certainly not expected to blindly trust our claims.)

    These announcements are not aimed at regulars of Encode's forum. We're not comparing against zlib or LZMA because they're the natural points to compare to on the LTCB Pareto curve (not even close; FWIW, the most relevant open-source codec to compare to is zstdmax, and Charles' recent post does that, among other things). Nor is it an attempt to artificially make us look good (though if you believe so, there's probably not much I can say to change that). We're comparing against these two in particular because those are two codecs that even non-compression-experts know about and use regularly, and because many of the game consoles either ship SDKs with hand-tuned versions of these two codecs or outright include hardware decoders for them. Some of the more savvy non-experts know about LZ4, Snappy, or zstd, but that is already a fraction of the audience. Giving readers a comparison to other libraries they don't know about doesn't tell them anything, and one of the more common lines we get to hear is "well, we use zlib, and you probably can't improve on that by much". A simple bar graph isn't exactly the most sophisticated of visualizations, but it does tend to resolve that particular issue quickly.

    I fully realize that grandiose claims make everyone twitchy and skeptical. They do for us too. We're diligent about trying not to misrepresent our results, but if you don't believe us, that's totally fine; no harm, no foul. We're not doing this for props or brownie points. Oodle is commercial software, with plenty of free open-source alternatives that we regularly mention on our blogs and elsewhere. We're selling to a space (games) where products have a half-life of maybe a year or two, so nobody is licensing our code because they need the backwards-compatibility to read some 10-year old data. If our customers aren't happy with what they're getting for their money at any point in time, they really can just walk away. We have nothing to gain from lying.

    Quote Originally Posted by Jarek View Post
    It is strange that while they say directly that LZNA and BitKnit use rANS, here they only say the enigmatic "new ideas on how to do LZ compression".
    No ANS in this one. It was intended to be LZ+tANS originally but we weren't happy with the speed.

    We're not holding out on any major theoretical breakthroughs or anything like that. And from a certain point of view, Kraken is "just" incremental changes and careful engineering. But these various increments happen to add up to usually about 2x faster decoding than zstdmax at roughly comparable compression rates. At some point the cumulative effect of several incremental, quantitative improvements makes for a qualitative difference.

  16. The Following 7 Users Say Thank You to fgiesen For This Useful Post:

    encode (2nd May 2016),inikep (4th May 2016),Jarek (30th April 2016),Mike (30th April 2016),Paul W. (30th April 2016),schnaader (30th April 2016),Turtle (30th April 2016)

  17. #14
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Advertising or trying to make a product looking good, is fully legitimate for a profit company.
    Compressors like brotli, lzham, zstd have also a company behind them.

    But some benchmarks and citations on Charles' blog are very strange.
    This is a little surprising from a guru of the compression community
    who always claims to promote fairness:
    - Nowhere on the blog can you find any reference to the word lzturbo.
    - On his blog you can read: "Oodle Kraken offers high compression with incredible decode speed, the likes of which has never been seen before."
    This is simply ridiculous, because LzTurbo, available for years, is also several times faster than zlib.
    - Until recently, Charles used only private files for benchmarking, making even approximate comparisons impossible, with the excuse that those files
    are copyrighted!
    - I have offered a simple integration into TurboBench, but to this day, no comment.

    Now that Charles is finally using some public files in his tests, and following
    my mantra "Trust no benchmark that you did not perform yourself", I have redone the benchmarks:


    For enwik7 we have:

    Code:
    Name              ratio      C MB/s          D MB/s          Size
    lzma        :  3.64:1 ,    1.8 enc mbps ,   79.5 dec mbps 2880703 (last column calculated from the ratio)
    lzham       :  3.60:1 ,    1.4 enc mbps ,  196.5 dec mbps 2912711
    zstdmax     :  3.56:1 ,    2.2 enc mbps ,  394.6 dec mbps 2945438
    Oodle Kraken:  3.49:1 ,    1.5 enc mbps ,  789.7 dec mbps 3004516
    zlib9       :  2.38:1 ,   22.2 enc mbps ,  234.3 dec mbps 4405781
    lz4hc       :  2.35:1 ,   27.5 enc mbps , 2059.6 dec mbps 4462025
    TurboBench: CPU i7-2600k at 3,7 GHz (all compressors with latest version)
    Code:
          C Size   ratio      C MB/s     D MB/s   Name            File              (bold = pareto) MB=1.000.000
         2830345     3.71       0.46     339.20   brotli 11       enwik7
         2842887     3.69       2.09      77.90   lzma 9          enwik7
         2897779     3.62       1.73     238.51   lzham 4         enwik7
         2929082     3.58       2.31     524.20   zstd 25         enwik7
         2936680     3.57       1.61     849.23   lzturbo 39      enwik7
         3004516     3.49       1.5      789.7    Oodle Kraken    enwik7 (i7-3770 3.4 Ghz from cbloomrants.blogspot.com)
         3574272     2.89       1.69    1058.99   lzturbo 29      enwik7
         3862940     2.71      18.01     248.48   zlib 9          enwik7
         4457390     2.35      31.73    1978.38   lz4 9           enwik7 (lz4,9 = lz4hc )
    - Comments:
    - Charles' benchmark shows zstd as only 1.69 times faster than zlib, but TurboBench shows 2.11 times!
    In Yann's benchmarks, zstd is also more than 2 times faster than zlib.
    - The ratio for zlib 9 is 2.38 in Charles' benchmark against 2.71 in TurboBench!
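    Both speed ratios can be recomputed directly from the decode columns of the two tables above (rounding is mine; the first comes out at about 1.68, which the post rounds to 1.69):

```python
# zstd-vs-zlib decode-speed ratios, figures copied verbatim from the tables.

# Charles' numbers (first table): zstdmax 394.6 MB/s, zlib9 234.3 MB/s
charles_ratio = 394.6 / 234.3        # ≈ 1.68x

# TurboBench numbers (second table): zstd 25 524.20 MB/s, zlib 9 248.48 MB/s
turbobench_ratio = 524.20 / 248.48   # ≈ 2.11x

print(round(charles_ratio, 2), round(turbobench_ratio, 2))
```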

  18. The Following 3 Users Say Thank You to dnd For This Useful Post:

    inikep (4th May 2016),Jarek (1st May 2016),xinix (1st May 2016)

  19. #15
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by dnd View Post
    - In the whole blog site, you can't find any reference to the word lzturbo.
    (...)
    - I have offered a simple integration into TurboBench, but until today no comments.
    I guess the reason is "who cares about LzTurbo"?
    Though, frankly, I see little reason to care about Oodle as well

    Quote Originally Posted by dnd View Post
    - From his blog you can read "Oodle Kraken offers high compression with incredible decode speed, the likes of which has never been seen before"
    This is simply ridiculous, because LzTurbo, available for years, is also several times faster than zlib.
    - Until recently, Charles used only private files for benchmarking, making even approximate comparisons impossible, with the excuse that those files
    are copyrighted!

    Now that Charles is finally using some public files in his tests, and following
    my mantra "Trust no benchmark that you did not perform yourself", I have redone the benchmarks:


    For enwik7 we have:

    Code:
    Name              ratio      C MB/s          D MB/s          Size
    lzma        :  3.64:1 ,    1.8 enc mbps ,   79.5 dec mbps 2880703 (last column calculated from the ratio)
    lzham       :  3.60:1 ,    1.4 enc mbps ,  196.5 dec mbps 2912711
    zstdmax     :  3.56:1 ,    2.2 enc mbps ,  394.6 dec mbps 2945438
    Oodle Kraken:  3.49:1 ,    1.5 enc mbps ,  789.7 dec mbps 3004516
    zlib9       :  2.38:1 ,   22.2 enc mbps ,  234.3 dec mbps 4405781
    lz4hc       :  2.35:1 ,   27.5 enc mbps , 2059.6 dec mbps 4462025
    TurboBench: CPU i7-2600k at 3,7 GHz (all compressors with latest version)
    Code:
          C Size   ratio      C MB/s     D MB/s   Name            File              (bold = pareto) MB=1.000.000
         2830345     3.71       0.46     339.20   brotli 11       enwik7
         2842887     3.69       2.09      77.90   lzma 9          enwik7
         2897779     3.62       1.73     238.51   lzham 4         enwik7
         2929082     3.58       2.31     524.20   zstd 25         enwik7
         2936680     3.57       1.61     849.23   lzturbo 39      enwik7
         3004516     3.49       1.5      789.7    Oodle Kraken    enwik7 (i7-3770 3.4 Ghz from cbloomrants.blogspot.com)
         3574272     2.89       1.69    1058.99   lzturbo 29      enwik7
         3862940     2.71      18.01     248.48   zlib 9          enwik7
         4457390     2.35      31.73    1978.38   lz4 9           enwik7 (lz4,9 = lz4hc )
    - Comments:
    - Charles' benchmark shows zstd as only 1.69 times faster than zlib, but TurboBench shows 2.11 times!
    In Yann's benchmarks, zstd is also more than 2 times faster than zlib.
    - The ratio for zlib 9 is 2.38 in Charles' benchmark against 2.71 in TurboBench!
    Your machine is faster than Charles' by:
    * 33% with zstd
    * 21% with lzham
    * -2% with lzma
    The sizes differ too.
    Comment:
    The speed results are not directly comparable.

  20. #16
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by dnd View Post
    I have offered a simple integration into TurboBench, but until today no comments.
    Looks like there is finally serious competition for LzTurbo...

    Is it possible to update the TurboBench Windows binary?

    https://sites.google.com/site/powturbo/downloads
    turbobench_win64.7z (1656k) powturbo, Feb 21, 2016, 7:12 PM v.1

  21. #17
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Quote Originally Posted by m^2 View Post
    I guess the reason is "who cares about LzTurbo"?
    Though, frankly, I see little reason to care about Oodle as well
    Well, this is the state of the art.

    Your machine is faster than Charles' by:
    * 33% with zstd
    * 21% with lzham
    * -2% with lzma
    The sizes differ too.
    Comment:
    The speed results are not directly comparable.
    No. According to Intel, the Core i7-3770 is only about 1.08x faster than the i7-2600K, and I selected a frequency 1.08x higher for the i7-2600K (3.7 vs. 3.4 GHz), so the CPUs are comparable.
    This makes no difference to the ranking or to the zstd/zlib relationship I mentioned.
    Please, next time try to be constructive and verify your claims before posting. This is not the first time you have spat your poison.

    @Sportman: I will try to build a new version today.

  22. #18
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by dnd View Post
    Well, this is the state of the arts
    It's possible, but verification would be
    1) risky
    2) daunting
    First, because there are no sources that can be examined before running the code.
    Second, because the lack of sources requires a lot more care to make sure there is no cheating.

    I'm aware of some people who decided to run it, and nobody reported the tool to be dangerous. I'm not aware of anyone having done a really careful analysis. Since Shelwien is gone, nobody looks at closed-source codecs really closely, at least not publicly.

    Quote Originally Posted by dnd View Post
    No according to Intel Core i7 3770 vs 2600K i7-3770 is only 1.08 faster and I selected a frequence 1.08 faster for i7-2600k (3,7 vs 3,4), so the cpu's are comparable.
    Your benchmarks and Charles' show very divergent results. I don't know if the difference comes from:
    * different CPUs
    * different memory or other components
    * different code versions
    * different compilers
    * different benchmarking tools
    or anything else. I do know that on some code your result is faster by a third, and on some other code it's actually slower. That means any comparison would be extremely rough. You may not have noticed, but the numbers have been verified.

  23. #19
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    LzTurbo by Hamid Buzidi is well known as a FreeArc clone, ILLEGALLY incorporating its Tornado and other algorithms. If someone needs to compare with a state-of-the-art compressor, they can use FreeArc directly.

  24. #20
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Quote Originally Posted by m^2 View Post
    Your benchmarks and Charles' show very divergent results. I don't know if the difference comes from:
    * different CPUs
    * different memory or other components
    * different code versions
    * different compilers
    * different benchmarking tools
    or anything else. I do know that on some code your result is faster by a third, and on some other code it's actually slower. That means any comparison would be extremely rough. You may not have noticed, but the numbers have been verified.
    Well, it is unlikely that the huge difference comes from the hardware. I only want to point out that there is a strange difference between zstd and zlib in the two benchmarks.
    Maybe someone can check this using both lzbench and turbobench.

  25. #21
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by dnd View Post
    I only want to point out that there is a strange difference between zstd and zlib in the two benchmarks.
    Maybe someone can check this using both lzbench and turbobench.
    Lzbench:

    Compressor name Compress. Decompress. Compr. size Ratio Filename
    zlib 1.2.8 -1 77 MB/s 282 MB/s 413505 41.35 enwik6
    zlib 1.2.8 -2 68 MB/s 290 MB/s 398100 39.81 enwik6
    zlib 1.2.8 -3 52 MB/s 296 MB/s 386415 38.64 enwik6
    zlib 1.2.8 -4 49 MB/s 289 MB/s 370826 37.08 enwik6
    zlib 1.2.8 -5 32 MB/s 286 MB/s 360026 36.00 enwik6
    zlib 1.2.8 -6 23 MB/s 290 MB/s 356692 35.67 enwik6
    zlib 1.2.8 -7 20 MB/s 290 MB/s 356071 35.61 enwik6
    zlib 1.2.8 -8 18 MB/s 290 MB/s 355781 35.58 enwik6
    zlib 1.2.8 -9 18 MB/s 290 MB/s 355779 35.58 enwik6

    zstd v0.6.0 -1 214 MB/s 728 MB/s 395441 39.54 enwik6
    zstd v0.6.0 -2 150 MB/s 636 MB/s 368391 36.84 enwik6
    zstd v0.6.0 -3 118 MB/s 604 MB/s 362123 36.21 enwik6
    zstd v0.6.0 -4 92 MB/s 572 MB/s 350704 35.07 enwik6
    zstd v0.6.0 -5 79 MB/s 582 MB/s 343773 34.38 enwik6
    zstd v0.6.0 -6 63 MB/s 609 MB/s 333691 33.37 enwik6
    zstd v0.6.0 -7 49 MB/s 621 MB/s 329070 32.91 enwik6
    zstd v0.6.0 -8 37 MB/s 647 MB/s 323856 32.39 enwik6
    zstd v0.6.0 -9 33 MB/s 648 MB/s 322365 32.24 enwik6
    zstd v0.6.0 -10 25 MB/s 655 MB/s 320634 32.06 enwik6
    zstd v0.6.0 -11 23 MB/s 654 MB/s 320226 32.02 enwik6
    zstd v0.6.0 -12 17 MB/s 659 MB/s 319024 31.90 enwik6
    zstd v0.6.0 -13 17 MB/s 658 MB/s 319024 31.90 enwik6
    zstd v0.6.0 -14 12 MB/s 663 MB/s 318500 31.85 enwik6
    zstd v0.6.0 -15 10 MB/s 664 MB/s 315537 31.55 enwik6
    zstd v0.6.0 -16 10 MB/s 664 MB/s 315537 31.55 enwik6
    zstd v0.6.0 -17 10 MB/s 664 MB/s 315537 31.55 enwik6
    zstd v0.6.0 -18 5.99 MB/s 654 MB/s 304792 30.48 enwik6
    zstd v0.6.0 -19 4.46 MB/s 628 MB/s 301554 30.16 enwik6
    zstd v0.6.0 -20 4.36 MB/s 628 MB/s 301255 30.13 enwik6
    zstd v0.6.0 -21 4.12 MB/s 628 MB/s 301224 30.12 enwik6
    zstd v0.6.0 -22 4.10 MB/s 628 MB/s 301220 30.12 enwik6

    brotli 2016-03-22 -1 142 MB/s 274 MB/s 379270 37.93 enwik6
    brotli 2016-03-22 -2 88 MB/s 303 MB/s 360153 36.02 enwik6
    brotli 2016-03-22 -3 76 MB/s 311 MB/s 357625 35.76 enwik6
    brotli 2016-03-22 -4 46 MB/s 318 MB/s 347293 34.73 enwik6
    brotli 2016-03-22 -5 25 MB/s 336 MB/s 327967 32.80 enwik6
    brotli 2016-03-22 -6 20 MB/s 340 MB/s 322199 32.22 enwik6
    brotli 2016-03-22 -7 14 MB/s 340 MB/s 317674 31.77 enwik6
    brotli 2016-03-22 -8 12 MB/s 340 MB/s 315670 31.57 enwik6
    brotli 2016-03-22 -9 9.25 MB/s 338 MB/s 314327 31.43 enwik6
    brotli 2016-03-22 -10 0.68 MB/s 259 MB/s 285405 28.54 enwik6
    brotli 2016-03-22 -11 0.56 MB/s 283 MB/s 281179 28.12 enwik6


    TurboBench:

    C Size ratio% C MB/s D MB/s Name File
    281201 28.120 0.54 285.26 brotli 11 enwik6
    301259 30.126 4.38 623.98 zstd 20 enwik6
    301558 30.156 4.48 624.48 zstd 19 enwik6
    304796 30.480 6.00 650.46 zstd 18 enwik6
    315541 31.554 10.22 661.95 zstd 16 enwik6
    315541 31.554 10.21 661.82 zstd 17 enwik6
    315541 31.554 10.22 661.30 zstd 15 enwik6
    317825 31.782 9.80 350.93 brotli 9 enwik6
    318439 31.844 12.25 349.24 brotli 8 enwik6
    318504 31.850 12.42 659.24 zstd 14 enwik6
    319028 31.903 16.99 656.47 zstd 12 enwik6
    319028 31.903 16.98 656.42 zstd 13 enwik6
    320230 32.023 23.73 651.98 zstd 11 enwik6
    320455 32.045 15.15 348.64 brotli 7 enwik6
    320638 32.064 25.11 651.72 zstd 10 enwik6
    322369 32.237 33.15 644.99 zstd 9 enwik6
    323860 32.386 37.17 644.96 zstd 8 enwik6
    324486 32.449 20.80 348.24 brotli 6 enwik6
    329074 32.907 49.50 620.93 zstd 7 enwik6
    330384 33.038 25.34 346.27 brotli 5 enwik6
    333695 33.369 63.47 607.37 zstd 6 enwik6
    343777 34.378 79.59 580.38 zstd 5 enwik6
    348036 34.804 48.36 329.19 brotli 4 enwik6
    350708 35.071 92.36 570.94 zstd 4 enwik6
    355783 35.578 19.24 289.81 zlib 9 enwik6
    355785 35.578 19.30 289.70 zlib 8 enwik6
    356075 35.607 21.17 288.86 zlib 7 enwik6
    356696 35.670 24.00 288.95 zlib 6 enwik6
    358428 35.843 76.62 323.26 brotli 3 enwik6
    360030 36.003 34.17 285.42 zlib 5 enwik6
    361036 36.104 91.72 315.64 brotli 2 enwik6
    362127 36.213 118.33 604.43 zstd 3 enwik6
    368395 36.839 150.36 639.26 zstd 2 enwik6
    370830 37.083 52.15 286.74 zlib 4 enwik6
    379274 37.927 145.36 287.32 brotli 1 enwik6
    386419 38.642 56.05 294.27 zlib 3 enwik6
    395445 39.544 216.89 727.99 zstd 1 enwik6
    398104 39.810 74.92 287.89 zlib 2 enwik6
    413509 41.351 85.58 279.65 zlib 1 enwik6
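
    The ratio% column in both tables is just the compressed size divided by the original size (10^6 bytes for enwik6), times 100. A minimal sketch to double-check a couple of rows (the helper name `ratio_pct` is mine, not part of either benchmark tool):

```python
# Recompute the ratio% column: compressed size / original size * 100.
def ratio_pct(compressed: int, original: int = 10**6) -> float:
    return round(compressed / original * 100, 3)

# Two rows from the TurboBench table above:
print(ratio_pct(281201))  # brotli 11 -> 28.12
print(ratio_pct(413509))  # zlib 1    -> 41.351
```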
    Last edited by Sportman; 4th May 2016 at 12:54.

  26. #22
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    Little trick for zstd command line users :
    If you want to play with the highest possible compression ratios, add the flag `--ultra`.

    It only affects the high compression levels (20+) on large files. In that case it gets you a few more % of compression, albeit at the cost of greatly increased memory usage.


    This tip is only for command-line users.
    API users (lzbench, turbobench, ..) have different access methods with more fine-grained controls, so it isn't necessary for them.
    Last edited by Cyan; 2nd May 2016 at 08:38.

  27. The Following 2 Users Say Thank You to Cyan For This Useful Post:

    Bulat Ziganshin (4th May 2016), Turtle (3rd May 2016)

  28. #23
    Member
    Join Date
    Aug 2010
    Location
    Seattle, WA
    Posts
    79
    Thanks
    6
    Thanked 67 Times in 27 Posts
    Quote Originally Posted by Stephan Busch View Post
    Kraken is, like LZNA and BitKnit, one of a set of RadGameTools compressors that cannot be tested by the public
    because no demo executables are out there, so we cannot check whether those claims are true.
    I've just started to test Kraken myself on the data corpus I put together while building and refining LZHAM. So far, everything I'm seeing indicates it's freaking amazing. I'll be blogging about it very soon.

    I was convinced (and blogged) that Rad was making practical lossless compression breakthroughs last year, given the content of their public blog posts.

  29. The Following User Says Thank You to rgeldreich For This Useful Post:

    kurosu (4th May 2016)

  30. #24
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
    Quote Originally Posted by fgiesen View Post
    No ANS in this one. It was intended to be LZ+tANS originally but we weren't happy with the speed.
    Interesting. Is it Huffman coding, Polar coding or plain bitwise/bytewise coding?

  31. The Following User Says Thank You to inikep For This Useful Post:

    Jyrki Alakuijala (4th May 2016)

  32. #25
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Shouldn't 1'000'000 bytes of enwik9 be called enwik6?

  33. The Following User Says Thank You to Jyrki Alakuijala For This Useful Post:

    Sportman (4th May 2016)

  34. #26
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    Shouldn't 1'000'000 bytes of enwik9 be called enwik6?
    Yes, my mistake; post updated.

  35. #27
    Member
    Join Date
    Dec 2015
    Location
    US
    Posts
    57
    Thanks
    2
    Thanked 112 Times in 36 Posts
    enwik7 is supposed to be the first 10^7 bytes (10 decimal megabytes) of enwik8. I think Sportman tested with the first 10^6 bytes instead.
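
    For anyone recreating these inputs: enwikN is simply the first 10^N bytes of the larger corpus. A minimal sketch, assuming the corpus file is already on disk (the helper name `enwik_prefix` is mine, not a standard tool):

```python
# Carve an enwikN prefix out of a larger corpus held in memory.
# enwik7 = first 10**7 bytes of enwik8; enwik6 = first 10**6 bytes, etc.
def enwik_prefix(data: bytes, n: int) -> bytes:
    """Return the first 10**n bytes of `data`, i.e. the enwikN slice."""
    if len(data) < 10**n:
        raise ValueError(f"corpus too small for enwik{n}")
    return data[:10**n]

# Usage with a real corpus file, e.g.:
#   enwik7 = enwik_prefix(open("enwik8", "rb").read(), 7)
```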

  36. #28
    Member
    Join Date
    Nov 2015
    Location
    France
    Posts
    7
    Thanks
    2
    Thanked 0 Times in 0 Posts
    <offtopic>
    Quote Originally Posted by rgeldreich View Post
    I've just started to test Kraken myself on the data corpus I put together while building and refining LZHAM
    I have personally chosen LZHAM as my go-to format/archiver/compressor, because I decompress far more often than I compress, and to me it is the only safe solution. I have the source code, and I was able to compile your 7-Zip modification myself, so I don't depend on binaries, or on you avoiding dangerous encounters with buses. Thanks for making your work public, and for the blog as well.
    </offtopic>

  37. #29
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Lzbench:

    Compressor name Compress. Decompress. Compr. size Ratio Filename
    zlib 1.2.8 -1 75 MB/s 275 MB/s 4275134 42.75 enwik7
    zlib 1.2.8 -2 66 MB/s 282 MB/s 4119254 41.19 enwik7
    zlib 1.2.8 -3 50 MB/s 288 MB/s 3998722 39.99 enwik7
    zlib 1.2.8 -4 47 MB/s 281 MB/s 3842805 38.43 enwik7
    zlib 1.2.8 -5 31 MB/s 278 MB/s 3730940 37.31 enwik7
    zlib 1.2.8 -6 22 MB/s 281 MB/s 3697381 36.97 enwik7
    zlib 1.2.8 -7 20 MB/s 281 MB/s 3691375 36.91 enwik7
    zlib 1.2.8 -8 17 MB/s 281 MB/s 3688457 36.88 enwik7
    zlib 1.2.8 -9 17 MB/s 281 MB/s 3688446 36.88 enwik7

    zstd v0.6.0 -1 216 MB/s 712 MB/s 4113462 41.13 enwik7
    zstd v0.6.0 -2 147 MB/s 620 MB/s 3801050 38.01 enwik7
    zstd v0.6.0 -3 123 MB/s 560 MB/s 3680520 36.81 enwik7
    zstd v0.6.0 -4 92 MB/s 544 MB/s 3597799 35.98 enwik7
    zstd v0.6.0 -5 80 MB/s 548 MB/s 3519858 35.20 enwik7
    zstd v0.6.0 -6 63 MB/s 570 MB/s 3376233 33.76 enwik7
    zstd v0.6.0 -7 50 MB/s 580 MB/s 3321215 33.21 enwik7
    zstd v0.6.0 -8 36 MB/s 609 MB/s 3249352 32.49 enwik7
    zstd v0.6.0 -9 29 MB/s 614 MB/s 3201166 32.01 enwik7
    zstd v0.6.0 -10 22 MB/s 618 MB/s 3198266 31.98 enwik7
    zstd v0.6.0 -11 18 MB/s 619 MB/s 3153373 31.53 enwik7
    zstd v0.6.0 -12 12 MB/s 624 MB/s 3135613 31.36 enwik7
    zstd v0.6.0 -13 10 MB/s 628 MB/s 3107140 31.07 enwik7
    zstd v0.6.0 -14 6.70 MB/s 634 MB/s 3092102 30.92 enwik7
    zstd v0.6.0 -15 7.61 MB/s 638 MB/s 3064941 30.65 enwik7
    zstd v0.6.0 -16 5.74 MB/s 644 MB/s 3012065 30.12 enwik7
    zstd v0.6.0 -17 4.49 MB/s 650 MB/s 2978936 29.79 enwik7
    zstd v0.6.0 -18 3.24 MB/s 643 MB/s 2846731 28.47 enwik7
    zstd v0.6.0 -19 2.64 MB/s 630 MB/s 2821543 28.22 enwik7
    zstd v0.6.0 -20 2.34 MB/s 632 MB/s 2803831 28.04 enwik7
    zstd v0.6.0 -21 2.32 MB/s 632 MB/s 2802899 28.03 enwik7
    zstd v0.6.0 -22 2.27 MB/s 633 MB/s 2802720 28.03 enwik7
    zstd v0.6.0 -23 2.28 MB/s 633 MB/s 2802720 28.03 enwik7
    zstd v0.6.0 -24 2.27 MB/s 633 MB/s 2802720 28.03 enwik7
    zstd v0.6.0 -25 2.27 MB/s 633 MB/s 2802720 28.03 enwik7

    brotli 2016-03-22 -1 139 MB/s 282 MB/s 3908996 39.09 enwik7
    brotli 2016-03-22 -2 85 MB/s 312 MB/s 3723517 37.24 enwik7
    brotli 2016-03-22 -3 72 MB/s 320 MB/s 3697141 36.97 enwik7
    brotli 2016-03-22 -4 32 MB/s 336 MB/s 3582970 35.83 enwik7
    brotli 2016-03-22 -5 19 MB/s 357 MB/s 3363734 33.64 enwik7
    brotli 2016-03-22 -6 15 MB/s 366 MB/s 3268510 32.69 enwik7
    brotli 2016-03-22 -7 10 MB/s 372 MB/s 3154906 31.55 enwik7
    brotli 2016-03-22 -8 7.53 MB/s 374 MB/s 3100673 31.01 enwik7
    brotli 2016-03-22 -9 5.68 MB/s 375 MB/s 3059016 30.59 enwik7
    brotli 2016-03-22 -10 0.60 MB/s 296 MB/s 2779633 27.80 enwik7
    brotli 2016-03-22 -11 0.51 MB/s 332 MB/s 2738682 27.39 enwik7

    TurboBench:

    C Size ratio% C MB/s D MB/s Name File
    2707078 27.071 0.50 295.48 brotli 11 enwik7
    2803835 28.038 2.39 622.65 zstd 20 enwik7
    2821547 28.215 2.65 620.30 zstd 19 enwik7
    2846735 28.467 3.25 633.28 zstd 18 enwik7
    2978940 29.789 4.53 637.57 zstd 17 enwik7
    3012069 30.121 5.76 633.42 zstd 16 enwik7
    3064945 30.649 7.69 631.19 zstd 15 enwik7
    3092106 30.921 6.71 624.57 zstd 14 enwik7
    3097586 30.976 5.71 383.94 brotli 9 enwik7
    3107144 31.071 10.22 621.74 zstd 13 enwik7
    3133173 31.332 7.60 383.48 brotli 8 enwik7
    3135617 31.356 12.42 618.53 zstd 12 enwik7
    3153377 31.534 18.25 614.86 zstd 11 enwik7
    3189062 31.891 10.41 380.83 brotli 7 enwik7
    3198270 31.983 22.52 616.82 zstd 10 enwik7
    3201170 32.012 29.90 612.38 zstd 9 enwik7
    3249356 32.494 36.37 609.59 zstd 8 enwik7
    3298832 32.988 15.44 377.28 brotli 6 enwik7
    3321219 33.212 50.22 580.85 zstd 7 enwik7
    3376237 33.762 63.89 571.50 zstd 6 enwik7
    3395842 33.958 19.95 369.17 brotli 5 enwik7
    3519862 35.199 80.28 548.60 zstd 5 enwik7
    3597083 35.971 33.83 350.13 brotli 4 enwik7
    3597803 35.978 92.32 543.53 zstd 4 enwik7
    3680524 36.805 123.44 559.96 zstd 3 enwik7
    3688450 36.884 18.37 281.51 zlib 9 enwik7
    3688461 36.885 18.42 281.65 zlib 8 enwik7
    3691379 36.914 20.54 281.38 zlib 7 enwik7
    3697385 36.974 23.20 280.99 zlib 6 enwik7
    3710181 37.102 73.52 336.81 brotli 3 enwik7
    3730944 37.309 32.90 278.05 zlib 5 enwik7
    3736523 37.365 87.36 327.94 brotli 2 enwik7
    3801054 38.011 148.26 620.71 zstd 2 enwik7
    3842809 38.428 50.27 279.52 zlib 4 enwik7
    3909000 39.090 138.99 298.16 brotli 1 enwik7
    3998726 39.987 53.92 287.86 zlib 3 enwik7
    4113466 41.135 216.63 712.87 zstd 1 enwik7
    4119258 41.193 72.26 281.50 zlib 2 enwik7
    4275138 42.751 82.87 274.10 zlib 1 enwik7

  38. #30
    Member
    Join Date
    Nov 2014
    Location
    Earth
    Posts
    38
    Thanks
    0
    Thanked 77 Times in 19 Posts
    From http://cbloom.com/rants.html: "Kraken needs around 256k of memory in addition to the output buffer." If that is for Huffman decoding tables, there must be a lot of Huffman codes. Say there are 4 bytes per entry and 1024 entries per table (max codeword length = 10); then there would be room for up to 64 such tables. Presumably they would be context dependent.

    Or could it be using the memory for something else entirely?

    It might also decode multiple streams at once, similar to the newer Zstd versions.

    Of course, it's very speculative when the code isn't even available to test.
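
    The arithmetic behind that guess can be sketched as follows; the 4-bytes-per-entry and 10-bit-max-codeword figures are the post's assumptions, not anything confirmed by Rad:

```python
# How many 1024-entry Huffman decode tables fit in ~256 KiB of scratch memory?
scratch = 256 * 1024          # Kraken's reported extra memory, in bytes
entry_size = 4                # assumed bytes per decode-table entry
entries_per_table = 1 << 10   # 1024 entries for a 10-bit max codeword length
table_bytes = entry_size * entries_per_table  # 4096 bytes per table
max_tables = scratch // table_bytes
print(max_tables)  # -> 64
```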

