
Thread: gzip on a chip

  1. #1
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts

    gzip on a chip

    Hi all,

    This work by Abdelfattah, Hagiescu, and Singh is interesting: http://www.eecg.toronto.edu/~mohamed/iwocl14.pdf

    They're 8x faster than Intel's optimized, vectorized CPU-based gzip, and achieve a slightly better compression ratio. The paper has lots of nice detail.

I don't know why people still mess around with – or introduce – slow, CPU-based codecs. At this point in history, compression codecs should be GPU-accelerated at a minimum, or implemented on FPGAs and ASICs for widely used compression formats, like whatever we use on the web in a given decade. Same with image formats – there's no point in CPU-only formats. All new image formats should be designed for GPUs (e.g. via OpenCL, a good cross-platform framework). They'll be faster, better compressed, and save energy.

  2. #2
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    154
    Thanks
    20
    Thanked 66 Times in 37 Posts
Depends on the codec. ASICs are great at computation, but any codec bounded by memory latency, such as BWT, PPM, CM, or LZ match finding, will not translate into a fast ASIC codec. That's actually how some cryptocurrencies stay ASIC-resistant: using lots of memory (64 MB) demands low-latency memory, and it doesn't get much faster than DDR4 or GDDR5. Most codecs these days only support block-based parallelism; a CPU has access to fast memory and can handle these blocks within reasonable memory requirements and throughput. For a GPU variant, a codec must support parallelism across at least 100 threads without requiring 100 times more memory, which is where fine-granular parallelism is the only solution, and so far the only algorithm that supports fine-granular parallelism for both the encoder and the decoder is BWT. I do agree that GPGPU computing would be nice to have for a compression codec, but it only works in certain situations, and since it doesn't most of the time, we still have CPU codecs.

  3. #3
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    ASIC-resistance in cryptocurrencies is collapsing. Apparently, the software developers designing the currencies, coins, tokens, etc. underestimated how much flexibility hardware engineers have. For example, they don't need to have a bus like DDR4. They can put the memory right next to the CPU: https://blog.sia.tech/the-state-of-c...g-538004a37f9b

  4. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by SolidComp View Post
    They're 8x faster than Intel's optimized, vectorized CPU-based gzip, and achieve a slightly better compression ratio. The paper has lots of nice detail.
    They compress to 2x density whereas normal gzip compresses to 3x density.

    They compare with intel gzip, which is not as good as fast brotli or zstd modes.

    One can use LZ4, Snappy, LZO, zstd "minus-levels" to get similar speeds/densities on a cpu without additional hardware.

  5. #5
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Phoronix posted some extraordinary results for zstd on Clear Linux: https://www.phoronix.com/scan.php?pa...ndows-15&num=8
I wonder if it's legit (having the same output on every OS) and, if it is, how they achieved such speed.

  6. #6
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts


The Ubuntu image "ubuntu-16.04-server-i386.img" is not representative for a compression benchmark because it is incompressible.

A quick benchmark with TurboBench: Compressor Benchmark on a Skylake i7-6700 at 3.4 GHz

    Code:
      C Size  ratio%     C MB/s     D MB/s   Name       (bold = pareto) MB=1.000.000
       660938763    97.4     491.70    8484.71   lzturbo 31      
       661599208    97.5       6.34    8735.77   zstd 19         
       662386281    97.6    2040.23    8504.92   lzturbo 30      
       662715490    97.7      52.96    1770.99   brotli 9
       678428676   100.0   14125.98   13916.77   memcpy
    Last edited by dnd; 31st May 2018 at 22:53.

  7. #7
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    They compress to 2x density whereas normal gzip compresses to 3x density.
Well, 2.17x is what they reported, not 2x, and they said it's a geometric mean of the compression ratios, so I'm not sure how to interpret it. Since Intel's CPU-based solution gets a 2.18x ratio, I don't think they're far off from mainstream gzip. They do say:

    Note that our goal was to create a reference design with high throughput; compression ratio can be further improved by implementing smarter hashing functions for dictionary lookup/update, or by improving the match selection heuristic.
    Quote Originally Posted by Jyrki Alakuijala View Post
    They compare with intel gzip, which is not as good as fast brotli or zstd modes.


So what? They were working with gzip. That was their choice. Presumably people could do similar things with brotli and Zstd, but those codecs are so new that I don't think we should expect FPGA implementations of them to have been published in 2014, which is the year this paper was published.


    Quote Originally Posted by Jyrki Alakuijala View Post
    One can use LZ4, Snappy, LZO, zstd "minus-levels" to get similar speeds/densities on a cpu without additional hardware.
    Are you sure? They report 22.7 gbps for encoding speed. That's much faster than Snappy or any reported speed I've seen for the others.

And of course the metric they stress the most is performance per watt. That's where they say they're 12x better than the Intel-optimized CPU implementation. Brotli and Zstd aren't particularly good at performance per watt, and those projects generally don't report that metric at all, which is odd since it matters on mobile (brotli hardly reports any data at all, especially for the newer releases).

    You already know that I think that brotli underperforms for its era and context, and you know that if you had a re-do you could produce a better-than-brotli compression codec without a huge effort. I'm still very willing to work on the dictionary. There are large chunks of the head that we need covered, especially newer stuff like schema.org, all the pre-x fetches, CSP-related, etc. Ideally, if you could introduce an actual successor to brotli, it would be very, very helpful to enforce a standardized minified format for HTML, CSS, JS, and SVG. If people are already agreeing to run some new compressor on their server, like brotli, they should be willing to run a minifier that satisfies the standard (or the minifier would just be a preprocessing phase handled by the compressor). If you look into it, I'm sure you'll see that the best outcome on the wire comes from standardized minification combined with a good compressor, especially a compressor that is optimized for the minified input (call it a context).

  8. #8
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by SolidComp View Post
Well, 2.17x is what they reported, not 2x, and they said it's a geometric mean of the compression ratios, so I'm not sure how to interpret it. Since Intel's CPU-based solution gets a 2.18x ratio, I don't think they're far off from mainstream gzip.
Canterbury corpus: mainstream gzip at quality 9 gives 3.371x compression. If one ran it at gzip quality 1 (which no one I know uses, because it is just too bad for any purpose), it would still be 2.913x. On that scale, Intel's implementation of gzip is like gzip with a negative quality of -7 or so, and shouldn't be used as a reference for compression density. Just because it has Intel in the name doesn't mean that it delivers good or mainstream compression density. I suspect that no one uses Intel's implementation of gzip and that it is just an academic exercise.

    Quote Originally Posted by SolidComp View Post
    Are you sure? They report 22.7 gbps for encoding speed. That's much faster than Snappy or any reported speed I've seen for the others.
    22.7 gbps is 2.8 GB/s. Speeds around 1-2 GB/s can be achieved by software. (possibly not by snappy)

    Quote Originally Posted by SolidComp View Post
    And of course the metric they stress the most is performance per watt. That's where they say they're 12x better than the Intel-optimized CPU implementation. Brotli and Zstd aren't particularly good at performance per watt, and those projects generally don't report that metric at all, which is odd since it matters on mobile

The CPU time spent in Brotli on mobile is typically around 1 ms. For that 1 ms of CPU time it saves 50-500 ms of radio-on time. In comparison to gzip, brotli saves battery on a mobile, whereas this kind of low-density compression FPGA would increase the radio-on time by hundreds of ms to seconds.

As a rule of thumb, the radio uses 7x more energy than the CPU when data is being transmitted.

Perhaps there are economical uses for such an FPGA, but not on mobile.

    Quote Originally Posted by SolidComp View Post
    You already know that I think that brotli underperforms for its era and context
Only recently have people started to benchmark brotli against other algorithms using the same window size. When benchmarks are run using the same window size across the compressors, on normal test corpora and on normal workloads, brotli is clearly on the Pareto-optimal curve.

    Quote Originally Posted by SolidComp View Post
    and you know that if you had a re-do you could produce a better-than-brotli compression codec without a huge effort.
    https://tools.ietf.org/html/draft-va...otli-format-00

    introduces some improvements while not adding a lot of extra code to a reference encoder/decoder.

    Quote Originally Posted by SolidComp View Post
    I'm still very willing to work on the dictionary. There are large chunks of the head that we need covered, especially newer stuff like schema.org, all the pre-x fetches, CSP-related, etc. Ideally, if you could introduce an actual successor to brotli, it would be very, very helpful to enforce a standardized minified format for HTML, CSS, JS, and SVG. If people are already agreeing to run some new compressor on their server, like brotli, they should be willing to run a minifier that satisfies the standard (or the minifier would just be a preprocessing phase handled by the compressor). If you look into it, I'm sure you'll see that the best outcome on the wire comes from standardized minification combined with a good compressor, especially a compressor that is optimized for the minified input (call it a context).
The shared brotli format attempts to make the dictionary customizable in a practical way, and dictionary ordering can become context-driven for some additional savings. In current scenarios the static dictionary has a relatively small impact for Brotli -- for large-file corpora it is a 0.001 % impact. For small-file (<100 kB) corpora we can see that 1/3 of the win over gzip comes from the dictionary and 2/3 comes from other improvements.
    Last edited by Jyrki Alakuijala; 9th June 2018 at 11:53.

  9. #9
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
Canterbury corpus: mainstream gzip at quality 9 gives 3.371x compression. If one ran it at gzip quality 1 (which no one I know uses, because it is just too bad for any purpose), it would still be 2.913x. On that scale, Intel's implementation of gzip is like gzip with a negative quality of -7 or so, and shouldn't be used as a reference for compression density. Just because it has Intel in the name doesn't mean that it delivers good or mainstream compression density. I suspect that no one uses Intel's implementation of gzip and that it is just an academic exercise.
    They apparently used the Calgary corpus, not the Canterbury, but the paper is a bit confusing on that issue. Canterbury is mentioned in the first para of section 5. Then see the Compression Ratio para in 5.1, where they mention Calgary as the actual corpus used for their reported results. How does this change your answer, if at all?



  10. #10
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    In case anyone is interested, here are results from my benchmark:
Code:
    Corpus      Program series  Program name  Options  Total ratio  Arith. mean  Geom. mean
    calgary     Info-ZIP        ZIP 3.0       -1       2,633        2,977        2,734
    calgary     Info-ZIP        ZIP 3.0       -2       2,731        3,086        2,832
    calgary     Info-ZIP        ZIP 3.0       -3       2,821        3,175        2,911
    calgary     Info-ZIP        ZIP 3.0       -4       2,932        3,347        3,051
    calgary     Info-ZIP        ZIP 3.0       -5       3,021        3,449        3,141
    calgary     Info-ZIP        ZIP 3.0       -6       3,059        3,497        3,178
    calgary     Info-ZIP        ZIP 3.0       -7       3,068        3,515        3,188
    calgary     Info-ZIP        ZIP 3.0       -8       3,079        3,548        3,202
    calgary     Info-ZIP        ZIP 3.0       -9       3,081        3,558        3,205
    canterbury  Info-ZIP        ZIP 3.0       -1       3,242        3,088        2,835
    canterbury  Info-ZIP        ZIP 3.0       -2       3,352        3,175        2,919
    canterbury  Info-ZIP        ZIP 3.0       -3       3,471        3,257        2,997
    canterbury  Info-ZIP        ZIP 3.0       -4       3,569        3,391        3,108
    canterbury  Info-ZIP        ZIP 3.0       -5       3,764        3,507        3,209
    canterbury  Info-ZIP        ZIP 3.0       -6       3,823        3,550        3,244
    canterbury  Info-ZIP        ZIP 3.0       -7       3,813        3,560        3,248
    canterbury  Info-ZIP        ZIP 3.0       -8       3,835        3,602        3,266
    canterbury  Info-ZIP        ZIP 3.0       -9       3,838        3,616        3,271
    I hope I've calculated the ratios properly.

  11. The Following User Says Thank You to Piotr Tarsa For This Useful Post:

    SolidComp (24th June 2018)

  12. #11
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    I don't understand the last two columns. What are they? Thanks in advance! Sorry for the ignorance

  13. #12
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
Total compression ratio = (original_size_1 + original_size_2 + ... + original_size_N) / (compressed_size_1 + compressed_size_2 + ... + compressed_size_N)
Arithmetic ratio mean = ((original_size_1 / compressed_size_1) + (original_size_2 / compressed_size_2) + ... + (original_size_N / compressed_size_N)) / N
Geometric ratio mean = ((original_size_1 / compressed_size_1) * (original_size_2 / compressed_size_2) * ... * (original_size_N / compressed_size_N)) ^ (1 / N)
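To make those definitions concrete, here is a minimal C sketch that computes all three metrics; the sizes in main() are made-up values, purely to exercise the formulas.

Code:
#include <math.h>
#include <stdio.h>

/* Compute the three metrics defined above for n (original, compressed)
 * size pairs.  The geometric mean is accumulated in log space so a long
 * file list cannot overflow the running product. */
static void print_ratios(const double *orig, const double *comp, int n)
{
    double sum_orig = 0.0, sum_comp = 0.0, arith = 0.0, log_geo = 0.0;
    for (int i = 0; i < n; i++) {
        double r = orig[i] / comp[i];
        sum_orig += orig[i];
        sum_comp += comp[i];
        arith    += r;
        log_geo  += log(r);
    }
    printf("total compression ratio: %.3f\n", sum_orig / sum_comp);
    printf("arithmetic ratio mean  : %.3f\n", arith / n);
    printf("geometric ratio mean   : %.3f\n", exp(log_geo / n));
}

int main(void)
{
    /* Made-up file sizes, purely illustrative. */
    double orig[] = { 1000000.0, 500000.0, 250000.0 };
    double comp[] = {  400000.0, 230000.0, 120000.0 };
    print_ratios(orig, comp, 3);
    return 0;
}

The arithmetic and geometric means weight each file equally, while the total ratio is dominated by the larger files, which is why the three columns in the table differ.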

  14. The Following User Says Thank You to Piotr Tarsa For This Useful Post:

    Gonzalo (24th June 2018)

  15. #13
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
Also, you said "Perhaps there are economical uses for such an FPGA, but not on mobile." – This seems clearly false. See this paper: https://blog.acolyer.org/2017/05/26/...-smart-phones/

  16. #14
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    And when you said "22.7 gbps is 2.8 GB/s. Speeds around 1-2 GB/s can be achieved by software. (possibly not by snappy)", this appears to concede my point. There's a big difference between 2.8 GB/s and 1-2 GB/s. Moreover, 1-2 GB/s is quite vague, and it's not clear what software achieves it, other than maybe some of the RAD codecs, LZTurbo, and something called Lizard (https://sites.google.com/site/powturbo/home/benchmark). You seem unwilling to credit codecs not created by you or Google, and always have to shoot down any achievement that doesn't come from you guys. Why can't you just give credit where credit is due?

  17. #15
    Member
    Join Date
    Dec 2017
    Location
    china
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
A hardware-based compressor can achieve more than 10 GB/s of throughput with a decent compression ratio, with a further advantage in power consumption compared with software.

  18. #16
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by blue View Post
A hardware-based compressor can achieve more than 10 GB/s of throughput with a decent compression ratio, with a further advantage in power consumption compared with software.
But... often there is only one compressor/decompressor to be shared by, say, 18 cores. I would very much prefer the approach where hardware compression is added as new (somewhat) general-purpose instructions that happen to be beneficial for reading n bits, helping with parallel ANS, doing faster match finding, hashing, etc.

  19. #17
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
Additionally, hardware compressors usually have very limited memory, while e.g. the main ratio boost of zstd/brotli comes from relatively huge window sizes.
Indeed, adding specialized instructions is a better way. It is also worth keeping in mind that a hardware tANS decoder could also decode Huffman.

  20. The Following User Says Thank You to Jarek For This Useful Post:

    Jyrki Alakuijala (18th February 2019)

  21. #18
    Member
    Join Date
    Dec 2017
    Location
    china
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
Memory size is not the problem. Modern FPGAs can have tens of MB of SRAM inside, which is far more than a CPU cache. ASIC chips could have even more. I have been involved in designing different compression IP for different products. In general the window size will be no more than 8K. The compression ratio might improve by just 10% with a 128K window, but at the same cost, throughput could be 4-8x faster with an 8K window. You do the math......

BTW, specialized instructions already seem to be on Intel's roadmap.
    https://www.techradar.com/news/beyon...-is-sunny-cove

  22. #19
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
8KB window? Zstd/brotli operate with 128 MB .. 2 GB windows - we are talking about a few dozen percent better compression ratio.
A high-end FPGA costs thousands of dollars; the compressors above work on standard cheap processors, decoding at GB/s.

    I have looked at the techradar article, and it only says
    On the data-centric side, Intel claims Sunny Cove will feature larger key buffers and memory caches to optimize workloads. While most of this boost in cryptography performance won’t matter to most consumers, these improvements should lead to faster file compression and decompression across the board on all Sunny Cove CPUs
Specialized instructions could reduce the number of cycles per symbol, e.g. for a tANS/Huffman decoder:
    t = decodingTable[x]; produceSymbol(t.symbol); x = t.newX + readBits(t.nbBits) 
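As a rough software model of that step, one decode iteration could look like the C sketch below; the table layout and bit reader are illustrative assumptions, not any particular implementation.

Code:
#include <stdint.h>

/* One table entry per state x in [L, 2L); indexed directly by x here,
 * matching the pseudocode above. */
typedef struct {
    uint8_t  symbol;    /* decoded symbol                      */
    uint8_t  nbBits;    /* bits to pull from the stream        */
    uint16_t newX;      /* next-state base before the new bits */
} TansEntry;

typedef struct {
    const uint8_t *p;   /* input bytes, consumed MSB-first     */
    uint32_t bitbuf;
    int      bitcount;
} BitReader;

static uint32_t readBits(BitReader *br, int n)   /* n is small (< 16) */
{
    while (br->bitcount < n) {
        br->bitbuf = (br->bitbuf << 8) | *br->p++;
        br->bitcount += 8;
    }
    br->bitcount -= n;
    return (br->bitbuf >> br->bitcount) & ((1u << n) - 1u);
}

/* Decode n symbols starting from state x; each iteration is exactly the
 * quoted step: table read, emit symbol, rebuild the state from newX plus
 * freshly read bits.  A specialized instruction would fuse these. */
static uint32_t tans_decode(const TansEntry *decodingTable, uint32_t x,
                            BitReader *br, uint8_t *out, int n)
{
    for (int i = 0; i < n; i++) {
        TansEntry t = decodingTable[x];
        out[i] = t.symbol;
        x = t.newX + readBits(br, t.nbBits);
    }
    return x;                            /* final state, for checking */
}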

An ASIC compressor makes sense e.g. for compressing data from IoT devices like remote sensors - it seems rarely used now (?) but could provide savings in both transmission and required buffers.
Compression of this kind of data is often very simple - just take differences and use a static entropy coder with fixed probabilities.
Using a tANS entropy coder, such a cheap layer could simultaneously include encryption ( https://arxiv.org/pdf/1612.04662 ).

  23. #20
    Member
    Join Date
    Dec 2017
    Location
    china
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
Jarek, 8KB is a tradeoff. The hardware compressor market is small. Usually it is required by those who are thirsty for bandwidth, like SSD controllers or some cloud giants. They have tons of data to process, and each file may not be very big. Not many will pay for a 10% improvement at 4x the cost (128K vs 8K).

The paper at the top only has a dictionary with 1K depth. http://www.eecg.toronto.edu/~mohamed/iwocl14.pdf

And for IoT it is not even practical. Those devices are sensitive to chip area and power, and both might be doubled with a compressor inside.

  24. #21
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    @blue:
    Did anyone try pure statistical compression in hardware (CM)?
It might actually have lower memory requirements than LZ encoding.
    Code:
    768,771 book1
    365,005 book1.gz1
    312,281 book1.gz9
    345,634 order1 CM

  25. #22
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
blue, you are talking about Lempel-Ziv based compressors, which on the one hand are relatively costly (memory), and on the other are tough to motivate e.g. for an SSD controller (?), especially since a CPU can get a ~30% better compression ratio with GB/s decoding using one of its many cores.

For IoT, let's say a remote humidity sensor, instead of LZ the data compression can be just: take differences of values and use a fixed entropy coder - a single lookup in a KB-sized table per symbol, providing something like a 50% reduction in data size (required buffer and transmission cost), i.e. savings in both hardware and energy ... additionally encrypting the data at the same time if a tANS entropy coder is used.
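For illustration only, here is a minimal C sketch of that scheme for a 16-bit sensor stream; a fixed-parameter Rice code stands in for the fixed entropy coder (a static tANS table would serve the same role, with the encryption side effect mentioned above), and all names and the parameter k are hypothetical.

Code:
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t *buf; size_t bitpos; } BitWriter;  /* buf must be zeroed */

static void put_bit(BitWriter *w, int b)
{
    if (b) w->buf[w->bitpos >> 3] |= (uint8_t)(0x80u >> (w->bitpos & 7));
    w->bitpos++;
}

static void put_bits(BitWriter *w, uint32_t v, int n)        /* MSB first */
{
    for (int i = n - 1; i >= 0; i--) put_bit(w, (int)((v >> i) & 1u));
}

/* Zig-zag map: small signed deltas -> small unsigned values. */
static uint32_t zigzag(int32_t d)
{
    return ((uint32_t)d << 1) ^ (d < 0 ? 0xFFFFFFFFu : 0u);
}

/* Rice code with a fixed parameter k: unary quotient, then k remainder bits.
 * This plays the role of the fixed entropy coder. */
static void rice_put(BitWriter *w, uint32_t u, int k)
{
    for (uint32_t q = u >> k; q > 0; q--) put_bit(w, 1);
    put_bit(w, 0);
    put_bits(w, u & ((1u << k) - 1u), k);
}

/* Encode n 16-bit samples: first sample raw, then Rice-coded deltas.
 * Returns the output size in bits; 'out' must be zero-initialized. */
static size_t encode_block(const uint16_t *s, int n, uint8_t *out, int k)
{
    BitWriter w = { out, 0 };
    put_bits(&w, s[0], 16);
    for (int i = 1; i < n; i++)
        rice_put(&w, zigzag((int32_t)s[i] - (int32_t)s[i - 1]), k);
    return w.bitpos;
}

On slowly varying sensor data most deltas are near zero, so with a small k the unary part stays short; the decoder simply reverses the two steps.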

  26. #23
    Member
    Join Date
    Dec 2015
    Location
    US
    Posts
    57
    Thanks
    2
    Thanked 112 Times in 36 Posts
    Canonical Huffman/other VLC decode tables you would use for HW are small and the biggest part of them (the canonical index->actual value map) is not on the critical path (code length determination). KBs are way off for Deflate-like alphabet sizes (~290 symbols) and length limits (15 bits), unless you're talking kilobits, and low single-digit ones at that.

You _never_ set up an unrolled table. That's what you do in a SW decoder, but it's _way_ off base for a fast (or area-efficient, for that matter) HW impl, and needlessly makes the setup more expensive. The entire problem it's trying to solve in the first place doesn't exist when you're building your own HW.

Moffat+Turpin, "On the Implementation of Minimum-Redundancy Prefix Codes", has the right algorithm to use. I doubt they came up with it either, but it's a good reference. It's summarized here: http://cbloomrants.blogspot.com/2010...man-paper.html

The while loop in there - that is the thing you build. You do all the compares (up to max code length - 1) in parallel. That's a 1-bit adder, a 2-bit adder, a 3-bit adder, ..., an (N-1)-bit adder, each attached to a correspondingly-sized register storing what Charles calls "huff_branchCodeLeftAligned". The carry-out bits give you the ">=" test result, one per candidate code length. Feed that into a priority encoder (the type of circuit used to do bit scans, aka leading/trailing zero counts) and you know the code length, which is all you need to be able to start decoding the next symbol. Most of this takes less time than the address decoding, wire delays and fan-out buffering for a multi-kB SRAM array would (though you want to build the longer adders as fast adders, lest they hold up everything).
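To make that concrete, here is a small C model of the length-determination step; the field names and the 32-bit window are my own illustrative choices rather than anything from the paper or the blog post, and the sequential while loop is what the parallel comparators plus priority encoder compute in hardware.

Code:
#include <stdint.h>
#include <stdio.h>

#define MAX_LEN 15                     /* Deflate-style code length limit */

typedef struct {
    uint64_t limit_la[MAX_LEN + 1];    /* left-aligned end of codes of length l       */
    uint32_t first_code[MAX_LEN + 1];  /* first canonical code of length l            */
    int      base_index[MAX_LEN + 1];  /* canonical index of first symbol of length l */
    int      sorted_sym[290];          /* canonical index -> symbol value             */
} CanonDec;

/* Build the per-length boundaries from per-symbol code lengths (0 = unused). */
static void canon_init(CanonDec *d, const uint8_t *len, int nsym)
{
    int count[MAX_LEN + 1] = {0};
    for (int s = 0; s < nsym; s++) count[len[s]]++;
    count[0] = 0;

    uint32_t code = 0;
    int idx = 0;
    for (int l = 1; l <= MAX_LEN; l++) {
        code = (code + (uint32_t)count[l - 1]) << 1;        /* canonical code rule */
        d->first_code[l] = code;
        d->base_index[l] = idx;
        d->limit_la[l]   = ((uint64_t)code + (uint64_t)count[l]) << (32 - l);
        idx += count[l];
    }

    int next[MAX_LEN + 1];
    for (int l = 1; l <= MAX_LEN; l++) next[l] = d->base_index[l];
    for (int s = 0; s < nsym; s++)
        if (len[s]) d->sorted_sym[next[len[s]]++] = s;
}

/* Length determination: find the first length whose left-aligned boundary
 * exceeds the peeked window.  In hardware, all MAX_LEN compares run in
 * parallel and a priority encoder picks the answer. */
static int canon_code_len(const CanonDec *d, uint32_t window)
{
    int l = 1;
    while (l < MAX_LEN && (uint64_t)window >= d->limit_la[l]) l++;
    return l;
}

/* Full decode of one symbol from a 32-bit left-aligned peek; everything
 * after the length determination is off the critical path. */
static int canon_decode(const CanonDec *d, uint32_t window, int *code_len)
{
    int l = canon_code_len(d, window);
    uint32_t code = window >> (32 - l);
    *code_len = l;
    return d->sorted_sym[d->base_index[l] + (int)(code - d->first_code[l])];
}

int main(void)
{
    /* Toy code: symbol 0 -> "0", 1 -> "10", 2 -> "110", 3 -> "111". */
    uint8_t lens[4] = {1, 2, 3, 3};
    CanonDec d;
    canon_init(&d, lens, 4);

    uint32_t window = 0xB0000000u;     /* stream starts with bits 1 0 1 1 ... */
    int l, sym = canon_decode(&d, window, &l);
    printf("symbol %d, code length %d\n", sym, l);   /* -> symbol 1, length 2 */
    return 0;
}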

The part in the "return" statement - looking up what Charles calls "baseCode", the alignment shifting, the extra subtraction, or the final table lookup to convert the canonical symbol index into the actual value - _none_ of that is on the critical path. Once you have the symbol ID and the original bit values, you can just pass these down the pipeline and take as many cycles to do the rest as you want (i.e. you can build the rest optimized for power and area efficiency, not minimum delay).

Total storage to handle an arbitrary canonical VLC code with a 15-bit length limit and Deflate's ~290-symbol alphabet: 1+2+3+...+15 = 120 bits worth of registers for the equivalent of "huff_branchCodeLeftAligned", around 15*9 = 135 bits for "baseCode", and a 290x9b = 2610-bit (~330 bytes) single-ported SRAM storing the canonical index->value mapping.

I have spent some time on this and have not been able to come up with anything nearly as efficient for TANS decoding. It always needs a lot more memory elements, and besides, the table setup is a complete pain. Table setup times can't be neglected either.

  27. The Following 2 Users Say Thank You to fgiesen For This Useful Post:

    Jarek (19th February 2019),SolidComp (1st March 2019)

  28. #24
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
You are talking about large DEFLATE-like alphabets, but in many applications a small one is sufficient; for example, for numerical data you can entropy code a few of the most significant bits (far from uniform) and assume a uniform distribution for the remaining ones - write them directly.

For L states, tANS tables require ~L lg L bits: below 0.5 kB for 256 states, below 1 kB for 512 states ... and the cost of this table would be compensated by the savings from compression - which would allow reducing the hardware buffer and the energy cost of transmission, additionally providing some encryption.

Also, in many applications you can use fixed tables (no preparation), or allow transmitted coding tables to be used e.g. once per month.

  29. #25
    Member
    Join Date
    Dec 2017
    Location
    china
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    @Shelwien Sorry I have no idea about CM...

  30. #26
    Member
    Join Date
    Dec 2017
    Location
    china
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
Hi Jarek, good point. But does it deserve a hardware IP? Maybe the MCU inside is powerful enough to do it.

  31. #27
    Member
    Join Date
    Dec 2017
    Location
    china
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Jarek:
    Intel even showed off a demo where a Sunny Cove CPU was able to perform a 7-Zip encode using AES-256 75% faster than an equivalent current Intel CPU. Of course, there was also the caveat of Intel using a recompiled version of 7-Zip designed to take advantage of Sunny Cove's instructions.



  32. #28
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by blue View Post
    Intel even showed off a demo where a Sunny Cove CPU was able to perform a 7-Zip encode using AES-256 75% faster than an equivalent current Intel CPU. Of course, there was also the caveat of Intel using a recompiled version of 7-Zip designed to take advantage of Sunny Cove's instructions.
    This doesn't necessarily have anything to do with compression or decompression as AES-256 is not a compression algorithm. AES-256 is a hardware friendly algorithm, and software implementations of it will be slow.

  33. #29
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    539
    Thanks
    192
    Thanked 174 Times in 81 Posts
    Quote Originally Posted by blue View Post
    Intel even showed off a demo where a Sunny Cove CPU was able to perform a 7-Zip encode using AES-256 75% faster than an equivalent current Intel CPU.
    Quote Originally Posted by Jyrki Alakuijala View Post
    This doesn't necessarily have anything to do with compression or decompression as AES-256 is not a compression algorithm. AES-256 is a hardware friendly algorithm, and software implementations of it will be slow.
Indeed, the 75% is not about (de)compression, only encryption. On most news pages the text was misleading, like "7-Zip demo 75% faster on Sunny Cove" without any further details; some didn't even mention the recompile. On PCWorld, however, there was a screenshot that supposedly shows the 7-Zip settings - level "Store" and AES-256 encryption with password "intel123".

    Later in their article, they state:

    Other changes include specialized instructions to improve performance of vector processing, compression, and decompression.
    Which is quite vague again - and the improvement will most likely be much less than 75%.
    http://schnaader.info
    Damn kids. They're all alike.

  34. #30
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    80
    Thanks
    30
    Thanked 8 Times in 8 Posts


