Hi all,

Has anyone tested the new QuickAssist hardware compression accelerators from Intel? (They also accelerate crypto.)

From what I'm reading on their site, it's a low-power (40 W max) PCIe 3.0 card that can do 24 Gbps of compression on "DEFLATE (Lempel-Ziv 77); LZS (Lempel-Ziv-Stac)".
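For context, the DEFLATE side of that is the same bitstream zlib/gzip produce, so the software baseline you'd time a card against is just stock zlib. A minimal sketch (plain zlib API at level 6, the same compression level as gzip -6, wrapper aside; nothing QuickAssist-specific, link with -lz):

/* Plain software DEFLATE baseline using stock zlib at level 6.
 * This is only the reference point one would time the card against. */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    const char src[] = "example payload example payload example payload";
    uLong src_len = (uLong)strlen(src);

    Bytef dst[256];
    uLongf dst_len = compressBound(src_len);   /* worst-case output size, fits in dst here */

    if (compress2(dst, &dst_len, (const Bytef *)src, src_len, 6) != Z_OK) {
        fprintf(stderr, "compress2 failed\n");
        return 1;
    }
    printf("%lu -> %lu bytes at level 6\n",
           (unsigned long)src_len, (unsigned long)dst_len);
    return 0;
}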

It seems hard to get details. Has anyone found deeper technical docs or whitepapers? If we can get gzip -6 level compression from their DEFLATE implementation at 24 Gbps, that would be sweet. The best we can do with conventional CPU software compression at the same compression ratio, according to dnd's slightly dated benchmarks, is libdeflate at level 6: 123.66 MB/s. If those are decimal MB, that's 0.989 Gbps, which makes Intel's QuickAssist about 24 times faster (and possibly at lower energy). (If dnd is actually using MiB rather than MB in his benchmark, then libdeflate is a touch faster in Gbps, but still only about 1/23 of QuickAssist's speed.)
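To make the unit conversion explicit, here's the arithmetic as a tiny C check (assuming decimal MB, i.e. 10^6 bytes, and taking dnd's 123.66 MB/s and Intel's 24 Gbps claims at face value):

/* Back-of-the-envelope check of the throughput comparison above.
 * Assumes decimal MB (10^6 bytes); 24 Gbps is Intel's quoted figure. */
#include <stdio.h>

int main(void)
{
    const double libdeflate_mb_s = 123.66;   /* dnd's libdeflate level-6 number */
    const double qat_gbps        = 24.0;     /* Intel's claimed compression rate */

    double libdeflate_gbps = libdeflate_mb_s * 1e6 * 8.0 / 1e9;
    printf("libdeflate -6: %.3f Gbps\n", libdeflate_gbps);                /* ~0.989 Gbps */
    printf("QuickAssist advantage: %.1fx\n", qat_gbps / libdeflate_gbps); /* ~24x */
    return 0;
}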

QuickAssist would presumably use less CPU and system memory since it's a separate board, but I'm not sure.

Sportman asked about QuickAssist in 2013, but got no response. Now it seems to be getting a much bigger push from Intel, though it doesn't seem all that easy to get one, which is annoying.

Something else: I think Intel's recent E3-series server processors might have substantial untapped potential. These would be the Skylake (v5) and Kaby Lake (v6) generations of the E3. The reason I wonder about them is not only that they're the cheapest and lowest-power of the line, but that many of these models have GPUs. Granted, they're Intel integrated GPUs, not monster NVIDIA stuff, but they're modern and powerful for what they are: low-power integrated GPUs. That's especially true on Skylake/v5, where several models come with the Intel Iris Pro 580 (similar to Broadwell's Iris Pro 6200), which has 128 MB of eDRAM on the package that acts like an L4 cache.

I think these integrated GPUs might be very helpful for compression, because Intel has lately been good about implementing the latest OpenCL versions, at least 2.1 and possibly 2.2, which bring big improvements over the 1.2 and 2.0 levels many platforms stopped at. Has anyone tried to do some damage with modern Intel iGPUs using OpenCL or similar offloading? (There's a minimal probe sketch below.) I think there's a good chance these cheaper E3s could save a lot of energy if someone fully exploited the GPUs. It probably wouldn't beat QuickAssist, but the E3s have a lot of untapped potential, and plenty of E3 servers won't have QuickAssist. Intel's compiler tools can also help optimize OpenCL apps, and there are other offload routes such as Vulkan, Metal, and C++ AMP.
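For anyone who wants to start poking at this, here's a minimal host-side probe, a sketch assuming an Intel OpenCL runtime is installed (it only lists GPU devices and the OpenCL version they report; no compression kernel, just a starting point; link with -lOpenCL):

/* Minimal OpenCL probe: list GPU devices and the OpenCL version they report. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;

    if (clGetPlatformIDs(8, platforms, &num_platforms) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL platforms found\n");
        return 1;
    }

    for (cl_uint p = 0; p < num_platforms; p++) {
        char pname[256] = {0};
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof pname, pname, NULL);

        cl_device_id devices[8];
        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 8, devices,
                           &num_devices) != CL_SUCCESS)
            continue;   /* this platform has no GPU devices */

        for (cl_uint d = 0; d < num_devices; d++) {
            char dname[256] = {0}, dver[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof dname, dname, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_VERSION, sizeof dver, dver, NULL);
            printf("%s: %s (%s)\n", pname, dname, dver);  /* e.g. "... (OpenCL 2.1 ...)" */
        }
    }
    return 0;
}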

Jyrki, if brotli is going to be the future for a while, it might be worth talking to Intel about getting brotli hardware-accelerated on QuickAssist. That would free up CPU cycles on thousands of servers and save a lot of energy. If brotli really is the future, we should try to get everyone on board and pivoting to it.