
Thread: Transfer-Encoding & Data Compression

  1. #1
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts

    Transfer-Encoding & Data Compression

    Some compression-related pointers about Content-Encoding

    - HTTP compression

    - Results of experimenting with Brotli for dynamic web content
    Benchmarking brotli, deflate and gzip for on-the-fly compression:
    compression quality / speed / connection speedup

    - Brotli Accept-Encoding/Content-Encoding
    Discussion about brotli content-encoding in Firefox.
    Scroll to the bottom of the page for recent posts.

    - Letting People in the Door. How and why to get 2s page loads
    Example: compressing 4.2 MB down to 388 KB
    using GZIP compression + minification + code splitting + responsive images

    - How to optimize and speed up your website. A complete guide

    - Internet: 0.99 MB/s (xDSL) and 5.85 MB/s (Cable/Fiber)

    - 4G and 3G mobile broadband speeds research
    - 4G connection speeds up in 40 countries
    - State of the Internet report
    - List of countries by Internet connection speeds

    - How to simulate a low bandwidth connection for testing web sites and applications

    - Incorporate conditional loading into your design for a better UX

    - Latency: The New Web Performance Bottleneck
    ...In fact, upgrading from 5Mbps to 10Mbps results in a mere 5% improvement in page loading times!...
    ...For every 20ms improvement in latency, we have a linear improvement in page loading times.
    - Web Performance Trends: Speeding Up the Web
    ...global Internet traffic in 2019 will be equivalent to 66 times the volume of the entire global Internet in 2005...Globally, Internet traffic will reach 37 gigabytes (GB) per capita by 2019, up from 15.5 GB per capita in 2014
    ...After looking at the stats above, time should definitely be spent on optimizing images and fonts, as well as video...
    Last edited by dnd; 5th November 2015 at 23:53.

  2. #2
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts

    Lightbulb Brotli Content-Encoding

    The most important factor in HTTP transfer encoding is the time it takes to load a web page.
    From the brotli benchmark paper we have an average page size of 54,951 bytes (70,611,753 bytes / 1,285 pages)
    that will be compressed to:

    brotli : 7,921 bytes (ratio 6.938) - new 'br' content-encoding
    zopfli : 9,524 bytes (ratio 5.770) - existing 'gzip' content-encoding
    Difference: 9,524 - 7,921 = 1,603 bytes

    Decompression speed is not very important for this use case and page size.
    If we take, for example, a slow connection of 100 KB/s, the saving of using brotli instead of zopfli (zlib decompression)
    is just 0.016 seconds (1,603 / 102,400)!
    It is clear that such a difference is negligible and has no perceptible negative impact on users, even when
    the connection is several times slower or the page is larger than average.
    Additionally, other components like images, page rendering, connection latency, and web page optimizations like minification must also be considered when benchmarking 'web browsing speed'.
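    A quick back-of-the-envelope check of the figures above (the page size, the two compressed sizes and the 100 KB/s connection speed are the assumptions from this post):

    Code:
    # Back-of-the-envelope check of the figures quoted above.
    page_size   = 54951        # average page size in bytes (70,611,753 / 1,285)
    brotli_size = 7921         # compressed with brotli ('br' content-encoding)
    zopfli_size = 9524         # compressed with zopfli ('gzip' content-encoding)
    speed       = 100 * 1024   # assumed slow connection: 100 KB/s in bytes/s

    saving_bytes = zopfli_size - brotli_size      # 1603 bytes
    saving_secs  = saving_bytes / speed           # ~0.016 s

    print(f"ratio brotli: {page_size / brotli_size:.3f}")          # ~6.938
    print(f"ratio zopfli: {page_size / zopfli_size:.3f}")          # ~5.770
    print(f"time saved on a 100 KB/s link: {saving_secs:.3f} s")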
    Last edited by dnd; 30th October 2015 at 00:44.

  3. The Following User Says Thank You to dnd For This Useful Post:

    Jyrki Alakuijala (30th October 2015)

  4. #3

  5. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by dnd View Post
    The most important factor in HTTP transfer encoding is the time it takes to load a web page.
    From the brotli benchmark paper we have an average page size of 54,951 bytes (70,611,753 bytes / 1,285 pages)
    One web page consists of multiple documents that are compressed separately. See the chart 'Total transfer size per page' at http://httparchive.org/interesting.php

    More approachable text can be found at: http://www.webperformancetoday.com/2...wth-2010-2013/

    Rough guesswork based on these numbers: Nowadays, the total transfer size could be about 1.5 MB/page. Roughly 550 kB of it is HTML, scripts, CSS, etc. If we can transfer about 110 kB less using brotli, it may be possible to win about 1.1 seconds of download time on a 100 kB/s connection -- or 7.3 % on any connection where bandwidth is the factor limiting latency.

  6. #5
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    ..., If we can transfer about 110 kB less using brotli, it may be possible to win about 1.1 seconds of download time on a 100 kB/s connection
    I'm not contesting the benefits of using compression,
    only the value of replacing the already existing content encoding "gzip/deflate" (zlib/zopfli) with "br" (brotli).
    Regarding the average web page size, one must differentiate between "cold" loading (first download) and "hot" loading (cached browsing);
    additionally, essential assets (CSS, JavaScript, ...) are shared between websites. So the effective size of downloaded text in a web page is
    much smaller.


    CSS & JavaScript compression tested with TurboBench.

    Page size, cold loading: 56 kB + 73 kB + 353 kB = 482 kB

    Difference between zopfli and brotli compressed sizes (bytes, with ratio of the original size):
    HTML 56 kB: zopfli 9,705 (17.33%) - brotli 8,071 (14.41%) = 1,634
    CSS 73 kB: zopfli 19,710 (27.9%) - brotli 16,352 (22.4%) = 3,358
    JS 353 kB: zopfli 98,487 (27.9%) - brotli 86,485 (24.5%) = 12,002

    Sum: 1,634 + 3,358 + 12,002 = 16,994 bytes


    Replacing "gzip/deflate" (zopfli) with "br" brotli will save just: 0.16 seconds and not 1.1 seconds
    when a web page is loaded for the first time and only <0.02 sec. for subsequent downloading.


    This saving is negligible even without looking at the relative total download time with images (~2 to 15 seconds on mobile phones).
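    For anyone who wants to reproduce this kind of size comparison locally, here is a minimal sketch using the Python brotli and zlib modules. zlib level 9 stands in for zopfli here (zopfli compresses a few percent harder), and the file names are placeholders, so the byte counts will differ somewhat from the TurboBench numbers above:

    Code:
    # Minimal size comparison for web assets; zlib -9 approximates the
    # 'gzip' content-encoding (zopfli would shave off a few percent more).
    import zlib
    import brotli  # pip install brotli

    def compare(path):
        data = open(path, "rb").read()
        br = len(brotli.compress(data, quality=11))
        gz = len(zlib.compress(data, 9))
        print(f"{path}: {len(data)} -> brotli {br} ({100 * br / len(data):.1f}%), "
              f"zlib {gz} ({100 * gz / len(data):.1f}%), diff {gz - br}")

    for name in ("page.html", "style.css", "app.js"):   # placeholder file names
        compare(name)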
    Last edited by dnd; 2nd November 2015 at 13:10.

  7. #6
    Member
    Join Date
    Mar 2015
    Location
    Germany
    Posts
    57
    Thanks
    34
    Thanked 38 Times in 15 Posts
    Just my two cents on this: I think the compression is not meant for the user but mostly for the hoster. They pay for their bandwidth, and if they can save around 5%, that's 5% more users served with the same hardware. It's real money.

  8. #7
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Quote Originally Posted by Christoph Diegelmann View Post
    Just my two cents on this: I think the compression is not meant for the user but mostly for the hoster...
    Please look at the blog "Introducing Brotli"
    Last edited by dnd; 2nd November 2015 at 13:37.

  9. #8
    Member
    Join Date
    Mar 2015
    Location
    Germany
    Posts
    57
    Thanks
    34
    Thanked 38 Times in 15 Posts
    I've read that article, but what you advertise and what you sell is not always the same. Even if they developed it to improve the user experience, it still saves them a lot of money.

  10. #9
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    I'm making up all the numbers below just to approximate the scale of the gzip->brotli financial impact. Please feel free to make up your own numbers.

    Let's assume 500 billion interactions on the internet per day, and a 50 ms saving on average from brotli. Let's assume a computer costs $1 per day to run. In one day we save $289,351 with brotli in CPU costs. Let's assume the bandwidth costs (yes, they seem to be paid by someone other than the end user, but actually it is the end user, one way or the other) are 10x more than the CPU. We get $2,893,518 from bandwidth savings. Let's assume the average income of people using the internet is $60 per day. We get $17,361,108 in savings per day from people being able to work more.

    Annually the savings for 'humanity' would be $105 million, $1.05 billion and $6.3 billion, a total of ~$7.4 billion. The biggest impact seems to be the users' time. (I hope I didn't make a ridiculous error here, the numbers look kind of big...)

    edit: I wildly extrapolated the 500 billion from claims of 3.5 billion Google searches, 4.5 billion Facebook likes and 200 billion emails per day.
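    Reproducing the arithmetic above in a few lines (every input is one of the made-up assumptions stated in the post):

    Code:
    # All inputs are the made-up assumptions from the post above.
    interactions_per_day   = 500e9    # extrapolated from searches/likes/emails
    saving_per_interaction = 0.050    # 50 ms saved by brotli per interaction
    seconds_saved = interactions_per_day * saving_per_interaction   # 2.5e10 s/day

    computer_day_cost = 1.0           # $ to run a computer for one day
    income_per_day    = 60.0          # assumed average income of an internet user

    cpu_saving  = seconds_saved / 86400 * computer_day_cost   # ~$0.29M/day
    bw_saving   = 10 * cpu_saving                              # ~$2.9M/day
    time_saving = seconds_saved / 86400 * income_per_day       # ~$17.4M/day

    for name, per_day in [("CPU", cpu_saving), ("bandwidth", bw_saving),
                          ("user time", time_saving)]:
        print(f"{name}: ${per_day:,.0f}/day, ${per_day * 365 / 1e9:.2f}B/year")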
    Last edited by Jyrki Alakuijala; 2nd November 2015 at 22:46.

  11. #10
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    I wonder how many more billions of dollars 'humanity' could save if e.g. Google instead spent a few hundred thousand on a competition to choose the best compressor for this purpose?

    If, as you write, data compression is so important and valuable, why not give some boost to this field - currently driven by a few enthusiasts giving away their life's work for free?

  12. #11
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Quote Originally Posted by Christoph Diegelmann View Post
    ...but what you advertise and what you sell is not always the same...
    I'm sorry, I don't want to get involved in such unrelated discussions, but the quote:
    "At Google, we think that internet users’ time is valuable, and that they shouldn’t have to wait long for a web page to load"
    has little to do with advertising; it is simply misleading, because for most users this saving is practically imperceptible.


    Quote Originally Posted by Jyrki Alakuijala View Post
    ...Annually the savings for 'humanity' would be ..., a total of ~$7.4 billion
    This is a naive assumption, simply because it is based on an average connection speed of 100 kB/s for web hosts, providers and users.
    But most importantly, the (growing) global internet traffic, and obviously its cost, is due to videos, images and advertising, not text.

  13. #12
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    (I hope I didn't make a ridiculous error here, the numbers look kind of big...)
    I'm afraid that you did.
    https://en.wikipedia.org/wiki/List_o...%29_per_capita
    World average GDP per capita is just over 10K USD yearly.
    If you divide it by 365.25 and then by 16, that's just over 1.76 USD/hour (transfers at work are as important as any other ones).
    45% of the world population has an internet connection, and richer people tend to have one more often than poorer people do.
    Even if we simplify and say that the internet population produces 100% of world GDP, you're off by a factor of 15, and I think it's really more like 20.

  14. #13
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by m^2 View Post
    Even if we simplify and say that the internet population makes 100% of world GDP you're off by a factor of 15 and I think that really it's more like 20.
    If you take into account that I calculated a daily GDP impact and you calculated an hourly GDP impact, it is a 24x difference -- our numbers match.

  15. #14
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Correct!

  16. #15
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    New:

    - Latency: The New Web Performance Bottleneck
    ...In fact, upgrading from 5Mbps to 10Mbps results in a mere 5% improvement in page loading times!...
    ...For every 20ms improvement in latency, we have a linear improvement in page loading times.
    - Web Performance Trends: Speeding Up the Web
    ...global Internet traffic in 2019 will be equivalent to 66 times the volume of the entire global Internet in 2005...Globally, Internet traffic will reach 37 gigabytes (GB) per capita by 2019, up from 15.5 GB per capita in 2014

  17. #16
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    - Better than Gzip Compression with Brotli
    Shows how to set up a near-real simulation of "brotli content encoding" in Firefox 44.
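    As a rough, hypothetical illustration of what "br" content negotiation looks like at the HTTP level (this is not the setup from the linked article), here is a toy Python server that returns a brotli-compressed body only when the client advertises br. The port and page body are made up, and real browsers only send "br" over HTTPS:

    Code:
    # Toy illustration of 'br' content negotiation; not production code.
    import brotli  # pip install brotli
    from http.server import HTTPServer, BaseHTTPRequestHandler

    HTML = b"<html><body>" + b"hello brotli " * 500 + b"</body></html>"

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            accept = self.headers.get("Accept-Encoding", "")
            body, enc = HTML, None
            if "br" in accept:                 # naive token check, good enough here
                body, enc = brotli.compress(HTML, quality=5), "br"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            if enc:
                self.send_header("Content-Encoding", enc)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("localhost", 8000), Handler).serve_forever()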

  18. #17
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Chrome browser: Intent to Ship - Brotli (Accept-Encoding: br on HTTPS connections)

  19. #18
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    ...and on, e.g., a crappy Wi-Fi link one can get virtually no improvement, due to packet loss and how TCP handles it. Actually, most TCP implementations drop throughput drastically on the slightest packet loss or delay, which is usual for wireless links, especially if you don't sit right next to the AP/cell all the time.

    IIRC Google had some ideas about FEC coding over UDP to mitigate this, but it looks like that idea has stalled.

  20. #19
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    There is now a video, "zopfli v. brotli - Comparison between compression algorithms in chrome", showing that there is no visible difference between the two algorithms for smartphone web browsing.

  21. #20
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by dnd View Post
    There is now a video...
    Client support alone is not enough. "br" content encoding needs client support, server support and an HTTPS connection.

  22. #21
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Source discussion

    Quote Originally Posted by dnd View Post
    As demonstrated in the thread "Transfer-Encoding & Data Compression" the bandwidth saving of using other content-encoding than gzip for text is negligible.
    And most important, the (growing) global internet traffic is due to videos, images, advertising, Ad-tracking,... but not text.
    Quote Originally Posted by Jyrki Alakuijala View Post
    I think there was no general agreement about that, and I'd like to think it is not true. I have heard about a 10 % median latency reduction measured from a brotli deployment.

    http://httparchive.org/trends.php gives us more details on the composition of modern webpages.
    In my calculation I'm considering the user's viewpoint only, with a slow 100 KB/s connection.
    On the server side, only real-world experiments can deliver exact results, because several factors come into play.
    One must also distinguish between static and dynamic content, as compression ratio and memory usage differ.

  23. #22
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Update: Static/Dynamic web content compression benchmark with TurboBench

    - Latest versions of all compressors
    - New: zstd included for comparison only (content-encoding not supported)

  24. The Following 2 Users Say Thank You to dnd For This Useful Post:

    Jyrki Alakuijala (21st April 2017),SolidComp (27th April 2017)

  25. #23
    Member
    Join Date
    Jan 2017
    Location
    Germany
    Posts
    48
    Thanks
    25
    Thanked 10 Times in 7 Posts
    Recently I wondered why the efficient base91 encoding for email attachments is so poorly supported by mail programs.
    We spend a lot of work saving bytes with data compression, but then use a less efficient transfer encoding like base64.
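    The overhead is simple arithmetic: base64 maps every 3 bytes to 4 ASCII characters, a fixed ~33% expansion (more once MIME line breaks are added), while base91 gets by with noticeably less. A quick check of the base64 side (base91 has no standard-library codec, so only base64 is measured here):

    Code:
    # base64 inflates binary (e.g. already-compressed) attachments by ~33%.
    import base64, os

    payload = os.urandom(100_000)          # stand-in for a compressed attachment
    encoded = base64.b64encode(payload)
    print(len(encoded) / len(payload))     # ~1.33, before MIME line breaks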

  26. #24
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I think a proper benchmark for transfer encoding should entail much more than just relating size savings to average transmission speed.

    The reality is that:
    - a user views on average much more than one page of a single website during a browsing session
    - static data is cached by the browser, sometimes for a long time, e.g. many days or weeks if the static CSS and/or JS files don't change
    - dynamic data has to be compressed every time a browser makes a request to the server

    I will focus here on compressing dynamically generated data, because for static data the reality is simple: we can compress using highly asymmetric algorithms and spend a long time compressing, and that's still a win because it neither delays downloading nor increases decompression time considerably.

    Compression on request can make generating responses slower by putting additional stress on the server CPU. The CPU time consumed by strong compression could instead be used for faster generation of uncompressed data followed by fast compression. Therefore enabling strong compression can simultaneously improve the experience on bandwidth-starved connections (by sending less data) and worsen the experience on high-bandwidth connections (by waiting on other compression processes).

    Compression on request can be done in (at least) two ways:
    1. compress whole data at once, buffer it somewhere, then stream the compressed data
    2. compress in a streaming way, with only a small, fixed-size in-memory buffer (larger than the window used for LZ parsing) - see the sketch after the pros and cons below

    Approach #1 advantages:
    - we can reduce the number of simultaneous compression processes, thus reducing the memory requirements on the server
    Approach #1 drawbacks:
    - increased latency: we can't stream the compressed data until all of the uncompressed data has been generated

    Overall, Approach #1 is not viable.

    Approach #2 advantages:
    - the delay between the start of generating uncompressed data and sending compressed data is small and roughly constant
    Approach #2 drawbacks:
    - many requests at once mean many compression processes at once - that consumes server memory quickly, because for LZ schemes compression usually requires much more memory than decompression
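    A minimal sketch of approach #2 with zlib from the Python standard library, assuming a generator that produces the response incrementally: the compressor keeps only a small fixed window and flushes per chunk, so compressed data can go on the wire immediately (brotli and zstd expose similar streaming APIs):

    Code:
    # Approach #2: stream-compress dynamically generated chunks as they are
    # produced, keeping only a small fixed-size compressor state in memory.
    import zlib

    def generate_chunks():
        # Stand-in for a web framework producing a response incrementally.
        for i in range(1000):
            yield f"<li>row {i}: some dynamically generated text</li>\n".encode()

    def compressed_stream(chunks, level=6):
        # wbits=12 -> 4 KB history window; memLevel=5 keeps the internal
        # state small, at some cost in ratio versus the defaults.
        comp = zlib.compressobj(level, zlib.DEFLATED, 12, 5)
        for chunk in chunks:
            out = comp.compress(chunk)
            if out:
                yield out                        # can be sent immediately
            yield comp.flush(zlib.Z_SYNC_FLUSH)  # bound the per-chunk latency
        yield comp.flush()                       # finish the stream

    total = sum(len(part) for part in compressed_stream(generate_chunks()))
    print("compressed bytes streamed:", total)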



    In my opinion a PPM-based coder should be considered as the next-generation compressor for dynamic textual content. PPM has several advantages for this purpose:
    - it can scale down well in memory usage - on http://mattmahoney.net/dc/text.html we see that ppmsj compressed enwik9 to 233 MB using only 1.8 MB of RAM for compression
    - it works especially well on small input data compared to LZ, because LZ's strength increases as the sliding window fills, whereas for PPM long input data causes model pruning or restarting, which decreases compression
    - it is easy to pre-train
    - it has minimal latency, as each symbol is processed separately, unlike typical strong LZ algorithms that divide input into chunks for parsing (selecting matches)
    - it has a better compression ratio than LZ even if we constrain the memory limit for decompression to the same level for both PPM and LZ

  27. #25
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    ... PPM ...
    One disadvantage of PPM based solutions is that there can be a significant speed degradation for cached use or on the fastest networks. There are possible mitigations, but they come with some additional complexity.

  28. #26
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    cached use
    I've explicitly stated that I'm focusing here on dynamic content - content that is generated anew each time.
    on the fastest networks
    On the fastest networks, switch to deflate. There are hardware deflate encoders, both fast and energy efficient: https://en.wikipedia.org/wiki/DEFLATE#Hardware_encoders

  29. #27
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    Quote Originally Posted by dnd View Post
    Update: Static/Dynamic web content compression benchmark with TurboBench

    - Latest versions of all compressors
    - New: zstd included for comparison only (content-encoding not supported)
    What compression level (number) did you use with Zstd?

  30. #28
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    In my opinion a PPM-based coder should be considered as the next-generation compressor for dynamic textual content. PPM has several advantages for this purpose:
    - it can scale down well in memory usage - on http://mattmahoney.net/dc/text.html we see that ppmsj compressed enwik9 to 233 MB using only 1.8 MB of RAM for compression
    - it works especially well on small input data compared to LZ, because LZ's strength increases as the sliding window fills, whereas for PPM long input data causes model pruning or restarting, which decreases compression
    - it is easy to pre-train
    - it has minimal latency, as each symbol is processed separately, unlike typical strong LZ algorithms that divide input into chunks for parsing (selecting matches)
    - it has a better compression ratio than LZ even if we constrain the memory limit for decompression to the same level for both PPM and LZ
    But both brotli and Zstd outperform ppms in the benchmark. ppms is much slower to decode, more than an order of magnitude slower than brotli and Zstd. For a next-gen web compression format, ideally we want to see single-digit decode performance in terms of ns/byte, or at least in the 17 ns/byte ballpark, which is what gzip gets (caveat: Mahoney's benchmark uses an ancient, 15-year-old "gzip for Windows" executable, and very old versions of brotli, Zstd, etc.)

    Your arguments sound reasonable, but what matters in the end is the actual performance data. So "it is easy to pre-train" sounds great, but if it's slower than brotli and Zstd, it doesn't matter how easy it is to pre-train. And your argument that it works well for small input data (proposition 1) seems to rest on the fact that it works poorly for large input data (proposition 2), but proposition 1 doesn't actually follow from proposition 2.

    I think the best compressor for small input data is going to be the one with a smartly defined dictionary specific to HTML, CSS, JS, and SVG. brotli has a messy and non-peer-reviewed dictionary, so it underperforms for that and other reasons. I suspect Zstd could be trained to beat brotli on every measure for web compression, but we really ought to start over with a web-specific codec that eases binary data generation from the very outset (e.g. from content management systems) and reverses the ridiculous anti-patterns of web development, like separate CSS and JS file downloads, 90+ percent of whose contents are entirely unused by the page that forces their download - even though the exact CSS and JS needed by a page is extremely easy to determine and could be included in the head, websites are still built as though this knowledge is not available.
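    The dictionary idea is easy to prototype with the python-zstandard binding: train a shared dictionary on a corpus of small web assets and attach it to the compressor. This is only a sketch under assumed inputs - the assets/*.html corpus is hypothetical, training wants a reasonably large sample set, and any real gain depends heavily on how representative the corpus is:

    Code:
    # Train a shared zstd dictionary on small web assets and compress with it.
    import glob
    import zstandard as zstd   # pip install zstandard

    # Hypothetical corpus of small pages; training needs many samples.
    samples = [open(p, "rb").read() for p in glob.glob("assets/*.html")]
    dictionary = zstd.train_dictionary(16 * 1024, samples)   # 16 KB dictionary

    plain    = zstd.ZstdCompressor(level=19)
    with_dic = zstd.ZstdCompressor(level=19, dict_data=dictionary)

    page = samples[0]
    print("no dictionary  :", len(plain.compress(page)))
    print("with dictionary:", len(with_dic.compress(page)))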

  31. #29
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    For dynamic content the bottleneck is compression speed. It doesn't matter if the client can decode at 20 MB/s if the server can only compress decently at 1 MB/s. Also, requiring half a gigabyte of server RAM to serve a file is ridiculous.

    A proper test for dynamic content compression would entail (a rough sketch follows this list):
    - running many simultaneous compression processes at once, because server CPUs stopped being single-threaded decades ago,
    - scaling down the encoder memory requirements to about 1 megabyte (maybe a few megabytes) per compression process - Gzip/Deflate doesn't need more than that,
    - compressing textual data, as that's the ubiquitous format used on the WWW,
    - compressing files that are on average (much?) smaller than a megabyte.
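    A rough sketch of the first, third and fourth points (worker count, payload size and iteration count are arbitrary; CPython's zlib releases the GIL while compressing, so a thread pool is enough to keep several cores busy):

    Code:
    # Rough throughput test: many small "dynamic responses" compressed in parallel.
    import os, time, zlib
    from concurrent.futures import ThreadPoolExecutor

    RESPONSE = (b"<html>" + b"dynamically generated text, somewhat repetitive " * 400
                + b"</html>")                     # ~20 KB, well under 1 MB
    N_TASKS, N_WORKERS = 2000, os.cpu_count() or 4

    def job(_):
        return len(zlib.compress(RESPONSE, 6))    # per-request compression

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        list(pool.map(job, range(N_TASKS)))
    elapsed = time.perf_counter() - start
    mb = len(RESPONSE) * N_TASKS / 1e6
    print(f"{mb:.1f} MB in {elapsed:.2f} s -> {mb / elapsed:.1f} MB/s "
          f"with {N_WORKERS} workers")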

    separate CSS and JS file downloads, 90+ percent of whose contents are entirely unused by the page
    Static files can be cached. If only 10% of the files' content is used, but those files are used 10x (from cache), then embedding these files in dynamic content won't pay off.

    And your argument that it works well for small input data (proposition 1) seems to rest on the fact that it works poorly for large input data (proposition 2), but proposition 1 doesn't actually follow from proposition 2.
    OK. Rephrasing:
    - PPM works better than LZ on textual data when given an infinite amount of RAM,
    - when we limit RAM for decoding or encoding, things can change, depending on the implementation details of the PPM and LZ codecs,
    - if we compare PPMSj at 2 MB of RAM usage to zstd at half a gigabyte of RAM usage during compression, that isn't a fair comparison.

    Remember that I'm talking about the challenges of dynamic content compression. I've already written that for static content the situation is vastly different: static content can be compressed off-line, without impacting user-perceived latency at all, and we can control resource usage effectively (by running the optimal number of threads at once and, because we know the exact size of the content upfront, tuning the compression parameters).

  32. #30
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    Quote Originally Posted by SolidComp View Post
    What compression level (number) did you use with Zstd?
    zstd,22
