Thread: Shared Brotli Format

  1. #1
    Jyrki Alakuijala (Member, Switzerland, joined Jun 2015)

    Shared Brotli Format

    We are preparing an extension of Brotli into full-blown shared dictionaries. The shared dictionaries include an LZ77-style dictionary model (similar to the ones that the Brotli repository, though not RFC 7932 Brotli, and Zstandard have supported for a long time). They also include Brotli's (distance, length) pair mapping dictionary mechanism and Brotli's word transforms for deriving variations of dictionary words, for example plurals, articles, inflections, prepositions, and capitalization in human languages. In addition to what we have in Brotli, we include a context model that allows the dictionary to be reordered based on the last two bytes in the stream.
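For readers unfamiliar with the (distance, length) mapping: in RFC 7932 Brotli, a backward reference whose distance is larger than the largest distance that can still point into already-decoded data is reinterpreted as a reference to the built-in static dictionary. The copy length selects the word length, and the excess distance encodes both a word index and a transform id. The sketch below follows that arithmetic as described in Section 8 of RFC 7932; the per-length `ndbits` table is passed in rather than reproduced, the value used in the example is illustrative only, and the function name is hypothetical.

```python
from typing import Mapping, Optional, Tuple

NUM_TRANSFORMS = 121  # RFC 7932 defines 121 built-in word transforms


def dictionary_reference(distance: int, copy_length: int,
                         max_allowed_distance: int,
                         ndbits: Mapping[int, int]) -> Optional[Tuple[int, int]]:
    """Split an out-of-window distance into (word_index, transform_id).

    Returns None when the distance is an ordinary copy from the window.
    `ndbits[length]` is the number of index bits for words of that length.
    """
    if distance <= max_allowed_distance:
        return None  # normal LZ77 copy from already-decoded bytes
    if copy_length not in ndbits:
        raise ValueError("no dictionary words of this length")
    bits = ndbits[copy_length]
    word_id = distance - (max_allowed_distance + 1)
    word_index = word_id & ((1 << bits) - 1)  # low bits: which word of this length
    transform_id = word_id >> bits            # high bits: which transform to apply
    if transform_id >= NUM_TRANSFORMS:
        raise ValueError("invalid transform id")
    return word_index, transform_id


# Illustrative table entry: 10 index bits for 4-byte words.
example_ndbits = {4: 10}
print(dictionary_reference(1001 + (3 << 10) + 7, 4, 1000, example_ndbits))  # (7, 3)
```

Because the transform id travels in the high bits of the distance, a single stored word can be emitted with a different prefix, suffix, or capitalization without any extra syntax; the post above describes Shared Brotli carrying this same mechanism into custom dictionaries.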

    Shared Brotli covers more than just shared dictionaries. Its scope also includes a large-window extension of Brotli, a patching mode that is faster and denser than Bsdiff+Brotli, and a framing format.
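One way to picture a dictionary-based patching mode is to treat the old file as an LZ77 dictionary, so that unchanged stretches of the new file become long copies. The toy greedy encoder and decoder below illustrate only that idea: they are not the Shared Brotli patch format, they use a naive quadratic match search, and the names `encode_with_dictionary` and `decode_with_dictionary` are hypothetical.

```python
from typing import List


def encode_with_dictionary(dictionary: bytes, target: bytes,
                           min_match: int = 4) -> List[tuple]:
    """Greedy toy LZ77 where matches may reach back into `dictionary`.

    Emits ("copy", distance, length) or ("lit", byte_value).
    """
    buf = dictionary + target
    pos = len(dictionary)
    ops: List[tuple] = []
    while pos < len(buf):
        best_len, best_dist = 0, 0
        for cand in range(pos):  # every earlier start, dictionary included
            length = 0
            while pos + length < len(buf) and buf[cand + length] == buf[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
        if best_len >= min_match:
            ops.append(("copy", best_dist, best_len))
            pos += best_len
        else:
            ops.append(("lit", buf[pos]))
            pos += 1
    return ops


def decode_with_dictionary(dictionary: bytes, ops: List[tuple]) -> bytes:
    out = bytearray(dictionary)  # decoder starts from the same dictionary
    for op in ops:
        if op[0] == "lit":
            out.append(op[1])
        else:
            _, dist, length = op
            for _ in range(length):  # byte by byte, so overlapping copies work
                out.append(out[-dist])
    return bytes(out[len(dictionary):])


old = b"version 1: the quick brown fox jumps over the lazy dog\n"
new = b"version 2: the quick brown fox jumps over the lazy cat\n"
patch = encode_with_dictionary(old, new)
print(len(patch), "ops for", len(new), "bytes")
assert decode_with_dictionary(old, patch) == new
```

A real encoder would use hash chains or suffix structures instead of the inner scan, and would entropy-code the operations; the point here is only that the old version, acting as a dictionary, turns most of the new version into a few copies.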

    Shared Brotli is described in https://datatracker.ietf.org/doc/dra...brotli-format/

    In HTTP content-encoding simulations of Shared Brotli, we have often seen a 40–60% reduction in data transfer compared to traditional RFC 7932 Brotli, once the shared dictionary has been transferred.
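The general effect of a dictionary that both sides already hold can be reproduced with Python's standard `zlib` module, whose `compressobj` and `decompressobj` accept a preset dictionary through the `zdict` argument. Note that this is DEFLATE, not Brotli or Shared Brotli, and the page fragments are made up, so the sizes it prints say nothing about the 40–60% figure above; it only shows the mechanism.

```python
import zlib
from typing import Optional


def deflate(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress with DEFLATE, optionally preloading a shared dictionary."""
    c = zlib.compressobj(9, zdict=zdict) if zdict else zlib.compressobj(9)
    return c.compress(data) + c.flush()


def inflate(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress; the receiver must already hold the same dictionary."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


# Hypothetical "previously transferred" dictionary: boilerplate shared by pages.
SHARED_DICT = (b'<!DOCTYPE html><html><head><meta charset="utf-8">'
               b'<link rel="stylesheet" href="/static/site.css"></head>'
               b'<body><nav class="top-nav"><a href="/">Home</a>'
               b'<a href="/docs/">Docs</a><a href="/blog/">Blog</a></nav>')

# Hypothetical new response that reuses most of that boilerplate.
RESPONSE = SHARED_DICT + b"<main><h1>Release notes</h1></main></body></html>"

plain = deflate(RESPONSE)
with_dict = deflate(RESPONSE, SHARED_DICT)
print(len(RESPONSE), len(plain), len(with_dict))
assert inflate(with_dict, SHARED_DICT) == RESPONSE
```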

  2. The Following 2 Users Say Thank You to Jyrki Alakuijala For This Useful Post:

    khavish (20th March 2018), pothos2 (20th March 2018)

  3. #2
    SolidComp (Member, USA, joined Jun 2015)
    Does Brotli's dictionary word-transform feature support wrapping words in spaces? What about punctuation like periods? We often see people build dictionaries from bare words such as "the", "they", "and", "or", etc.

    Those terms are wrong. The lowercase word "the" should actually be the five-character string [space]the[space]. The capitalized "The" should be in the dictionary as [period][space]The[space], a six-character string that works for every sentence except the first one in a paragraph. "they", "and", "or", and many other lowercase words that don't end a sentence should also be wrapped in spaces in the dictionary. Capitalized forms of "Therefore", "Moreover", "Thus", etc. should include the closing period of the previous sentence, the following space, then the word, then another space, since that is how they appear in common, correct English. (Obviously there will be typos where people leave out the space after a sentence, but so what? That won't be the majority case.)
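To check this argument against a real corpus, one could count how often each candidate dictionary entry actually occurs. A minimal sketch, assuming plain ASCII text and using hypothetical helper names; it counts matches only and does not model how a Brotli encoder would choose among them:

```python
def count_overlapping(text: str, needle: str) -> int:
    """Count occurrences of needle, allowing overlaps (e.g. ' the the ')."""
    count, start = 0, 0
    while (i := text.find(needle, start)) != -1:
        count += 1
        start = i + 1
    return count


def entry_hits(text: str, word: str) -> dict:
    """Compare a bare entry with space-wrapped and sentence-start variants."""
    return {
        "bare": count_overlapping(text, word),
        "space-wrapped": count_overlapping(text, f" {word} "),
        "sentence-start": count_overlapping(text, f". {word.capitalize()} "),
    }


sample = "The cat sat on the mat. The dog ate the other bone there."
print(entry_hits(sample, "the"))
# {'bare': 4, 'space-wrapped': 2, 'sentence-start': 1}
```

In this toy sample the bare entry also hits "other" and "there", which the space-wrapped entry misses, while each space-wrapped hit covers two more bytes; which trade-off wins depends on the corpus.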

  4. #3
    Jyrki Alakuijala (Member, Switzerland, joined Jun 2015)
    Quote, originally posted by SolidComp: "Those terms are wrong."
    Shared Brotli will allow you to do this in the shared dictionary: you can define your own transforms with custom prefixes and suffixes. Normal Brotli is what it is. You can see its transforms near the end of RFC 7932 (Appendix B).
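RFC 7932 describes each word transform as a prefix, an elementary operation on the word (identity, omitting the first or last few bytes, or uppercasing the first letter or all letters), and a suffix. A simplified, ASCII-only sketch of applying such transforms, including the space-wrapped and sentence-initial variants from the question; the `Transform` class, its field names, and the operation constants are hypothetical and are not Shared Brotli's dictionary encoding:

```python
from dataclasses import dataclass

# Elementary operations, loosely modeled on the RFC 7932 transform types.
IDENTITY, OMIT_FIRST, OMIT_LAST, UPPER_FIRST, UPPER_ALL = range(5)


@dataclass(frozen=True)
class Transform:  # hypothetical in-memory form, not a wire format
    prefix: bytes
    op: int
    n: int  # bytes to omit for OMIT_FIRST / OMIT_LAST, ignored otherwise
    suffix: bytes


def apply_transform(t: Transform, word: bytes) -> bytes:
    """Return prefix + op(word) + suffix, using ASCII-only case mapping."""
    if t.op == IDENTITY:
        body = word
    elif t.op == OMIT_FIRST:
        body = word[t.n:]
    elif t.op == OMIT_LAST:
        body = word[:max(0, len(word) - t.n)]
    elif t.op == UPPER_FIRST:
        body = word[:1].upper() + word[1:]
    elif t.op == UPPER_ALL:
        body = word.upper()
    else:
        raise ValueError(f"unknown op {t.op}")
    return t.prefix + body + t.suffix


SPACE_WRAPPED = Transform(b" ", IDENTITY, 0, b" ")
SENTENCE_START = Transform(b". ", UPPER_FIRST, 0, b" ")
PLURAL_Y_TO_IES = Transform(b"", OMIT_LAST, 1, b"ies ")

print(apply_transform(SPACE_WRAPPED, b"the"))        # b' the '
print(apply_transform(SENTENCE_START, b"the"))       # b'. The '
print(apply_transform(PLURAL_Y_TO_IES, b"dictionary"))  # b'dictionaries '
```

Keeping only the bare word in the dictionary and generating the wrapped forms through transforms lets one stored entry serve all of these contexts.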
