
Thread: Google released Snappy compression/decompression library

  1. #1
    sportsman (Member)

    Google released Snappy compression/decompression library


  2. #2
    inikep (Programmer)
    In-memory test (compression and decompression) with ENWIK8 using one core of an Athlon X4 2.8 GHz (32-bit compilation under gcc 4.5.2 (MinGW) with -O3 -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math --param inline-unit-growth=999 -DNDEBUG). Each line gives compression time and speed, input -> output sizes, then decompression time and speed:

    Code:
    lzjb 2010        = 842 ms (115 MB/s), 100000000 -> 68711273, 410 ms (238 MB/s)
    lzo 2.04 1x      = 964 ms (101 MB/s), 100000000 -> 53481944, 374 ms (261 MB/s)
    fastlz 0.1 -1    = 614 ms (159 MB/s), 100000000 -> 55239233, 390 ms (250 MB/s)
    fastlz 0.1 -2    = 723 ms (135 MB/s), 100000000 -> 54163013, 362 ms (269 MB/s)
    lzf 3.6 vf       = 738 ms (132 MB/s), 100000000 -> 53945381, 267 ms (365 MB/s)
    lzf 3.6 uf       = 667 ms (146 MB/s), 100000000 -> 57695415, 276 ms (353 MB/s)
    lzrw1            = 891 ms (109 MB/s), 100000000 -> 59669043, 438 ms (222 MB/s)
    lzrw1-a          = 830 ms (117 MB/s), 100000000 -> 59448006, 372 ms (262 MB/s)
    lzrw2            = 749 ms (130 MB/s), 100000000 -> 55312164, 464 ms (210 MB/s)
    lzrw3            = 722 ms (135 MB/s), 100000000 -> 52468327, 488 ms (200 MB/s)
    lzrw3-a          = 1574 ms (62 MB/s), 100000000 -> 47874203, 484 ms (201 MB/s)
    snappy 1.0       = 472 ms (206 MB/s), 100000000 -> 58350605, 199 ms (490 MB/s)
    quicklz 1.5.0 -1 = 429 ms (227 MB/s), 100000000 -> 52334371, 381 ms (256 MB/s)
    quicklz 1.5.0 -2 = 1140 ms (85 MB/s), 100000000 -> 45883075, 442 ms (220 MB/s)
    quicklz 1.5.0 -3 = 4329 ms (22 MB/s), 100000000 -> 44789793, 216 ms (452 MB/s)
    all              = 22974 ms
    The test with gcc 3.4.5 (the same options):
    Code:
    lzjb 2010        = 912 ms (107 MB/s), 100000000 -> 68711273, 454 ms (215 MB/s)
    lzo 2.04 1x      = 870 ms (112 MB/s), 100000000 -> 53481944, 324 ms (301 MB/s)
    fastlz 0.1 -1    = 640 ms (152 MB/s), 100000000 -> 55239233, 313 ms (312 MB/s)
    fastlz 0.1 -2    = 665 ms (146 MB/s), 100000000 -> 54163013, 342 ms (285 MB/s)
    lzf 3.6 vf       = 775 ms (126 MB/s), 100000000 -> 53945381, 270 ms (361 MB/s)
    lzf 3.6 uf       = 675 ms (144 MB/s), 100000000 -> 57695415, 278 ms (351 MB/s)
    lzrw1            = 697 ms (140 MB/s), 100000000 -> 59669043, 387 ms (252 MB/s)
    lzrw1-a          = 683 ms (142 MB/s), 100000000 -> 59448006, 337 ms (289 MB/s)
    lzrw2            = 631 ms (154 MB/s), 100000000 -> 55312164, 368 ms (265 MB/s)
    lzrw3            = 632 ms (154 MB/s), 100000000 -> 52468327, 504 ms (193 MB/s)
    lzrw3-a          = 1582 ms (61 MB/s), 100000000 -> 47874203, 475 ms (205 MB/s)
    snappy 1.0       = 538 ms (181 MB/s), 100000000 -> 58350605, 232 ms (420 MB/s)
    quicklz 1.5.0 -1 = 535 ms (182 MB/s), 100000000 -> 52334371, 421 ms (231 MB/s)
    quicklz 1.5.0 -2 = 1337 ms (73 MB/s), 100000000 -> 45883075, 478 ms (204 MB/s)
    quicklz 1.5.0 -3 = 5645 ms (17 MB/s), 100000000 -> 44789793, 211 ms (462 MB/s)
    all              = 23976 ms
    The test with Intel C++ Compiler XE 12.0.2.154 [IA-32]:
    Code:
    lzjb 2010        = 839 ms (116 MB/s), 100000000 -> 68711273, 400 ms (244 MB/s)
    lzo 2.04 1x      = 832 ms (117 MB/s), 100000000 -> 53481944, 337 ms (289 MB/s)
    fastlz 0.1 -1    = 604 ms (161 MB/s), 100000000 -> 55239233, 330 ms (295 MB/s)
    fastlz 0.1 -2    = 735 ms (132 MB/s), 100000000 -> 54163013, 335 ms (291 MB/s)
    lzf 3.6 vf       = 789 ms (123 MB/s), 100000000 -> 53945381, 273 ms (357 MB/s)
    lzf 3.6 uf       = 708 ms (137 MB/s), 100000000 -> 57695415, 282 ms (346 MB/s)
    lzrw1            = 682 ms (143 MB/s), 100000000 -> 59669043, 366 ms (266 MB/s)
    lzrw1-a          = 668 ms (146 MB/s), 100000000 -> 59448006, 319 ms (306 MB/s)
    lzrw2            = 641 ms (152 MB/s), 100000000 -> 55312164, 358 ms (272 MB/s)
    lzrw3            = 605 ms (161 MB/s), 100000000 -> 52468327, 425 ms (229 MB/s)
    lzrw3-a          = 1640 ms (59 MB/s), 100000000 -> 47874203, 417 ms (234 MB/s)
    snappy 1.0       = 1139 ms (85 MB/s), 100000000 -> 58350605, 501 ms (194 MB/s)
    quicklz 1.5.0 -1 = 428 ms (228 MB/s), 100000000 -> 52334371, 531 ms (183 MB/s)
    quicklz 1.5.0 -2 = 931 ms (104 MB/s), 100000000 -> 45883075, 526 ms (185 MB/s)
    quicklz 1.5.0 -3 = 4139 ms (23 MB/s), 100000000 -> 44789793, 284 ms (343 MB/s)
    all              = 22180 ms
    Last edited by inikep; 25th March 2011 at 17:06.

  3. #3
    joerg (Member)
    @sportsman: there is already some information about Snappy in this thread:

    http://encode.ru/threads/1253-LZO-Pr...N-are-released

    best regards
    Last edited by joerg; 25th March 2011 at 17:48. Reason: correct

  4. #4
    inikep (Programmer)
    In-memory test (compression and decompression) with ENWIK8 using one core of an Intel Xeon X5355 @ 2.66 GHz (64-bit compilation under gcc 4.1.1 (Linux) with -O3 -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math --param inline-unit-growth=999 -DNDEBUG):

    Code:
    lzjb 2010        = 813 ms (120 MB/s), 100000000 -> 68711271, 382 ms (255 MB/s)
    lzo 2.04 1x      = 985 ms (99 MB/s), 100000000 -> 53481944, 353 ms (276 MB/s)
    fastlz 0.1 -1    = 710 ms (137 MB/s), 100000000 -> 55239233, 350 ms (279 MB/s)
    fastlz 0.1 -2    = 774 ms (126 MB/s), 100000000 -> 54163013, 365 ms (267 MB/s)
    lzf 3.6 vf       = 685 ms (142 MB/s), 100000000 -> 53945381, 301 ms (324 MB/s)
    lzf 3.6 uf       = 693 ms (140 MB/s), 100000000 -> 57695415, 315 ms (310 MB/s)
    lzrw1            = 781 ms (125 MB/s), 100000000 -> 59669043, 467 ms (209 MB/s)
    lzrw1-a          = 713 ms (136 MB/s), 100000000 -> 59448006, 368 ms (265 MB/s)
    lzrw2            = 689 ms (141 MB/s), 100000000 -> 55312164, 396 ms (246 MB/s)
    lzrw3            = 734 ms (133 MB/s), 100000000 -> 52468327, 461 ms (211 MB/s)
    lzrw3-a          = 1637 ms (59 MB/s), 100000000 -> 47874203, 484 ms (201 MB/s)
    snappy 1.0       = 488 ms (200 MB/s), 100000000 -> 58350605, 213 ms (458 MB/s)
    quicklz 1.5.0 -1 = 520 ms (187 MB/s), 100000000 -> 52334371, 514 ms (189 MB/s)
    quicklz 1.5.0 -2 = 1145 ms (85 MB/s), 100000000 -> 45883075, 573 ms (170 MB/s)
    all              = 22002 ms

  5. #5
    Member (Nordic)
    Any change if using the -march=native switch too?

  6. #6
    inikep (Programmer)
    I've used the following options (-march=native was slower):
    -march=nocona for Intel Xeon X5355 @ 2.66GHz
    -march=k8 for Athlon X4 2.8 GHz

  7. #7
    Matt Mahoney (Expert)
    I have been looking at the snappy source code. Maybe there is a document that explains the data format and I didn't see it, but the code is pretty well documented.

    Snappy uses byte-aligned LZ77. A tag byte is followed by a literal string or a copy. The 2 low bits of the tag have the following meaning:

    00 = literal
    01 = copy, 1 byte offset
    10 = copy, 2 byte offset
    11 = copy, 4 byte offset (not used)

    For a literal, a length of 1 to 60 is encoded in the high 6 bits of the tag as length - 1, and the literal bytes follow. Lengths up to 2^32 are encoded by putting 60..63 in the high 6 bits to indicate that a 1 to 4 byte length field follows. All numbers are encoded in little-endian (LSB first) format.
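
    A decoding sketch of the tag and literal-length parsing just described (a hypothetical helper, not Snappy's actual code; it also assumes the extended length field stores length - 1, as in the library):
    Code:
    #include <stdint.h>
    
    /* Parse a literal tag (low bits 00) and return the literal length.
       Advances *p past the tag byte and any extra length bytes.
       Assumes valid, sufficiently long input. */
    static uint32_t parse_literal_length(const uint8_t **p) {
        uint8_t tag = *(*p)++;
        uint32_t len = tag >> 2;            /* high 6 bits of the tag */
        if (len < 60) return len + 1;       /* lengths 1..60 stored as length - 1 */
        uint32_t extra = len - 59;          /* 60..63 -> 1..4 length bytes follow */
        len = 0;
        for (uint32_t i = 0; i < extra; i++)
            len |= (uint32_t)(*(*p)++) << (8 * i);  /* little-endian */
        return len + 1;                     /* field also stores length - 1 */
    }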

    All copies have a length of at least 4 bytes. If the offset is less than 2048 and the length is less than 12, then the 1 byte offset code (01) is used. The length - 4 is encoded in bits 4..2 of the tag, the high 3 bits of the offset in bits 7..5, and the low 8 bits of the offset in the next byte. Copies are allowed to overlap. For example, "AAAAAA" could be encoded as a literal "A" and a copy with an offset of 1 and length 5.
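
    A sketch of emitting that 1 byte offset form (hypothetical helper; assumes 4 <= len < 12 and offset < 2048, per the rules above):
    Code:
    #include <stdint.h>
    
    /* Emit a copy with a 1-byte offset (tag type 01). */
    static uint8_t *emit_copy1(uint8_t *out, uint32_t offset, uint32_t len) {
        /* bits 7..5: offset >> 8; bits 4..2: len - 4; bits 1..0: 01 */
        *out++ = (uint8_t)(((offset >> 8) << 5) | ((len - 4) << 2) | 0x01);
        *out++ = (uint8_t)(offset & 0xff);  /* low 8 bits of the offset */
        return out;
    }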

    A 2 byte offset copy encodes the length - 1 in the high 6 bits of the tag and the offset in the next 2 bytes, so the maximum length is 64. Longer matches are split into copies of length 64 plus a final copy; the next-to-last copy may be shortened to length 60 to make sure the last copy has length at least 4.
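
    The splitting rule might be sketched like this (hypothetical emit_copy2 for the 2 byte offset form; emit_copy1 is the sketch above):
    Code:
    #include <stdint.h>
    
    /* Emit a copy with a 2-byte little-endian offset (tag type 10), 1 <= len <= 64. */
    static uint8_t *emit_copy2(uint8_t *out, uint32_t offset, uint32_t len) {
        *out++ = (uint8_t)(((len - 1) << 2) | 0x02);
        *out++ = (uint8_t)(offset & 0xff);
        *out++ = (uint8_t)(offset >> 8);
        return out;
    }
    
    /* Split a long match into copies of length <= 64 so that the
       final copy is always at least 4 bytes long. */
    static uint8_t *emit_copy(uint8_t *out, uint32_t offset, uint32_t len) {
        while (len >= 68) {                 /* emit 64, leaving at least 4 */
            out = emit_copy2(out, offset, 64);
            len -= 64;
        }
        if (len > 64) {                     /* 65..67 left: emit 60, leaving 5..7 */
            out = emit_copy2(out, offset, 60);
            len -= 60;
        }
        if (len < 12 && offset < 2048)      /* now 4 <= len <= 64 */
            return emit_copy1(out, offset, len);
        return emit_copy2(out, offset, len);
    }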

    A 4 byte offset encodes the same way except with 2 more bytes of offset. The decompressor will handle it, but it is not used by the compressor because the input is split into blocks of size 32K that are compressed independently, so an offset never spans a block. The entire sequence of blocks is preceded by the total length, at most 2^32 - 1, encoded in base 128 in up to 5 bytes, with the high bit set to 0 to indicate the last (most significant) byte.
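
    That preamble is an ordinary base-128 varint; a decoding sketch (hypothetical helper):
    Code:
    #include <stdint.h>
    #include <stddef.h>
    
    /* Decode the base-128 uncompressed-length preamble: 7 bits per byte,
       least significant group first; a set high bit means more bytes follow. */
    static const uint8_t *read_varint32(const uint8_t *p, uint32_t *result) {
        uint32_t value = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            uint8_t b = *p++;
            value |= (uint32_t)(b & 0x7f) << shift;
            if ((b & 0x80) == 0) { *result = value; return p; }
        }
        return NULL;  /* malformed: more than 5 bytes */
    }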

    The compressor looks for matches using a hash table which is initialized to 0 at the start of each block. The hash table size is the smallest power of 2 in the range 256..16384 that is at least as large as the input size. The first 4 bytes of a potential match are hashed by interpreting them as a 32 bit little-endian number, multiplying by 0x1e35a7bd, and shifting out the low bits to fit the hash table size. If there are 15 or fewer bytes remaining after encoding a match, the rest is encoded as a literal with no further search. In the worst case, compression is guaranteed not to expand the input by more than a factor of 1/6 plus 32 bytes.
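
    The hash, per that description, is roughly (a sketch assuming a little-endian machine):
    Code:
    #include <stdint.h>
    #include <string.h>
    
    /* Hash the next 4 bytes to a table index: read them as a 32-bit
       little-endian value, multiply, keep the high bits.
       shift = 32 - log2(table_size). */
    static uint32_t hash4(const uint8_t *p, int shift) {
        uint32_t bytes;
        memcpy(&bytes, p, 4);  /* native load; little-endian assumed */
        return (bytes * 0x1e35a7bd) >> shift;
    }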

    To speed up compression of hard-to-compress data, after no match is found in the first 32 tests, only every other byte is tested for the next 32 tests, then every 3rd byte for the next 32 tests, and so on. The comments state this costs 5% in time and 0.1% in space, but greatly speeds up the (lack of) compression of JPEG, PNG, etc.
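
    A minimal sketch of that schedule (hypothetical helper; the real scan loop is structured differently):
    Code:
    #include <stdint.h>
    #include <stddef.h>
    
    /* Gap to the next match probe: skip/32, incrementing skip each call.
       Starting from skip = 32 this yields 1 for the first 32 probes,
       2 for the next 32, 3 for the next 32, and so on. */
    static size_t next_probe_gap(uint32_t *skip) {
        return (*skip)++ >> 5;
    }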

    The library has no code for file compression; rather, it compresses from arrays or strings to strings. This seemed inefficient, but apparently it is not. It calls string::resize() only once, then uses the trick of writing to the string's internal memory through the pointer (s.empty() ? NULL : &*s.begin()), which a comment states is supposed to be defined to work in C++0x and in practice works on all the systems they tested.
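
    The trick looks roughly like this (a sketch using Snappy's RawCompress() and MaxCompressedLength(); the library's own Compress() does essentially this internally):
    Code:
    #include <string>
    #include <snappy.h>
    
    size_t compress_to_string(const char *input, size_t n, std::string *out) {
        out->resize(snappy::MaxCompressedLength(n));   /* one allocation up front */
        char *dst = out->empty() ? NULL : &*out->begin();
        size_t written;
        snappy::RawCompress(input, n, dst, &written);
        out->resize(written);                          /* trim to actual size */
        return written;
    }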

    Copies of 16 bytes or fewer are implemented as two 64-bit assignments, on the assumption that the source and destination buffers are sufficiently padded. Longer copies use memcpy(). This requires some extra code to handle the initial part of an overlapped copy.
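
    A sketch of the overlap handling (not the library's exact code, which uses the padded 64-bit copies just described):
    Code:
    #include <stdint.h>
    #include <string.h>
    
    /* Copy len bytes from op - offset to op, where the regions may overlap
       (offset may be smaller than len). Each pass replicates the whole
       pattern once, doubling its length, until a plain copy is safe. */
    static void incremental_copy(uint8_t *op, size_t offset, size_t len) {
        const uint8_t *src = op - offset;
        size_t gap = offset;
        while (gap < len) {
            memcpy(op, src, gap);   /* dest starts exactly gap bytes after src */
            op += gap;
            len -= gap;
            gap *= 2;
        }
        memcpy(op, src, len);       /* no overlap left: len <= gap == op - src */
    }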

  8. #8
    Member (Nordic)
    "it compresses from arrays" means likely mmap'ed.
    " to strings" would save them having to link against a new operator or malloc because it leaves it up to the runtime.

    That's just random justification, it could well be really wrong guessing.

  9. #9
    Matt Mahoney (Expert)
    LTCB results. http://mattmahoney.net/dc/text.html#5277

    The test loads enwik9 into a string, compresses to a new string, and writes it. Compressing in 32K blocks would be faster and give almost the same results.
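
    For reference, that test is roughly (a sketch using Snappy's Compress() entry point; file names are the test's):
    Code:
    #include <fstream>
    #include <iterator>
    #include <string>
    #include <snappy.h>
    
    int main() {
        /* Read enwik9 into a string, compress to a second string, write it. */
        std::ifstream in("enwik9", std::ios::binary);
        std::string data((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
        std::string compressed;
        snappy::Compress(data.data(), data.size(), &compressed);
        std::ofstream out("enwik9.snappy", std::ios::binary);
        out.write(compressed.data(), compressed.size());
        return 0;
    }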

    Edit: added the same description to my book: http://mattmahoney.net/dc/dce.html#Section_526
    Last edited by Matt Mahoney; 28th April 2011 at 01:30.

  10. #10
    Steinar (Member, Google Switzerland)
    Quote Originally Posted by Matt Mahoney View Post
    I have been looking at the snappy source code. Maybe there is a document that explains the data format and I didn't see it, but the code is pretty well documented.
    FWIW, others have taken a stab at a file format specification too; see issue #32 in the Snappy bug tracker. We might at some point integrate this into the Snappy documentation somehow.

    /* Steinar */

  11. #11
    Matt Mahoney (Expert)
    Thanks. That's also a good explanation of the format.

  12. #12
    Steinar (Member, Google Switzerland)
    Quote Originally Posted by Matt Mahoney View Post
    Thanks. That's also a good explanation of the format.
    Now also in the source code distribution: http://code.google.com/p/snappy/sour...escription.txt

    It's not a formal spec, but save for the source code I believe it's the most official description of the format that exists anywhere.

    /* Steinar */

