
Thread: In-memory benchmark with fastest LZSS (QuickLZ, Snappy) compressors

  1. #31
    Member
    Join Date
    Mar 2011
    Location
    Google Switzerland
    Posts
    19
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Cyan View Post
    mmh, i need to get access to my home computer to repackage the binary with the required DLL.
    Alternatively, there might be a way to statically link those libs into the binary. I'll try to google that.
    You probably want -static-libstdc++.

    /* Steinar */

  2. #32
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    Sorry Cyan, I just discovered that I left the power brick at my family's house, and I need the netbook for some work, so I don't want to waste battery. The test has to wait until the weekend (Saturday or Sunday).

  3. #33
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    157
    Thanks
    4
    Thanked 14 Times in 3 Posts
    Attached are sources and Win32 executable of BENCHMARK 0.3. Changes: added zlib, ucl_nrv, LZ4 rev 10.

    New results using 1 core of Athlon X4 2.8 GHz, Windows 7 (32-bit MinGW compilation under gcc 4.5.2) and 3 iterations. The input file (100 MB) is a concatenation of 10 different files, about 10 MB each: bmp, dct_coeffs, english_dic, ENWIK, exe, fp_log, hlp, XML, pdf, ncb.

    Code:
    memcpy              = 53 ms (1932 MB/s), 104854004->104854004
    fastlz 0.1 -1       = 494 ms (207 MB/s), 104854004->45614322, 233 ms (439 MB/s)
    fastlz 0.1 -2       = 518 ms (197 MB/s), 104854004->43986331, 218 ms (469 MB/s)
    lz4 rev 9           = 562 ms (182 MB/s), 104854004->44774336, 140 ms (731 MB/s)
    lz4 rev 10          = 330 ms (310 MB/s), 104854004->45520068, 134 ms (764 MB/s)
    lzf 3.6 vf          = 538 ms (190 MB/s), 104854004->44890314, 206 ms (497 MB/s)
    lzf 3.6 uf          = 506 ms (202 MB/s), 104854004->47089435, 207 ms (494 MB/s)
    lzham alpha6 -m0d26 = 28538 ms (3 MB/s), 104854004->25810349, 1287 ms (79 MB/s)
    lzjb 2010           = 636 ms (161 MB/s), 104854004->52693883, 303 ms (337 MB/s)
    lzmat 1.1           = 5050 ms (20 MB/s), 104854004->34419889, 375 ms (273 MB/s)
    lzo 2.05 1b_1       = 835 ms (122 MB/s), 104854004->43344892, 192 ms (533 MB/s)
    lzo 2.05 1b_9       = 1419 ms (72 MB/s), 104854004->39903850, 197 ms (519 MB/s)
    lzo 2.05 1b_99      = 1804 ms (56 MB/s), 104854004->38668219, 190 ms (538 MB/s)
    lzo 2.05 1c_1       = 808 ms (126 MB/s), 104854004->44096833, 197 ms (519 MB/s)
    lzo 2.05 1c_9       = 1442 ms (71 MB/s), 104854004->40537792, 207 ms (494 MB/s)
    lzo 2.05 1c_99      = 1810 ms (56 MB/s), 104854004->39492450, 204 ms (501 MB/s)
    lzo 2.05 1f_1       = 851 ms (120 MB/s), 104854004->44148573, 198 ms (517 MB/s)
    lzo 2.05 1x_1       = 246 ms (416 MB/s), 104854004->51883722, 192 ms (533 MB/s)
    lzo 2.05 1y_1       = 244 ms (419 MB/s), 104854004->51797619, 188 ms (544 MB/s)
    lzo 2.05 1b_999     = 15653 ms (6 MB/s), 104854004->35342929, 173 ms (591 MB/s)
    lzo 2.05 1c_999     = 11824 ms (8 MB/s), 104854004->36695814, 190 ms (538 MB/s)
    lzo 2.05 1f_999     = 13560 ms (7 MB/s), 104854004->36641050, 190 ms (538 MB/s)
    lzo 2.05 1x_999     = 31334 ms (3 MB/s), 104854004->34535021, 207 ms (494 MB/s)
    lzo 2.05 1y_999     = 29581 ms (3 MB/s), 104854004->34204260, 206 ms (497 MB/s)
    lzo 2.05 1z_999     = 31428 ms (3 MB/s), 104854004->34224781, 223 ms (459 MB/s)
    lzo 2.05 2a_999     = 12228 ms (8 MB/s), 104854004->37624779, 267 ms (383 MB/s)
    lzrw1               = 592 ms (172 MB/s), 104854004->51296084, 274 ms (373 MB/s)
    lzrw1-a             = 582 ms (175 MB/s), 104854004->50630870, 286 ms (358 MB/s)
    lzrw2               = 531 ms (192 MB/s), 104854004->47950899, 391 ms (261 MB/s)
    lzrw3               = 514 ms (199 MB/s), 104854004->46384103, 431 ms (237 MB/s)
    lzrw3-a             = 1417 ms (72 MB/s), 104854004->42580878, 463 ms (221 MB/s)
    snappy 1.0.3        = 307 ms (333 MB/s), 104854004->46155676, 141 ms (726 MB/s)
    tornado 0.666 16k/1 = 528 ms (193 MB/s), 104854004->47432525, 354 ms (289 MB/s)
    tornado 128k/2m     = 695 ms (147 MB/s), 104854004->45166082, 368 ms (278 MB/s)
    tornado 128k/8m     = 693 ms (147 MB/s), 104854004->42299345, 355 ms (288 MB/s)
    tornado 4m/8m       = 1675 ms (61 MB/s), 104854004->38140549, 405 ms (252 MB/s)
    tornado b128k/8m    = 793 ms (129 MB/s), 104854004->37629695, 481 ms (212 MB/s)
    tornado b4m/8m      = 1770 ms (57 MB/s), 104854004->33769518, 526 ms (194 MB/s)
    tornado b4m/32m     = 1513 ms (67 MB/s), 104854004->29325608, 501 ms (204 MB/s)
    quicklz 1.5.0 -3    = 3985 ms (25 MB/s), 104854004->37633177, 146 ms (701 MB/s)
    quicklz 1.5.0 -2    = 1013 ms (101 MB/s), 104854004->38965498, 351 ms (291 MB/s)
    quicklz 1.5.0 -1    = 373 ms (274 MB/s), 104854004->42816655, 300 ms (341 MB/s)
    quicklz 1.5.1 b5 -1 = 366 ms (279 MB/s), 104854004->42816655, 298 ms (343 MB/s)
    ucl_nrv2b 1.03 -1   = 7569 ms (13 MB/s), 104854004->37105362, 515 ms (198 MB/s)
    ucl_nrv2b 1.03 -6   = 12809 ms (7 MB/s), 104854004->34133511, 458 ms (223 MB/s)
    ucl_nrv2d 1.03 -1   = 7696 ms (13 MB/s), 104854004->36944802, 500 ms (204 MB/s)
    ucl_nrv2d 1.03 -6   = 12618 ms (8 MB/s), 104854004->34061261, 450 ms (227 MB/s)
    ucl_nrv2e 1.03 -1   = 7628 ms (13 MB/s), 104854004->36805095, 496 ms (206 MB/s)
    ucl_nrv2e 1.03 -6   = 12642 ms (8 MB/s), 104854004->33836500, 439 ms (233 MB/s)
    zlib 1.2.5 -1       = 2187 ms (46 MB/s), 104854004->35167222, 514 ms (199 MB/s)
    zlib 1.2.5 -6       = 5799 ms (17 MB/s), 104854004->31262824, 473 ms (216 MB/s)
    zlib 1.2.5 -9       = 19415 ms (5 MB/s), 104854004->31051160, 469 ms (218 MB/s)
    all                 = 950597 ms
    The results using 1 core of Intel Xeon X5355 @ 2.66GHz (64-bit Linux compilation under gcc 4.4.4) with the same options and input:
    Code:
    memcpy              = 64 ms (1599 MB/s), 104854004->104854004
    fastlz 0.1 -1       = 524 ms (195 MB/s), 104854004->45614322, 262 ms (390 MB/s)
    fastlz 0.1 -2       = 537 ms (190 MB/s), 104854004->43986331, 251 ms (407 MB/s)
    lz4 rev 9           = 408 ms (250 MB/s), 104854004->44774336, 167 ms (613 MB/s)
    lz4 rev 10          = 366 ms (279 MB/s), 104854004->45520068, 162 ms (632 MB/s)
    lzf 3.6 vf          = 483 ms (212 MB/s), 104854004->44890314, 206 ms (497 MB/s)
    lzf 3.6 uf          = 478 ms (214 MB/s), 104854004->47089435, 210 ms (487 MB/s)
    lzham alpha6 -m0d26 = 24385 ms (4 MB/s), 104854004->25810349, 878 ms (116 MB/s)
    lzjb 2010           = 620 ms (165 MB/s), 104854004->52693883, 303 ms (337 MB/s)
    lzmat 1.1           = 5689 ms (17 MB/s), 104854004->34419889, 361 ms (283 MB/s)
    lzo 2.05 1b_1       = 774 ms (132 MB/s), 104854004->43344892, 214 ms (478 MB/s)
    lzo 2.05 1b_9       = 1344 ms (76 MB/s), 104854004->39903850, 214 ms (478 MB/s)
    lzo 2.05 1b_99      = 1767 ms (57 MB/s), 104854004->38668219, 209 ms (489 MB/s)
    lzo 2.05 1c_1       = 734 ms (139 MB/s), 104854004->44096833, 217 ms (471 MB/s)
    lzo 2.05 1c_9       = 1490 ms (68 MB/s), 104854004->40537792, 223 ms (459 MB/s)
    lzo 2.05 1c_99      = 1795 ms (57 MB/s), 104854004->39492450, 218 ms (469 MB/s)
    lzo 2.05 1f_1       = 795 ms (128 MB/s), 104854004->44148573, 223 ms (459 MB/s)
    lzo 2.05 1x_1       = 272 ms (376 MB/s), 104854004->51881393, 195 ms (525 MB/s)
    lzo 2.05 1y_1       = 270 ms (379 MB/s), 104854004->51795089, 197 ms (519 MB/s)
    lzo 2.05 1b_999     = 12071 ms (8 MB/s), 104854004->35342929, 189 ms (541 MB/s)
    lzo 2.05 1c_999     = 9560 ms (10 MB/s), 104854004->36695814, 202 ms (506 MB/s)
    lzo 2.05 1f_999     = 10897 ms (9 MB/s), 104854004->36641050, 212 ms (483 MB/s)
    lzo 2.05 1x_999     = 25986 ms (3 MB/s), 104854004->34535021, 204 ms (501 MB/s)
    lzo 2.05 1y_999     = 24712 ms (4 MB/s), 104854004->34204260, 210 ms (487 MB/s)
    lzo 2.05 1z_999     = 26061 ms (3 MB/s), 104854004->34224781, 210 ms (487 MB/s)
    lzo 2.05 2a_999     = 9532 ms (10 MB/s), 104854004->37624779, 272 ms (376 MB/s)
    lzrw1               = 596 ms (171 MB/s), 104854004->51296084, 327 ms (313 MB/s)
    lzrw1-a             = 595 ms (172 MB/s), 104854004->50630870, 288 ms (355 MB/s)
    lzrw2               = 510 ms (200 MB/s), 104854004->47950899, 311 ms (329 MB/s)
    lzrw3               = 537 ms (190 MB/s), 104854004->46384103, 359 ms (285 MB/s)
    lzrw3-a             = 1449 ms (70 MB/s), 104854004->42580878, 363 ms (282 MB/s)
    snappy 1.0.3        = 294 ms (348 MB/s), 104854004->46155676, 149 ms (687 MB/s)
    tornado 0.666 16k/1 = 539 ms (189 MB/s), 104854004->47432525, 376 ms (272 MB/s)
    tornado 128k/2m     = 597 ms (171 MB/s), 104854004->45166082, 383 ms (267 MB/s)
    tornado 128k/8m     = 602 ms (170 MB/s), 104854004->42299345, 377 ms (271 MB/s)
    tornado 4m/8m       = 1140 ms (89 MB/s), 104854004->38140549, 397 ms (257 MB/s)
    tornado b128k/8m    = 653 ms (156 MB/s), 104854004->37629695, 423 ms (242 MB/s)
    tornado b4m/8m      = 1223 ms (83 MB/s), 104854004->33769518, 434 ms (235 MB/s)
    tornado b4m/32m     = 1101 ms (93 MB/s), 104854004->29325608, 423 ms (242 MB/s)
    quicklz 1.5.0 -3    = 3281 ms (31 MB/s), 104854004->37633177, 201 ms (509 MB/s)
    quicklz 1.5.0 -2    = 896 ms (114 MB/s), 104854004->38965498, 434 ms (235 MB/s)
    quicklz 1.5.0 -1    = 369 ms (277 MB/s), 104854004->42816655, 385 ms (265 MB/s)
    quicklz 1.5.1 b5 -1 = 332 ms (308 MB/s), 104854004->42816655, 381 ms (268 MB/s)
    ucl_nrv2b 1.03 -1   = 4413 ms (23 MB/s), 104854004->37105362, 576 ms (177 MB/s)
    ucl_nrv2b 1.03 -6   = 8697 ms (11 MB/s), 104854004->34133511, 518 ms (197 MB/s)
    ucl_nrv2d 1.03 -1   = 4378 ms (23 MB/s), 104854004->36944802, 564 ms (181 MB/s)
    ucl_nrv2d 1.03 -6   = 8569 ms (11 MB/s), 104854004->34061261, 507 ms (201 MB/s)
    ucl_nrv2e 1.03 -1   = 4466 ms (22 MB/s), 104854004->36805095, 562 ms (182 MB/s)
    ucl_nrv2e 1.03 -6   = 8703 ms (11 MB/s), 104854004->33836500, 504 ms (203 MB/s)
    zlib 1.2.5 -1       = 2178 ms (47 MB/s), 104854004->35167222, 489 ms (209 MB/s)
    zlib 1.2.5 -6       = 5808 ms (17 MB/s), 104854004->31262824, 445 ms (230 MB/s)
    zlib 1.2.5 -9       = 17902 ms (5 MB/s), 104854004->31051160, 439 ms (233 MB/s)
    all                 = 786724 ms
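    A note on reading the speed columns: the MB/s figures in both tables appear consistent with bytes / 1024 / milliseconds, truncated to an integer (for memcpy on the Athlon, 104854004 / 1024 / 53 gives 1932). This formula is inferred from the posted numbers, not taken from the benchmark source, and the helper name below is made up for illustration. A minimal C sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Throughput as the tables appear to report it: bytes / 1024 / milliseconds,
 * truncated to an integer. Inferred from the posted numbers; the function
 * name is hypothetical and not part of the benchmark. */
static uint64_t table_mb_per_s(uint64_t bytes, uint64_t ms)
{
    assert(ms > 0);               /* a 0 ms run would divide by zero */
    return bytes / 1024 / ms;
}
```

    For example, table_mb_per_s(104854004, 330) gives 310, which matches the lz4 rev 10 compression row on the Athlon.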
    Last edited by inikep; 6th June 2011 at 19:09.

  4. #34
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    618
    Thanks
    56
    Thanked 31 Times in 16 Posts
    You probably want -static-libstdc++.
    Yes, exactly, thanks Steinar

    New results using 1 core of Athlon X4 2.8 GHz, Windows 7 (32-bit) & using 1 core of Intel Xeon X5355 @ 2.66GHz (64-bit Linux)
    Thanks for the tests Przemyslaw

    At first I thought the Athlon results were better than expected and the Intel ones worse,
    but then I noticed that the Intel results come from 64-bit Linux,
    which means pointers are 8 bytes long, doubling the table size compared to 32-bit.

    Maybe I should use a different method to reference matches to avoid this doubling effect on 64-bit systems...
    Last edited by Cyan; 7th June 2011 at 02:14.

  5. #35
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    618
    Thanks
    56
    Thanked 31 Times in 16 Posts

    Compiler-dependent optimizations

    Hi

    I'm currently having another look at the LZ4 source code, tweaking small details here and there.
    It's sometimes amazing how much difference little things can make from a performance perspective.

    However, these days I keep finding subtle line-of-code differences that run faster on GCC but slower on Visual Studio, and of course the other way round.

    The differences are large:
    the current version of LZ4 (r10) performs approximately 20% faster when compiled with Visual Studio than with GCC. That made me believe that VS was inherently superior to GCC. Now my r11 candidate (not yet published) performs 10% better on GCC than on VS...

    So OK, the first lesson is that, when tuning code for extreme performance, we cannot avoid such subtle consequences.

    Now, I'm wondering if it is possible to maintain both code branches.
    One way would be to use some kind of #ifdef that detects whether the current compiler is GCC, VS, or another one. Then, depending on the detected compiler, the line of code would be tuned for better performance.

    So the question is: does such a #define for detecting the compiler exist?


    I admit being slightly afraid of the code complexity of this method, and of the associated difficulty of maintaining the code. A cheaper way could be to tune the code for a single compiler and release just that, keeping the other branch in another file.

  6. #36
    Member
    Join Date
    Mar 2011
    Location
    Google Switzerland
    Posts
    19
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Cyan View Post
    At first I thought the Athlon results were better than expected and the Intel ones worse,
    but then I noticed that the Intel results come from 64-bit Linux,
    which means pointers are 8 bytes long, doubling the table size compared to 32-bit.

    Maybe I should use a different method to reference matches to avoid this doubling effect on 64-bit systems...
    We found this in Snappy as well; if your hash table fits inside the L1 cache, you get a huge speed boost in this kind of compression. We simply store offsets instead of pointers, which means we can get away with 16-bit entries instead of 64-bit.

    Performance will always vary between settings and platforms, though; on my machine/compiler here, Snappy is generally 30%+ faster than LZ4 on compression and about twice as fast in decompression (even with LZ4 in its fastest, non-safe decompression mode), but the results from others' machines in this thread seem to have them roughly on par.

    /* Steinar */
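    The offsets-instead-of-pointers idea can be sketched in C as follows. This is only an illustration under assumptions, not Snappy's or LZ4's actual code: the table size (4096 entries), the multiplicative hash constant, and all names are hypothetical, and blocks are assumed to be at most 64 KB so that a position fits in 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LOG  12                    /* 4096 entries, an assumed size */
#define HASH_SIZE (1u << HASH_LOG)

/* 16-bit offsets from the start of the block instead of raw pointers.
 * 4096 * 2 bytes = 8 KB on any platform; a pointer table would take
 * 16 KB with 4-byte pointers and 32 KB with 8-byte pointers. */
typedef struct {
    const uint8_t *base;                /* start of the current block */
    uint16_t       offset[HASH_SIZE];
} offset_table;

/* Illustrative multiplicative hash of the 4 bytes at p. */
static uint32_t hash4(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);            /* unaligned-safe 4-byte load */
    return (v * 2654435761u) >> (32 - HASH_LOG);
}

static void table_init(offset_table *t, const uint8_t *base)
{
    t->base = base;
    memset(t->offset, 0, sizeof t->offset);
}

/* Return the previous candidate for the 4 bytes at p, then record p.
 * The caller must still compare bytes: the slot may hold a hash collision
 * or the initial offset 0. */
static const uint8_t *table_swap(offset_table *t, const uint8_t *p)
{
    uint32_t h = hash4(p);
    const uint8_t *candidate = t->base + t->offset[h];
    assert((size_t)(p - t->base) <= UINT16_MAX);   /* block must be <= 64 KB */
    t->offset[h] = (uint16_t)(p - t->base);
    return candidate;
}
```

    The lookup pays one extra addition (base + offset) per probe in exchange for the smaller table, which is the trade-off discussed in this thread.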

  7. #37
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    Quote Originally Posted by Cyan View Post
    Hi

    I'm currently having another look at the LZ4 source code, tweaking small details here and there.
    It's sometimes amazing how much difference little things can make from a performance perspective.

    However, these days I keep finding subtle line-of-code differences that run faster on GCC but slower on Visual Studio, and of course the other way round.

    The differences are large:
    the current version of LZ4 (r10) performs approximately 20% faster when compiled with Visual Studio than with GCC. That made me believe that VS was inherently superior to GCC. Now my r11 candidate (not yet published) performs 10% better on GCC than on VS...

    So OK, the first lesson is that, when tuning code for extreme performance, we cannot avoid such subtle consequences.

    Now, I'm wondering if it is possible to maintain both code branches.
    One way would be to use some kind of #ifdef that detects whether the current compiler is GCC, VS, or another one. Then, depending on the detected compiler, the line of code would be tuned for better performance.

    So the question is: does such a #define for detecting the compiler exist?


    I admit being slightly afraid of the code complexity of this method, and of the associated difficulty of maintaining the code. A cheaper way could be to tune the code for a single compiler and release just that, keeping the other branch in another file.
    MSVC_VER for VC, __GNUC__ for gcc, __clang__ for Clang

  8. #38
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    618
    Thanks
    56
    Thanked 31 Times in 16 Posts
    We simply store offsets instead of pointers, which means we can get away with 16-bit entries instead of 64-bit.
    Yes, I also tried that. Although it keeps the table size stable on 64-bit systems, it nonetheless proved slower on 32-bit systems.

    Here too, one solution would be to have 2 code paths: one for 32-bit, with pointers, and one for 64-bit, with offsets.
    But I have not found any generic way to detect that in the preprocessor (sizeof cannot be used in an #if test).
    Except __WORDSIZE, but that one adds a dependency on external headers.

    Update: OK, it seems __x86_64__ is the right test.
    However, even in 64-bit mode, the extra operations needed to work with offsets (which are merely additions) cost just as much as is gained by halving the memory requirements.
    It ends up being almost exactly equivalent.
    Quite a surprise, I was expecting additions to be almost free...
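    One portable way to pick between the two code paths without sizeof is to test UINTPTR_MAX from <stdint.h>, which is a plain integer constant the preprocessor can evaluate. The sketch below is an assumption-laden illustration, not LZ4's code: TABLE_USES_OFFSETS, table_entry, entry_from and entry_to_ptr are hypothetical names, and the 64-bit path assumes the distance from the base fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* sizeof is not evaluated by the preprocessor, but the limits in
 * <stdint.h> are integer constants and can be tested with #if. */
#if UINTPTR_MAX > 0xFFFFFFFFu
   /* Pointers wider than 32 bits: store 32-bit offsets from a base. */
#  define TABLE_USES_OFFSETS 1
   typedef uint32_t table_entry;
#else
   /* 32-bit pointers: store them directly, no extra addition on lookup. */
#  define TABLE_USES_OFFSETS 0
   typedef const uint8_t *table_entry;
#endif

/* Convert a position inside the buffer into a table entry. */
static table_entry entry_from(const uint8_t *base, const uint8_t *p)
{
    assert(p >= base);
#if TABLE_USES_OFFSETS
    return (table_entry)(p - base);
#else
    (void)base;
    return p;
#endif
}

/* Convert a table entry back into a pointer. */
static const uint8_t *entry_to_ptr(const uint8_t *base, table_entry e)
{
#if TABLE_USES_OFFSETS
    return base + e;                    /* the extra addition on the offset path */
#else
    (void)base;
    return e;
#endif
}
```

    Either way a table entry stays 4 bytes wide on common 32-bit and 64-bit targets, so the table size no longer depends on the pointer width.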


    MSVC_VER for VC, __GNUC__ for gcc, __clang__ for Clang
    Many thanks m^2
    Last edited by Cyan; 9th June 2011 at 10:01.

  9. #39
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    157
    Thanks
    4
    Thanked 14 Times in 3 Posts
    Quote Originally Posted by m^2 View Post
    MSVC_VER for VC
    For VC++ it should be _MSC_VER (http://msdn.microsoft.com/en-us/library/b0084kay.aspx)

  10. #40
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,163
    Thanks
    14
    Thanked 43 Times in 35 Posts
    There are also __SUNPRO_CC and __SUNPRO_C #defines for SunStudio compilers.
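    Putting the macros mentioned in this thread together, a compiler-detection block might look like the sketch below. BENCH_COMPILER, TUNE_FOR_GCC and compiler_name are hypothetical names chosen for illustration. The order of the tests matters: Clang also defines __GNUC__, and defines _MSC_VER in its MSVC-compatible mode, so __clang__ is checked first.

```c
#include <assert.h>

/* Check __clang__ before _MSC_VER and __GNUC__, since Clang can define both. */
#if defined(__clang__)
#  define BENCH_COMPILER "clang"
#elif defined(_MSC_VER)
#  define BENCH_COMPILER "msvc"
#elif defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#  define BENCH_COMPILER "sunpro"
#elif defined(__GNUC__)
#  define BENCH_COMPILER "gcc"
#else
#  define BENCH_COMPILER "unknown"
#endif

/* Hypothetical tuning switch: select the GCC-tuned variant of a hot line. */
#if defined(__GNUC__) && !defined(__clang__)
#  define TUNE_FOR_GCC 1
#else
#  define TUNE_FOR_GCC 0
#endif

/* Name of the compiler family detected at build time. */
static const char *compiler_name(void)
{
    assert(sizeof(BENCH_COMPILER) > 1);   /* the name is never empty */
    return BENCH_COMPILER;
}
```

    A tuned source file could then wrap each compiler-sensitive line in #if TUNE_FOR_GCC ... #else ... #endif, at the cost of the extra maintenance discussed above.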

  11. #41
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    Quote Originally Posted by inikep View Post
    Confirmed, thanks for correction.

  12. #42
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    618
    Thanks
    56
    Thanked 31 Times in 16 Posts
    Thanks for the instructions. For the time being, I've taken the easy road and provide the GCC-optimized source for LZ4 r11 (available at Google Code).

    Attached is a slightly modified version of Przemyslaw's benchmark package (r11 simply replaces r10).
    A Win32 binary is also available (statically linked, avoiding DLL dependencies such as the one reported by m^2).

    The results are, well, surprising. I was certainly not expecting such gains from just swapping a few lines of code and instructions.
    As stated earlier, these changes do not provide large benefits for Visual Studio. But for GCC it's a big leap forward, sometimes up to +50%. This is a good example of compiler-specific optimizations.
    With this version, GCC builds end up being faster than VS ones.

    As usual, I'm interested in feedback, especially on low-end hardware, where L2 cache is scarce and the L1 strategy *should* bring the most benefits.

    Regards
    Last edited by Cyan; 11th June 2011 at 00:20.

  13. #43
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    Code:
    m@m-Nokia-Booklet-3G:~/Downloads/benchmark03$ wine skibinshmark.exe -i3 -s ../scc.tar
    benchmark 0.3 (c) Dell Inc.  Written by P.Skibinski
    memcpy            = 323 ms (640 MB/s), 211957760->211957760
    fastlz 0.1 -1     = 3052 ms (67 MB/s), 211957760->104628335, 1967 ms (105 MB/s)
    fastlz 0.1 -2     = 3412 ms (60 MB/s), 211957760->100906274, 1476 ms (140 MB/s)
    lz4 rev 9         = 4897 ms (42 MB/s), 211957760->98040199, 1030 ms (200 MB/s)
    lz4 rev 11        = 2470 ms (83 MB/s), 211957760->101818033, 680 ms (304 MB/s)
    snappy 1.0.3      = 2858 ms (72 MB/s), 211957760->104755703, 999 ms (207 MB/s)
    lzf 3.6 vf        = 3809 ms (54 MB/s), 211957760->102041451, 1260 ms (164 MB/s)
    lzf 3.6 uf        = 3311 ms (62 MB/s), 211957760->105682277, 1263 ms (163 MB/s)
    lzham alpha6 -m0d26 = 310263 ms (0 MB/s), 211957760->64043081, 16984 ms (12 MB/s)
    lzjb 2010         = 5422 ms (38 MB/s), 211957760->122672025, 1705 ms (121 MB/s)
    lzmat 1.1         = 42571 ms (4 MB/s), 211957760->76486345, 2614 ms (79 MB/s)
    lzo 2.05 1b_1     = 4918 ms (42 MB/s), 211957760->97035718, 1284 ms (161 MB/s)
    lzo 2.05 1b_9     = 9871 ms (20 MB/s), 211957760->89264755, 1257 ms (164 MB/s)
    lzo 2.05 1b_99    = 13132 ms (15 MB/s), 211957760->85656302, 1241 ms (166 MB/s)
    lzo 2.05 1c_1     = 4627 ms (44 MB/s), 211957760->99551843, 1296 ms (159 MB/s)
    lzo 2.05 1c_9     = 9711 ms (21 MB/s), 211957760->91037796, 1299 ms (159 MB/s)
    lzo 2.05 1c_99    = 11985 ms (17 MB/s), 211957760->88118081, 1285 ms (161 MB/s)
    lzo 2.05 1f_1     = 4924 ms (42 MB/s), 211957760->99742137, 1235 ms (167 MB/s)
    lzo 2.05 1x_1     = 3812 ms (54 MB/s), 211957760->208714918, 417 ms (496 MB/s)
    lzo 2.05 1y_1     = 764 ms (270 MB/s), 211957760->208714002, 408 ms (507 MB/s)
    lzo 2.05 1b_999   = 136225 ms (1 MB/s), 211957760->76594616, 1143 ms (181 MB/s)
    lzo 2.05 1c_999   = 68723 ms (3 MB/s), 211957760->80397019, 1218 ms (169 MB/s)
    lzo 2.05 1f_999   = 75349 ms (2 MB/s), 211957760->80890513, 1202 ms (172 MB/s)
    lzo 2.05 1x_999   = 202208 ms (1 MB/s), 211957760->75302211, 1327 ms (155 MB/s)
    lzo 2.05 1y_999   = 198447 ms (1 MB/s), 211957760->75504114, 1321 ms (156 MB/s)
    lzo 2.05 1z_999   = 203228 ms (1 MB/s), 211957760->75061639, 1378 ms (150 MB/s)
    lzo 2.05 2a_999   = 62470 ms (3 MB/s), 211957760->82809608, 1638 ms (126 MB/s)
    lzrw1             = 4168 ms (49 MB/s), 211957760->113763206, 1601 ms (129 MB/s)
    lzrw1-a           = 3010 ms (68 MB/s), 211957760->112345946, 1724 ms (120 MB/s)
    lzrw2             = 3314 ms (62 MB/s), 211957760->105430138, 2138 ms (96 MB/s)
    lzrw3             = 3208 ms (64 MB/s), 211957760->100136247, 2517 ms (82 MB/s)
    lzrw3-a           = 10275 ms (20 MB/s), 211957760->90810520, 2507 ms (82 MB/s)
    snappy 1.0.3      = 2838 ms (72 MB/s), 211957760->104755703, 991 ms (208 MB/s)
    tornado 0.666 16k/1 = 3693 ms (56 MB/s), 211957760->107391267, 2026 ms (102 MB/s)
    tornado 128k/2m   = 4651 ms (44 MB/s), 211957760->98475697, 2163 ms (95 MB/s)
    tornado 128k/8m   = 4885 ms (42 MB/s), 211957760->98082873, 2204 ms (93 MB/s)
    tornado 4m/8m     = 10413 ms (19 MB/s), 211957760->96103939, 2770 ms (74 MB/s)
    tornado b128k/8m  = 6048 ms (34 MB/s), 211957760->88062062, 3126 ms (66 MB/s)
    tornado b4m/8m    = 11280 ms (18 MB/s), 211957760->85838678, 3692 ms (56 MB/s)
    tornado b4m/32m   = 11666 ms (17 MB/s), 211957760->86000929, 3815 ms (54 MB/s)
    quicklz 1.5.0 -3  = 31249 ms (6 MB/s), 211957760->81822726, 1043 ms (198 MB/s)
    quicklz 1.5.0 -2  = 6541 ms (31 MB/s), 211957760->84554401, 2417 ms (85 MB/s)
    quicklz 1.5.0 -1  = 2440 ms (84 MB/s), 211957760->94724661, 1870 ms (110 MB/s)
    quicklz 1.5.1 b5 -1 = 2407 ms (85 MB/s), 211957760->94724661, 1868 ms (110 MB/s)
    ucl_nrv2b 1.03 -1 = 67020 ms (3 MB/s), 211957760->81703313, 3569 ms (57 MB/s)
    ucl_nrv2b 1.03 -6 = 124227 ms (1 MB/s), 211957760->73902408, 3188 ms (64 MB/s)
    ucl_nrv2d 1.03 -1 = 67403 ms (3 MB/s), 211957760->81462132, 3491 ms (59 MB/s)
    ucl_nrv2d 1.03 -6 = 123217 ms (1 MB/s), 211957760->73757875, 3149 ms (65 MB/s)
    ucl_nrv2e 1.03 -1 = 67393 ms (3 MB/s), 211957760->81195723, 3675 ms (56 MB/s)
    ucl_nrv2e 1.03 -6 = 123817 ms (1 MB/s), 211957760->73302210, 3237 ms (63 MB/s)
    zlib 1.2.5 -1     = 15701 ms (13 MB/s), 211957760->77256596, 3080 ms (67 MB/s)
    zlib 1.2.5 -6     = 44342 ms (4 MB/s), 211957760->68230775, 2789 ms (74 MB/s)
    zlib 1.2.5 -9     = 103619 ms (1 MB/s), 211957760->67647497, 2778 ms (74 MB/s)
    all               = 7150422 ms (0 MB/s), 0->0
    done... (3 iterations)
    I tested the Windows binary made by Cyan.

    ADDED:
    Decompression frontiers:
    Code:
    lzo 2.05 1y_1     = 764 ms (270 MB/s), 211957760->208714002, 408 ms (507 MB/s)
    lz4 rev 11        = 2470 ms (83 MB/s), 211957760->101818033, 680 ms (304 MB/s)
    snappy 1.0.3      = 2838 ms (72 MB/s), 211957760->104755703, 991 ms (208 MB/s)
    lz4 rev 9         = 4897 ms (42 MB/s), 211957760->98040199, 1030 ms (200 MB/s)
    quicklz 1.5.0 -3  = 31249 ms (6 MB/s), 211957760->81822726, 1043 ms (198 MB/s)
    lzo 2.05 1b_999   = 136225 ms (1 MB/s), 211957760->76594616, 1143 ms (181 MB/s)
    lzo 2.05 1y_999   = 198447 ms (1 MB/s), 211957760->75504114, 1321 ms (156 MB/s)
    lzo 2.05 1x_999   = 202208 ms (1 MB/s), 211957760->75302211, 1327 ms (155 MB/s)
    lzo 2.05 1z_999   = 203228 ms (1 MB/s), 211957760->75061639, 1378 ms (150 MB/s)
    zlib 1.2.5 -9     = 103619 ms (1 MB/s), 211957760->67647497, 2778 ms (74 MB/s)
    lzham alpha6 -m0d26 = 310263 ms (0 MB/s), 211957760->64043081, 16984 ms (12 MB/s)
    ADDED:
    Compression frontiers
    Code:
    lzo 2.05 1y_1     = 764 ms (270 MB/s), 211957760->208714002, 408 ms (507 MB/s)
    quicklz 1.5.1 b5 -1 = 2407 ms (85 MB/s), 211957760->94724661, 1868 ms (110 MB/s)
    tornado b128k/8m  = 6048 ms (34 MB/s), 211957760->88062062, 3126 ms (66 MB/s)
    quicklz 1.5.0 -2  = 6541 ms (31 MB/s), 211957760->84554401, 2417 ms (85 MB/s)
    zlib 1.2.5 -1     = 15701 ms (13 MB/s), 211957760->77256596, 3080 ms (67 MB/s)
    lzmat 1.1         = 42571 ms (4 MB/s), 211957760->76486345, 2614 ms (79 MB/s)
    zlib 1.2.5 -6     = 44342 ms (4 MB/s), 211957760->68230775, 2789 ms (74 MB/s)
    zlib 1.2.5 -9     = 103619 ms (1 MB/s), 211957760->67647497, 2778 ms (74 MB/s)
    lzham alpha6 -m0d26 = 310263 ms (0 MB/s), 211957760->64043081, 16984 ms (12 MB/s)
    Last edited by m^2; 12th June 2011 at 14:44.

  14. #44
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    It seems that snappy is tested twice.

  15. #45
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    618
    Thanks
    56
    Thanked 31 Times in 16 Posts
    Thanks for testing, m^2.
    It seems your platform benefits especially well from the new parameters, given the great boost that is observed.

    It seems that snappy is tested twice.
    Probably a copy/paste error on my side.
    Since source code is provided, anyone can remove the second mention and compile a new binary.
    It should not change any conclusion.

    btw, i'm somewhat surprised by lzo 2.05 1x_1 and lzo 2.05 1y_1 results.
    They are both very fast but, well, do they compress anything....
    Last edited by Cyan; 13th June 2011 at 21:55.

  16. #46
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    157
    Thanks
    4
    Thanked 14 Times in 3 Posts
    Quote Originally Posted by Cyan View Post
    btw, i'm somewhat surprised by lzo 2.05 1x_1 and lzo 2.05 1y_1 results.
    They are both very fast but, well, do they compress anything....
    It depends on the input data. It works with my 100 MB concatenation of 10 different files, about 10 MB each: bmp, dct_coeffs, english_dic, ENWIK, exe, fp_log, hlp, XML, pdf, ncb:
    lzo 2.05 1x_1 = 246 ms (416 MB/s), 104854004->51883722, 192 ms (533 MB/s)
    lzo 2.05 1y_1 = 244 ms (419 MB/s), 104854004->51797619, 188 ms (544 MB/s)

  17. #47
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    618
    Thanks
    56
    Thanked 31 Times in 16 Posts
    I initially thought you meant that these versions are tuned to take advantage of some specific input files, but your sample file is varied enough, with many file types.

    It's strange to see such a large difference.
    Could the compression ratio depend on some compiler-specific or target-specific parameters?

  18. #48
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    157
    Thanks
    4
    Thanked 14 Times in 3 Posts
    It can be caused by a heuristic that tries to detect incompressible data. If the compression ratio is low, it switches to copying the data instead of compressing it. You can find such a heuristic in QuickLZ, for example.
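    To make the idea concrete, here is a rough C sketch of one possible detector. It is a hypothetical probe, not the logic used by QuickLZ or LZO: it counts how many positions repeat a recently seen 4-byte sequence and calls the block incompressible when fewer than 1 in 16 do. The threshold, the table size and the function name are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROBE_LOG  10
#define PROBE_SIZE (1u << PROBE_LOG)

/* Hypothetical probe: returns 1 if the block looks incompressible, i.e.
 * fewer than 1 in 16 positions repeat a 4-byte sequence that was recently
 * seen in the same hash slot. */
static int looks_incompressible(const uint8_t *src, size_t len)
{
    uint32_t last[PROBE_SIZE];          /* last position seen per hash slot */
    size_t i, repeats = 0, probes;

    assert(len < UINT32_MAX);           /* positions are stored as 32-bit */
    if (len < 8)
        return 1;                       /* too short to be worth compressing */
    probes = len - 3;
    memset(last, 0xFF, sizeof last);    /* 0xFFFFFFFF marks an empty slot */

    for (i = 0; i < probes; i++) {
        uint32_t v, h;
        memcpy(&v, src + i, 4);         /* unaligned-safe 4-byte load */
        h = (v * 2654435761u) >> (32 - PROBE_LOG);
        if (last[h] != UINT32_MAX && memcmp(src + last[h], src + i, 4) == 0)
            repeats++;
        last[h] = (uint32_t)i;
    }
    return repeats * 16 < probes;
}
```

    A compressor using such a probe would memcpy the block when it returns 1. That gives output close to the input size at near-memcpy speed, similar to the lzo 1x_1 and 1y_1 rows for scc.tar.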

  19. #49
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    I found a mention of LZPS in the code...what is it?

  20. #50
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    157
    Thanks
    4
    Thanked 14 Times in 3 Posts
    Quote Originally Posted by m^2 View Post
    I found a mention of LZPS in the code...what is it?
    It's Dell's proprietary compressor.

  21. #51
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    OK, thanks for the info.
    I guess you're not authorized to publish its benchmarks?

  22. #52
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    157
    Thanks
    4
    Thanked 14 Times in 3 Posts
    LZPS is optimized for decompression speed. The results using 1 core of Athlon X4 2.8 GHz, Windows 7 (32-bit MinGW compilation under gcc 4.5.2) and 3 iterations. The input file (100 MB) is a concatenation of 10 different files, about 10 MB each: bmp, dct_coeffs, english_dic, ENWIK, exe, fp_log, hlp, XML, pdf, ncb.

    Code:
    lzps_1M8            = 297 ms (344 MB/s), 104854004->55554014, 86 ms (1190 MB/s)
    lzps_5M8            = 298 ms (343 MB/s), 104854004->52980614, 90 ms (1137 MB/s)
    lzps_1M4            = 316 ms (324 MB/s), 104854004->46578650, 124 ms (825 MB/s)
    lzps_5M4            = 318 ms (322 MB/s), 104854004->44494718, 139 ms (736 MB/s)
    lzps_9M4            = 353 ms (290 MB/s), 104854004->43805694, 142 ms (721 MB/s)
    lzps2_H13           = 520 ms (196 MB/s), 104854004->43070109, 140 ms (731 MB/s)
    lzps2_H15           = 681 ms (150 MB/s), 104854004->40275648, 149 ms (687 MB/s)
    lzps2_H17           = 1268 ms (80 MB/s), 104854004->37682986, 164 ms (624 MB/s)
    lzps2_H19           = 1462 ms (70 MB/s), 104854004->32141051, 170 ms (602 MB/s)
    lzps2_H21           = 1545 ms (66 MB/s), 104854004->31693923, 175 ms (585 MB/s)
    lzps2F_H13          = 696 ms (147 MB/s), 104854004->41509287, 150 ms (682 MB/s)
    lzps2F_H15          = 861 ms (118 MB/s), 104854004->39234008, 153 ms (669 MB/s)
    lzps2F_H17          = 1491 ms (68 MB/s), 104854004->36715149, 161 ms (636 MB/s)
    lzps2F_H19          = 1924 ms (53 MB/s), 104854004->31568798, 168 ms (609 MB/s)
    lzps2F_H21          = 2279 ms (44 MB/s), 104854004->29970889, 169 ms (605 MB/s)
    snappy 1.0.3        = 306 ms (334 MB/s), 104854004->46155676, 141 ms (726 MB/s)
    Last edited by inikep; 12th August 2011 at 10:46.

  23. #53
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    Nice, thanks.

  24. #54
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    I modified the benchmark to serve as a testbed for filesystem compression. The scheme goes like this:
    There are 2 parameters, FS block size and disk sector size. The data is divided into chunks of the same size as a block. Each chunk is compressed independently. If the compressor manages to save at least one sector, the compressed size is rounded up to a sector boundary; otherwise the chunk is left uncompressed.
    This skews decompression speed tests: a weak compressor that hardly ever manages to save a full sector has much less data to decompress. Therefore I introduced another measure, saved bytes per second, used for both compression and decompression. The idea is that if one compressor is both stronger and saves more bytes per second than another, it can reach the other's compression ratio at greater speed by skipping part of the data, and is therefore clearly superior.
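    The accounting in this scheme can be sketched in C as below. This is one reading of the description, not the modified benchmark's source: the function names are hypothetical, an uncompressed chunk is assumed to be stored at its raw length, and the "saved" figure is assumed to use the same bytes / 1024 / ms unit as the speed column.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of sector. */
static size_t round_up(size_t n, size_t sector)
{
    assert(sector > 0);
    return (n + sector - 1) / sector * sector;
}

/* On-disk size of one chunk: keep the compressed form only if, after
 * rounding up to whole sectors, it is at least one sector smaller than
 * the raw chunk. Otherwise store the chunk uncompressed (assumed to cost
 * raw_len bytes). */
static size_t stored_size(size_t raw_len, size_t comp_len, size_t sector)
{
    size_t raw_sectors  = round_up(raw_len, sector);
    size_t comp_sectors = round_up(comp_len, sector);
    return (comp_sectors + sector <= raw_sectors) ? comp_sectors : raw_len;
}

/* Saved bytes per unit of time, in the tables' apparent unit
 * (bytes / 1024 / ms, truncated). */
static unsigned long long saved_mb_per_s(unsigned long long raw_total,
                                         unsigned long long stored_total,
                                         unsigned long long ms)
{
    assert(ms > 0 && stored_total <= raw_total);
    return (raw_total - stored_total) / 1024 / ms;
}
```

    With a 131072-byte block and 4096-byte sectors, a chunk compressed to 60000 bytes is stored as 61440 bytes, while one compressed to 126977 bytes rounds up to the full block and is therefore left uncompressed.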

    I performed 2 tests on my Pentium D@2.66 with Silesia Compression Corpus and gcc 4.5.2.

    First, 4k sector like on most new drives and 128k block (maximum available on ZFS):
    Code:
    E:\projects\benchmark03\src>skibinshmark.exe -b131072 -s4096 -i3 -a scc.tar
    benchmark 0.4 (c) Dell Inc.  Written by P.Skibinski
    memcpy            = 269 ms (769 MB/s), 211927552->211927552
    fastlz 0.1 -1     = 2077 ms (99 MB/s), 108339200 B, saved (48/98) MB/s
    fastlz 0.1 -2     = 2152 ms (96 MB/s), 105037824 B, saved (48/103) MB/s
    LZ4 r11           = 1863 ms (111 MB/s), 106631168 B, saved (55/166) MB/s
    snappy 1.0.3      = 1898 ms (109 MB/s), 108097536 B, saved (53/173) MB/s
    lzf 3.6 vf        = 2132 ms (97 MB/s), 105697280 B, saved (48/119) MB/s
    lzf 3.6 uf        = 2024 ms (102 MB/s), 109654016 B, saved (49/114) MB/s
    lzham alpha6 -m0d26 = 149734 ms (1 MB/s), 71901184 B, saved (0/9) MB/s
    lzjb 2010         = 2389 ms (86 MB/s), 125628416 B, saved (35/59) MB/s
    lzmat 1.1         = 19498 ms (10 MB/s), 82329600 B, saved (6/62) MB/s
    lzo 2.05 1b_1     = 3314 ms (62 MB/s), 101900288 B, saved (32/111) MB/s
    lzo 2.05 1b_9     = 5682 ms (36 MB/s), 94191616 B, saved (20/114) MB/s
    lzo 2.05 1b_99    = 8018 ms (25 MB/s), 90832896 B, saved (14/117) MB/s
    lzo 2.05 1c_1     = 3674 ms (56 MB/s), 103649280 B, saved (28/104) MB/s
    lzo 2.05 1c_9     = 6599 ms (31 MB/s), 95309824 B, saved (17/111) MB/s
    lzo 2.05 1c_99    = 8154 ms (25 MB/s), 92250112 B, saved (14/116) MB/s
    lzo 2.05 1f_1     = 3880 ms (53 MB/s), 103690240 B, saved (27/108) MB/s
    lzo 2.05 1x_1     = 1578 ms (131 MB/s), 104505344 B, saved (66/105) MB/s
    lzo 2.05 1y_1     = 1605 ms (128 MB/s), 105172992 B, saved (64/104) MB/s
    lzo 2.05 1b_999   = 50953 ms (4 MB/s), 82677760 B, saved (2/142) MB/s
    lzo 2.05 1c_999   = 38878 ms (5 MB/s), 84676608 B, saved (3/135) MB/s
    lzo 2.05 1f_999   = 42757 ms (4 MB/s), 85319680 B, saved (2/129) MB/s
    lzo 2.05 1x_999   = 96467 ms (2 MB/s), 80609280 B, saved (1/125) MB/s
    lzo 2.05 1y_999   = 94952 ms (2 MB/s), 81207296 B, saved (1/125) MB/s
    lzo 2.05 1z_999   = 96958 ms (2 MB/s), 80474112 B, saved (1/118) MB/s
    lzo 2.05 2a_999   = 35482 ms (5 MB/s), 86548480 B, saved (3/96) MB/s
    lzrw1             = 2850 ms (72 MB/s), 116891648 B, saved (32/69) MB/s
    lzrw1-a           = 2876 ms (71 MB/s), 115470336 B, saved (32/74) MB/s
    lzrw2             = 2548 ms (81 MB/s), 109129728 B, saved (39/65) MB/s
    lzrw3             = 2498 ms (82 MB/s), 104960000 B, saved (41/56) MB/s
    lzrw3-a           = 5838 ms (35 MB/s), 95911936 B, saved (19/59) MB/s
    tornado 0.666 1   = 2740 ms (75 MB/s), 109912064 B, saved (36/60) MB/s
    tornado 2         = 2511 ms (82 MB/s), 104558592 B, saved (41/65) MB/s
    tornado 3         = 2462 ms (84 MB/s), 104558592 B, saved (42/65) MB/s
    tornado 4         = 5893 ms (35 MB/s), 103985152 B, saved (17/64) MB/s
    tornado 5         = 2848 ms (72 MB/s), 95330304 B, saved (39/49) MB/s
    tornado 6         = 6285 ms (32 MB/s), 94154752 B, saved (18/47) MB/s
    tornado 7         = 6277 ms (32 MB/s), 94154752 B, saved (18/47) MB/s
    quicklz 1.5.0 -3  = 13055 ms (15 MB/s), 87388160 B, saved (9/136) MB/s
    quicklz 1.5.0 -2  = 3942 ms (52 MB/s), 90791936 B, saved (30/59) MB/s
    quicklz 1.5.0 -1  = 1796 ms (115 MB/s), 100081664 B, saved (60/58) MB/s
    quicklz 1.5.1 -1  = 1845 ms (112 MB/s), 100081664 B, saved (59/58) MB/s
    ucl_nrv2b 1.03 -1 = 27986 ms (7 MB/s), 86249472 B, saved (4/51) MB/s
    ucl_nrv2b 1.03 -6 = 50302 ms (4 MB/s), 78553088 B, saved (2/61) MB/s
    ucl_nrv2d 1.03 -1 = 27968 ms (7 MB/s), 85909504 B, saved (4/52) MB/s
    ucl_nrv2d 1.03 -6 = 49742 ms (4 MB/s), 78426112 B, saved (2/61) MB/s
    ucl_nrv2e 1.03 -1 = 27813 ms (7 MB/s), 85680128 B, saved (4/51) MB/s
    ucl_nrv2e 1.03 -6 = 49569 ms (4 MB/s), 78028800 B, saved (2/60) MB/s
    zlib 1.2.5 -1     = 9678 ms (21 MB/s), 81289216 B, saved (13/56) MB/s
    zlib 1.2.5 -6     = 26480 ms (7 MB/s), 72802304 B, saved (5/64) MB/s
    zlib 1.2.5 -9     = 59082 ms (3 MB/s), 72294400 B, saved (2/66) MB/s
    all               = 3495418 ms (0 MB/s), 0->0
    done... (3 iterations)
    Second, 4k sector like on most new drives and more general purpose 8k block:
    Code:
    E:\projects\benchmark03\src>skibinshmark.exe -b8192 -s4096 -i3 -a scc.tar
    benchmark 0.4 (c) Dell Inc.  Written by P.Skibinski
    memcpy            = 267 ms (775 MB/s), 211927552->211927552
    fastlz 0.1 -1     = 2067 ms (100 MB/s), 170000384 B, saved (19/121) MB/s
    fastlz 0.1 -2     = 2210 ms (93 MB/s), 169967616 B, saved (18/122) MB/s
    LZ4 r11           = 2262 ms (91 MB/s), 174100480 B, saved (16/205) MB/s
    snappy 1.0.3      = 1799 ms (115 MB/s), 171319296 B, saved (22/203) MB/s
    lzf 3.6 vf        = 2336 ms (88 MB/s), 168873984 B, saved (17/140) MB/s
    lzf 3.6 uf        = 2208 ms (93 MB/s), 171237376 B, saved (17/141) MB/s
    lzham alpha6 -m0d26 = 397889 ms (0 MB/s), 133795840 B, saved (0/1) MB/s
    lzjb 2010         = 2407 ms (85 MB/s), 175611904 B, saved (14/105) MB/s
    lzmat 1.1         = 9100 ms (22 MB/s), 143110144 B, saved (7/51) MB/s
    lzo 2.05 1b_1     = 2919 ms (70 MB/s), 170741760 B, saved (13/141) MB/s
    lzo 2.05 1b_9     = 5841 ms (35 MB/s), 156020736 B, saved (9/117) MB/s
    lzo 2.05 1b_99    = 7204 ms (28 MB/s), 151351296 B, saved (8/113) MB/s
    lzo 2.05 1c_1     = 2746 ms (75 MB/s), 169779200 B, saved (14/135) MB/s
    lzo 2.05 1c_9     = 5958 ms (34 MB/s), 153640960 B, saved (9/114) MB/s
    lzo 2.05 1c_99    = 6828 ms (30 MB/s), 149389312 B, saved (8/111) MB/s
    lzo 2.05 1f_1     = 3253 ms (63 MB/s), 168673280 B, saved (12/135) MB/s
    lzo 2.05 1x_1     = 1569 ms (131 MB/s), 169938944 B, saved (26/140) MB/s
    lzo 2.05 1y_1     = 1587 ms (130 MB/s), 170323968 B, saved (25/137) MB/s
    lzo 2.05 1b_999   = 20566 ms (10 MB/s), 145166336 B, saved (3/116) MB/s
    lzo 2.05 1c_999   = 21089 ms (9 MB/s), 144158720 B, saved (3/116) MB/s
    lzo 2.05 1f_999   = 23222 ms (8 MB/s), 141881344 B, saved (2/114) MB/s
    lzo 2.05 1x_999   = 44510 ms (4 MB/s), 141357056 B, saved (1/106) MB/s
    lzo 2.05 1y_999   = 44530 ms (4 MB/s), 141979648 B, saved (1/105) MB/s
    lzo 2.05 1z_999   = 45076 ms (4 MB/s), 141062144 B, saved (1/99) MB/s
    lzo 2.05 2a_999   = 26266 ms (7 MB/s), 140189696 B, saved (2/83) MB/s
    lzrw1             = 2865 ms (72 MB/s), 172421120 B, saved (13/108) MB/s
    lzrw1-a           = 2865 ms (72 MB/s), 171835392 B, saved (13/106) MB/s
    lzrw2             = 2605 ms (79 MB/s), 170516480 B, saved (15/86) MB/s
    lzrw3             = 2777 ms (74 MB/s), 170545152 B, saved (14/68) MB/s
    lzrw3-a           = 5783 ms (35 MB/s), 158240768 B, saved (9/59) MB/s
    tornado 0.666 1   = 3384 ms (61 MB/s), 173805568 B, saved (11/74) MB/s
    tornado 2         = 3525 ms (58 MB/s), 173342720 B, saved (10/93) MB/s
    tornado 3         = 3763 ms (54 MB/s), 173342720 B, saved (10/92) MB/s
    tornado 4         = 5099 ms (40 MB/s), 173326336 B, saved (7/92) MB/s
    tornado 5         = 4290 ms (48 MB/s), 166543360 B, saved (10/61) MB/s
    tornado 6         = 5718 ms (36 MB/s), 166416384 B, saved (7/60) MB/s
    tornado 7         = 5704 ms (36 MB/s), 166416384 B, saved (7/60) MB/s
    quicklz 1.5.0 -3  = 8946 ms (23 MB/s), 145526784 B, saved (7/121) MB/s
    quicklz 1.5.0 -2  = 4217 ms (49 MB/s), 157700096 B, saved (12/60) MB/s
    quicklz 1.5.0 -1  = 2202 ms (93 MB/s), 168370176 B, saved (19/78) MB/s
    quicklz 1.5.1 -1  = 2218 ms (93 MB/s), 168452096 B, saved (19/78) MB/s
    ucl_nrv2b 1.03 -1 = 42884 ms (4 MB/s), 141213696 B, saved (1/45) MB/s
    ucl_nrv2b 1.03 -6 = 51626 ms (4 MB/s), 136921088 B, saved (1/47) MB/s
    ucl_nrv2d 1.03 -1 = 44961 ms (4 MB/s), 140918784 B, saved (1/45) MB/s
    ucl_nrv2d 1.03 -6 = 52029 ms (3 MB/s), 137003008 B, saved (1/47) MB/s
    ucl_nrv2e 1.03 -1 = 44888 ms (4 MB/s), 140865536 B, saved (1/43) MB/s
    ucl_nrv2e 1.03 -6 = 52065 ms (3 MB/s), 136802304 B, saved (1/46) MB/s
    zlib 1.2.5 -1     = 10864 ms (19 MB/s), 132829184 B, saved (7/38) MB/s
    zlib 1.2.5 -6     = 19212 ms (10 MB/s), 130273280 B, saved (4/39) MB/s
    zlib 1.2.5 -9     = 25284 ms (8 MB/s), 130244608 B, saved (3/40) MB/s
    all               = 3544147 ms (0 MB/s), 0->0
    done... (3 iterations)
    Codecs on the Pareto frontier for compression in at least one of the tests
    (size = stored size in bytes, cs/ds = saved MB/s for compression/decompression, 1 = 128k block test, 2 = 8k block test):
    name                 size1      csMBps1  dsMBps1  size2      csMBps2  dsMBps2
    lzo 2.05 1x_1        104505344  66       105      169938944  26       140
    quicklz 1.5.0 -1     100081664  60       58       168370176  19       78
    tornado 5            95330304   39       49       166543360  10       61
    quicklz 1.5.0 -2     90791936   30       59       157700096  12       60
    lzo 2.05 1c_9        95309824   17       111      153640960  9        114
    lzo 2.05 1c_99       92250112   14       116      149389312  8        111
    zlib 1.2.5 -1        81289216   13       56       132829184  7        38
    zlib 1.2.5 -6        72802304   5        64       130273280  4        39
    zlib 1.2.5 -9        72294400   2        66       130244608  3        40
    lzham alpha6 -m0d26  71901184   0        9        133795840  0        1
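    For reference, a frontier like this can be picked with a simple dominance check. The sketch below is an assumption about the selection rule (smaller stored size and higher saved MB/s are both better), not the procedure actually used for the table, and it runs on a handful of rows from the 128k test only.

```python
def pareto_frontier(results):
    """results maps codec name -> (stored_size_bytes, saved_MBps).
    A codec is dominated if another one is at least as good on both
    axes and strictly better on at least one."""
    frontier = []
    for name, (size, speed) in results.items():
        dominated = any(
            s <= size and v >= speed and (s < size or v > speed)
            for other, (s, v) in results.items() if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Compression results from the 128k block / 4k sector run (subset).
sample = {
    "lzo 2.05 1x_1":    (104505344, 66),
    "quicklz 1.5.0 -1": (100081664, 60),
    "snappy 1.0.3":     (108097536, 53),
    "LZ4 r11":          (106631168, 55),
    "tornado 5":        (95330304, 39),
    "zlib 1.2.5 -1":    (81289216, 13),
    "lzjb 2010":        (125628416, 35),
}
print(pareto_frontier(sample))
```

    On this subset snappy, LZ4 and lzjb drop out because lzo 1x_1 stores less and saves more per second than each of them.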
    And the same for decompression:
    name                 size1      csMBps1  dsMBps1  size2      csMBps2  dsMBps2
    snappy 1.0.3         108097536  53       173      171319296  22       203
    LZ4 r11              106631168  55       166      174100480  16       205
    lzo 2.05 1b_999      82677760   2        142      145166336  3        116
    quicklz 1.5.0 -3     87388160   9        136      145526784  7        121
    lzo 2.05 1c_999      84676608   3        135      144158720  3        116
    lzo 2.05 1f_999      85319680   2        129      141881344  2        114
    lzo 2.05 1x_999      80609280   1        125      141357056  1        106
    lzo 2.05 1y_999      81207296   1        125      141979648  1        105
    lzf 3.6 vf           105697280  48       119      168873984  17       140
    lzo 2.05 1z_999      80474112   1        118      141062144  1        99
    lzo 2.05 1b_1        101900288  32       111      170741760  13       141
    lzo 2.05 1f_1        103690240  27       108      168673280  12       135
    lzo 2.05 2a_999      86548480   3        96       140189696  2        83
    zlib 1.2.5 -9        72294400   2        66       130244608  3        40
    ucl_nrv2b 1.03 -6    78553088   2        61       136921088  1        47
    ucl_nrv2e 1.03 -6    78028800   2        60       136802304  1        46
    lzham alpha6 -m0d26  71901184   0        9        133795840  0        1
    And all results in a single table:
    name                 size1      csMBps1  dsMBps1  size2      csMBps2  dsMBps2
    fastlz 0.1 -1        108339200  48       98       170000384  19       121
    fastlz 0.1 -2        105037824  48       103      169967616  18       122
    LZ4 r11              106631168  55       166      174100480  16       205
    snappy 1.0.3         108097536  53       173      171319296  22       203
    lzf 3.6 vf           105697280  48       119      168873984  17       140
    lzf 3.6 uf           109654016  49       114      171237376  17       141
    lzham alpha6 -m0d26  71901184   0        9        133795840  0        1
    lzjb 2010            125628416  35       59       175611904  14       105
    lzmat 1.1            82329600   6        62       143110144  7        51
    lzo 2.05 1b_1        101900288  32       111      170741760  13       141
    lzo 2.05 1b_9        94191616   20       114      156020736  9        117
    lzo 2.05 1b_99       90832896   14       117      151351296  8        113
    lzo 2.05 1c_1        103649280  28       104      169779200  14       135
    lzo 2.05 1c_9        95309824   17       111      153640960  9        114
    lzo 2.05 1c_99       92250112   14       116      149389312  8        111
    lzo 2.05 1f_1        103690240  27       108      168673280  12       135
    lzo 2.05 1x_1        104505344  66       105      169938944  26       140
    lzo 2.05 1y_1        105172992  64       104      170323968  25       137
    lzo 2.05 1b_999      82677760   2        142      145166336  3        116
    lzo 2.05 1c_999      84676608   3        135      144158720  3        116
    lzo 2.05 1f_999      85319680   2        129      141881344  2        114
    lzo 2.05 1x_999      80609280   1        125      141357056  1        106
    lzo 2.05 1y_999      81207296   1        125      141979648  1        105
    lzo 2.05 1z_999      80474112   1        118      141062144  1        99
    lzo 2.05 2a_999      86548480   3        96       140189696  2        83
    lzrw1                116891648  32       69       172421120  13       108
    lzrw1-a              115470336  32       74       171835392  13       106
    lzrw2                109129728  39       65       170516480  15       86
    lzrw3                104960000  41       56       170545152  14       68
    lzrw3-a              95911936   19       59       158240768  9        59
    tornado 0.666 1      109912064  36       60       173805568  11       74
    tornado 2            104558592  41       65       173342720  10       93
    tornado 3            104558592  42       65       173342720  10       92
    tornado 4            103985152  17       64       173326336  7        92
    tornado 5            95330304   39       49       166543360  10       61
    tornado 6            94154752   18       47       166416384  7        60
    tornado 7            94154752   18       47       166416384  7        60
    quicklz 1.5.0 -3     87388160   9        136      145526784  7        121
    quicklz 1.5.0 -2     90791936   30       59       157700096  12       60
    quicklz 1.5.0 -1     100081664  60       58       168370176  19       78
    quicklz 1.5.1 -1     100081664  59       58       168452096  19       78
    ucl_nrv2b 1.03 -1    86249472   4        51       141213696  1        45
    ucl_nrv2b 1.03 -6    78553088   2        61       136921088  1        47
    ucl_nrv2d 1.03 -1    85909504   4        52       140918784  1        45
    ucl_nrv2d 1.03 -6    78426112   2        61       137003008  1        47
    ucl_nrv2e 1.03 -1    85680128   4        51       140865536  1        43
    ucl_nrv2e 1.03 -6    78028800   2        60       136802304  1        46
    zlib 1.2.5 -1        81289216   13       56       132829184  7        38
    zlib 1.2.5 -6        72802304   5        64       130273280  4        39
    zlib 1.2.5 -9        72294400   2        66       130244608  3        40

  25. #55
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    I charted the strength of selected compressors at different block sizes:

    Attached a spreadsheet with numbers.

  26. #56
    Member
    Join Date
    Mar 2011
    Location
    Google Switzerland
    Posts
    19
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi,

    We've released Snappy 1.0.4; it has the decompression performance improvements I talked about in June, so if you're still measuring using 1.0.3, it should be a nice (albeit small) step up.

    The main other improvement is improved operating system portability (e.g., better support for Solaris, HP-UX and AIX).

    /* Steinar */

  27. #57
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    Quote Originally Posted by Sesse View Post
    Hi,

    We've released Snappy 1.0.4; it has the decompression performance improvements I talked about in June, so if you're still measuring using 1.0.3, it should be a nice (albeit small) step up.

    The main other improvement is improved operating system portability (e.g., better support for Solaris, HP-UX and AIX).

    /* Steinar */
    Thanks for the info. I'm glad I haven't done a multithreaded comparison yet, even though the code is done. Yann adds tweaks one after another, and now you have too, so any benchmark would be obsolete within 2 weeks.
    I will certainly update both before the next round.

  28. #58
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    inikep, I see that you list lzmat 1.1, while the official site has only 1.0.
    What's up?
    ADDED:
    Oh, well, I found the answer myself. What a mess. If one downloads the 1.0 package, the main file says:
    Code:
    /*
    **  $Id: lzmat_enc.c,v 1.1 2008/07/08 16:58:35 Vitaly Exp $
    **  $Revision: 1.1 $
    **  $Date: 2008/07/08 16:58:35 $
    ** 
    **  $Author: Vitaly $
    **
    ***************************************************************************
    ** LZMAT ANSI-C encoder 1.01
    So it's 1.1, 1.0 or 1.01 depending on where you look.
    Last edited by m^2; 26th February 2012 at 00:11.

  29. #59
    Member
    Join Date
    Jun 2008
    Location
    L.E.
    Posts
    279
    Thanks
    15
    Thanked 8 Times in 6 Posts
    Can someone test exdupe, the current lz4 (-hq) and snappy?

  30. #60
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,462
    Thanks
    8
    Thanked 37 Times in 27 Posts
    I'll gladly test exdupe as soon as it's FOSS.
    FsBench has a current Snappy, and its LZ4 is slightly out of date, but there have been no changes that should have any impact on speed. Not sure about LZ4hc.
    Still, I think I'll update LZ4 soon, because users could reasonably assume that the FsBench numbers are no longer relevant.

    ADDED: You can always fork FsBench and add exdupe yourself though.
    Last edited by m^2; 29th May 2012 at 22:36.
