
Thread: Density 0.12.0 beta

  #1
    gpnuma (Member, France)

    Density 0.12.0 beta

    Hello,

    Just a quick message to announce Density 0.12.0 beta (https://github.com/centaurean/density), which was released today. Numerous improvements have been made, the main ones being:

    • improved encoding/decoding speeds for chameleon and cheetah
    • a new algorithm ("lion")


    Here is a quick benchmark against other popular compressors using m^2's fsbench (https://github.com/centaurean/fsbench-density), done on a MacBook Pro, OS X 10.10.2, 2.3 GHz Intel Core i7, 8 GB 1600 MHz DDR, SSD:

    enwik8

    Code:
    Codec                                   version      args
    C.Size      (C.Ratio)        E.Speed   D.Speed      E.Eff. D.Eff.
    density::chameleon                      0.12.0 beta   
       61524474 (x 1.625)      903 MB/s 1248 MB/s       347e6  480e6
    density::cheetah                        0.12.0 beta   
       53156746 (x 1.881)      468 MB/s  482 MB/s       219e6  225e6
    density::lion                           0.12.0 beta   
       47991569 (x 2.084)      285 MB/s  271 MB/s       148e6  140e6
    LZ4                                     r127         
       56973103 (x 1.755)      258 MB/s 1613 MB/s       111e6  694e6
    LZF                                     3.6          very
       53945381 (x 1.854)      192 MB/s  370 MB/s        88e6  170e6
    LZO                                     2.08         1x1
       55792795 (x 1.792)      287 MB/s  371 MB/s       126e6  164e6
    QuickLZ                                 1.5.1b6      1
       52334371 (x 1.911)      281 MB/s  351 MB/s       134e6  167e6
    Snappy                                  1.1.0        
       56539845 (x 1.769)      244 MB/s  788 MB/s       106e6  342e6
    wfLZ                                    r10          
       63521804 (x 1.574)      150 MB/s  513 MB/s        54e6  187e6
    silesia

    Code:
    Codec                                   version      args
    C.Size      (C.Ratio)        E.Speed   D.Speed      E.Eff. D.Eff.
    density::chameleon                      0.12.0 beta   
      133118910 (x 1.592)     1040 MB/s 1281 MB/s       386e6  476e6
    density::cheetah                        0.12.0 beta   
      101751474 (x 2.083)      531 MB/s  493 MB/s       276e6  256e6
    density::lion                           0.12.0 beta   
       89433997 (x 2.370)      304 MB/s  275 MB/s       175e6  159e6
    LZ4                                     r127         
      101634462 (x 2.086)      365 MB/s 1815 MB/s       189e6  944e6
    LZF                                     3.6          very
      102043866 (x 2.077)      254 MB/s  500 MB/s       131e6  259e6
    LZO                                     2.08         1x1
      100592662 (x 2.107)      429 MB/s  578 MB/s       225e6  303e6
    QuickLZ                                 1.5.1b6      1
       94727961 (x 2.238)      370 MB/s  432 MB/s       204e6  238e6
    Snappy                                  1.1.0        
      101385885 (x 2.091)      356 MB/s 1085 MB/s       185e6  565e6
    wfLZ                                    r10          
      109610020 (x 1.934)      196 MB/s  701 MB/s        94e6  338e6
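    A note on reading these tables: the E.Eff and D.Eff columns appear to be fsbench's efficiency metric, i.e. raw speed multiplied by the fraction of bytes saved. Here is a small sketch that reproduces the column values above - my reading of the numbers, not fsbench's actual source:

    Code:
    /* Reproduces the E.Eff/D.Eff columns, assuming
       efficiency = speed (bytes/s) * (1 - compressed_size / original_size). */
    #include <stdio.h>

    static double efficiency(double speed_mb_s, double c_size, double orig_size)
    {
        return speed_mb_s * 1e6 * (1.0 - c_size / orig_size);
    }

    int main(void)
    {
        /* density::chameleon on enwik8: 903 MB/s, 61524474 of 100000000 bytes */
        printf("E.Eff = %.0fe6\n", efficiency(903.0, 61524474.0, 1e8) / 1e6);
        return 0;   /* prints E.Eff = 347e6, matching the first table row */
    }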
    More benchmarks are to come very soon, notably from Squash (https://quixdb.github.io/squash-benchmark/).
    The easiest way to test the library is to clone sharc (https://github.com/centaurean/sharc), a command-line utility built on density.

    Thank you very much!


  #2
    cbloom (Member, US)
    Are there builds of sharc or fsbench-density? I need Win64.

    The code uses a lot of obscure C99 and is very MSVC-unfriendly. The builtins suggest it's clang/gcc specific.
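    For illustration, this is the usual kind of shim for that sort of portability problem - a hypothetical example, not code from Density itself: mapping the gcc/clang builtins that typically block an MSVC build onto MSVC intrinsics or no-ops.

    Code:
    /* Hypothetical portability shim - not from Density itself. */
    #include <stdint.h>
    #if defined(_MSC_VER)
    #  include <stdlib.h>
    #  define BSWAP64(x)   _byteswap_uint64(x)
    #  define LIKELY(x)    (x)   /* MSVC has no __builtin_expect */
    #  define UNLIKELY(x)  (x)
    #else
    #  define BSWAP64(x)   __builtin_bswap64(x)
    #  define LIKELY(x)    __builtin_expect(!!(x), 1)
    #  define UNLIKELY(x)  __builtin_expect(!!(x), 0)
    #endif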

  #3
    Jarek (Member, Kraków, Poland)
    Could you maybe add ZSTD for comparison?
    https://github.com/Cyan4973/zstd

  #4
    gpnuma (Member, France)
    @cbloom You're right, the code was developed using clang as the main compiler; however, you can compile sharc easily on Windows 64-bit with http://sourceforge.net/projects/mingwbuilds/.
    The resulting binary performs very well on that platform as well.

    If you can't make it work, I'll post a Win64 binary here, just tell me.

  #5
    gpnuma (Member, France)
    @Jarek I included it in the list of codecs to compare in fsbench, alongside blosc, LZJB, RLE64, Shrinker ... but they don't come out in the results list (the code is here: https://github.com/centaurean/fsbenc...src/codecs.cpp) and I haven't yet checked why.
    So here is a manual bench on my platform:

    Code:
    $ time zstd -f enwik8
    Compressed filename will be : enwik8.zst 
    Compressed 100000000 bytes into 40024854 bytes ==> 40.02%                      
    
    real    0m0.710s
    user    0m0.595s
    sys 0m0.099s
    
    $ time ./sharc -c3 -f enwik8
    Compressed enwik8 (100,000,000 bytes) to enwik8.sharc (47,991,605 bytes) ➔ 48.0% (User time 0.350s ➔ 286 MB/s)
    
    real    0m0.476s
    user    0m0.351s
    sys 0m0.098s
    
    $ time ./sharc -c2 -f enwik8
    Compressed enwik8 (100,000,000 bytes) to enwik8.sharc (53,156,782 bytes) ➔ 53.2% (User time 0.204s ➔ 490 MB/s)
    
    real    0m0.336s
    user    0m0.206s
    sys 0m0.100s
    


  #6
    gpnuma (Member, France)
    Quote Originally Posted by cbloom:
    Are there builds of sharc or fsbench-density? I need Win64.

    The code uses a lot of obscure C99 and is very MSVC-unfriendly. The builtins suggest it's clang/gcc specific.
    I just released a binary for sharc 1.2.1 here: https://github.com/centaurean/sharc/...harc_win64.exe

  #7
    m^2 (Member, Ślůnsk, PL)
    Quick and dirty test:
    Code:
    m% ./fsbench fast ~/bench/scc1.tar 
    Codec                                   version      args
    C.Size      (C.Ratio)        E.Speed   D.Speed      E.Eff. D.Eff.
    density::chameleon                      0.12.0 beta  
        8618714 (x 1.461)      682 MB/s  723 MB/s       215e6  228e6
    density::cheetah                        0.12.0 beta  
        7282526 (x 1.730)      337 MB/s  325 MB/s       142e6  136e6
    density::lion                           0.12.0 beta  
        6674599 (x 1.887)      153 MB/s  155 MB/s        72e6   72e6
    lrrle                                   0            256
       12281040 (x 1.026)     3063 MB/s 3093 MB/s        76e6   76e6
    LZ4                                     r127         
        7200430 (x 1.749)      247 MB/s 1244 MB/s       105e6  532e6
    LZF                                     3.6          very
        7090337 (x 1.776)      174 MB/s  517 MB/s        76e6  226e6
    LZO                                     2.08         1x1
        7152460 (x 1.761)      383 MB/s  599 MB/s       165e6  259e6
    QuickLZ                                 1.5.1b6      1
        6804291 (x 1.851)      335 MB/s  328 MB/s       154e6  150e6
    Snappy                                  1.1.0        
        7070230 (x 1.781)      322 MB/s  742 MB/s       141e6  325e6
    wfLZ                                    r10          
        7607982 (x 1.656)      140 MB/s  638 MB/s        55e6  252e6
    ZSTD                                    2015-01-31   
        5510625 (x 2.286)      141 MB/s  403 MB/s        79e6  226e6
    Codec                                   version      args
    C.Size      (C.Ratio)        E.Speed   D.Speed      E.Eff. D.Eff.
    done... (4*X*1) iteration(s)).
    Phenom2, clang 3.3, compiled for AMD64.


  #8
    cbloom (Member, US)
    Well, I don't want to distract from the Density release in this thread, but FYI I did my own implementation of Chameleon, now up on cbloomrants. SIMD Chameleon looks very promising.
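    For readers who haven't seen it, the core idea is a hash-keyed dictionary of 32-bit words with a one-bit hit/miss flag per word. A rough sketch of such a kernel follows - an illustration of the idea only, not Density's actual code or stream format, and with a made-up hash function and group size:

    Code:
    #include <stdint.h>
    #include <string.h>

    /* Toy word-to-16-bit hash (Knuth multiplicative); the real one differs. */
    #define HASH16(w) ((uint16_t)(((w) * 2654435761u) >> 16))

    /* Chameleon-style kernel: per 32-bit word, emit either a 16-bit hash
       (dictionary hit) or the literal word (miss), plus 64 flags per group.
       Handles whole 64-word groups only; the caller zeroes dict[] first. */
    static size_t chameleon_like_encode(const uint8_t *in, size_t in_size,
                                        uint8_t *out, uint32_t dict[65536])
    {
        size_t ip = 0, op = 0;
        while (in_size - ip >= 64 * 4) {
            uint64_t flags = 0;
            size_t flags_pos = op;           /* reserve room for the bitmap */
            op += 8;
            for (int i = 0; i < 64; i++) {
                uint32_t w;
                memcpy(&w, in + ip, 4);
                ip += 4;
                uint16_t h = HASH16(w);
                if (dict[h] == w) {          /* hit: hash replaces the word */
                    flags |= 1ull << i;
                    memcpy(out + op, &h, 2);
                    op += 2;
                } else {                     /* miss: emit literal, learn it */
                    dict[h] = w;
                    memcpy(out + op, &w, 4);
                    op += 4;
                }
            }
            memcpy(out + flags_pos, &flags, 8);
        }
        return op;
    }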


  #9
    gpnuma (Member, France)
    Quote Originally Posted by m^2:
    Quick and dirty test: [fsbench results quoted above]
    32-bit or 64-bit system? Just asking because density is heavily optimized for 64-bit.

  #10
    gpnuma (Member, France)
    Quote Originally Posted by cbloom:
    Well, I don't want to distract from the Density release in this thread, but FYI I did my own implementation of Chameleon, now up on cbloomrants. SIMD Chameleon looks very promising.
    Hello,

    On the contrary, this is actually extremely interesting. I'm looking at your code.
    Did you compare the end result with the binary I linked to (https://github.com/centaurean/sharc/...harc_win64.exe)?
    You don't get exactly the same file size as density because it has a block structure (to store integrity and efficiency checks) that is not reproduced in your implementation.
    I do a lot of unrolling in density, but not in the area you are testing, probably because there is an endianness test just before; there could be a workaround for that.
    So all in all, this is very interesting! Thanks a lot for your input.

    I sent you a PM with my contact details.

  #11
    gpnuma (Member, France)
    @cbloom

    I did a lot of thorough testing today, trying different unrollings in Chameleon as you did.
    For decompression I managed to obtain a 5-10% speed increase using deeper unrolling in the main kernel, and for compression I could get a 1-3% increase, but that's it!

    The corresponding commit is here: https://github.com/centaurean/densit...db0d12bd6c9df5

    I'm curious to see your SIMD trial!

  #12
    m^2 (Member, Ślůnsk, PL)
    Quote Originally Posted by gpnuma:
    32-bit or 64-bit system? Just asking because density is heavily optimized for 64-bit.
    64

  #13
    dnd (Member, Worldwide)

    LzTurbo / Chameleon / Density / LZ4 Benchmark

    Single-core in-memory benchmarks on an i7-2600K CPU at 4.5 GHz.
    enwik9 (text)
    app1.tar (binary, without filter) from:
    http://compressionratings.com/
    Code:
                        size     ratio%   C MB/s     D MB/s  (bold=pareto)    
    enwik9            426854364   42.7    209.21    3100.28   LzTurbo 11 v1.2
    enwik9            436972256   43.7    347.12     349.32   density 3  v0.12.0
    enwik9            504420284   50.4    469.67    3134.40   LzTurbo 10 v1.2
    enwik9            507084562   50.7    446.78    2247.14   lz4        v1.6.0
    enwik9            607779360   60.8   1521.09    2861.99   chameleon  v15-03
    enwik9            607783200   60.8   1184.54    1769.25   density 1  v0.12.0
    enwik9            607820790   60.8   1537.88    2433.05   chameleon2 v15-03
    enwik9           1000000000  100.0   7950.00    7950.00   inline memcpy
    enwik9           1000000000  100.0   5995.00    5995.00   libc memcpy
    
    app1.tar          192165207   38.2    352.81    3717.49   LzTurbo 11 v1.2
    app1.tar          207440194   41.2    816.66    3702.46   LzTurbo 10 v1.2
    app1.tar          207741094   41.3    798.20    2925.50   lz4        v1.6.0
    app1.tar          210597820   41.8    358.95     349.05   density 3  v0.12.0
    app1.tar          338756652   67.3   1527.53    2336.40   chameleon  v15-03
    app1.tar          338758598   67.3   1192.15    1511.66   density 1  v0.12.0
    app1.tar          338931364   67.3   1600.70    2398.49   chameleon2 v15-03
    app1.tar          503534592  100.0   7950.00    7950.00   inline memcpy
    app1.tar          503534592  100.0   5995.00    5995.00   libc memcpy
    Chameleon: http://cbloomrants.blogspot.de/2015/...chameleon.html

  #14
    gpnuma (Member, France)
    I just tried lzturbo, but the results on my Win64, Core i7 platform are not really comparable; here's the test:

    Code:
    E:\Applications\lzturbo>lzturbo
    
    lzturbo 1.2 Copyright (c) 2007-2014 Hamid Buzidi   Aug 11 2014
    
    
     Usage: lzturbo <options> <filename1> .. <filenameN> DESTINATION_DIR
    <options>
     -ml     m: compression method (1..4), l:compression level (0,1,2,9).
     -p#     #: number of processors/cores (default=autodetect). 0=disable multithreading
     -b#     #: block size in mb (default 64)
     -r      compress/decomp directories recursively
     -d      decomp
     -f      force overwrite of output file
     -o      write on standard output
    Ex.: lzturbo -32 -f file.jpg ab*.txt backup_dir
         lzturbo -49 -r mydir\* backup_dir
         lzturbo -10 file file.lzt
         cat file | ./lzturbo -of -10 >file.lzt
         lzturbo -d -r backup_dir\* restore_dir
         lzturbo -d file.lzt file
    Code:
    E:\Applications\lzturbo>e:\ProcProfile64.exe lzturbo.exe -p0 -f -10 e:\enwik9 e:\
    
    Process ID       : 115920
    Thread ID        : 116680
    Process Exit Code: 0
    Thread Exit Code : 0
    
    
    User Time        :           3.078s
    Kernel Time      :           0.328s
    Process Time     :           3.406s
    Clock Time       :           3.447s
    
    
    Working Set      :            8452 KB
    Paged Pool       :              17 KB
    Nonpaged Pool    :               3 KB
    Pagefile         :           11376 KB
    Page Fault Count : 2161
    
    
    IO Read          :          976562 KB (in             240 reads )
    IO Write         :          457929 KB (in             240 writes)
    IO Other         :               1 KB (in              91 others)
    User time: 3.078s, file size: 468,919,591 bytes

    Code:
    E:\Applications\lzturbo>e:\Dev\sharc\sharc.exe
    Centaurean Sharc 1.2.2 powered by Centaurean Density 0.12.1
    Copyright (C) 2013 Guillaume Voirin
    Built for Microsoft Windows (Little endian system, 64 bits) using GCC 4.9.2, Mar 28 2015 22:26:39
    
    
    Superfast compression
    
    
    Usage :
      sharc [OPTIONS]... [FILES]...
    
    
    Available options :
      -c[LEVEL], --compress[=LEVEL]     Compress files using LEVEL if specified (default)
                                        LEVEL can have the following values (as values become higher,
                                        compression ratio increases and speed diminishes) :
                                        0 = No compression
                                        1 = Chameleon algorithm (default)
                                        2 = Cheetah algorithm
                                        3 = Lion algorithm
      -d, --decompress                  Decompress files
      -p[PATH], --output-path[=PATH]    Set output path
      -x, --check-integrity             Add integrity check hashsum (use when compressing)
      -f, --no-prompt                   Overwrite without prompting
      -i, --stdin                       Read from stdin
      -o, --stdout                      Write to stdout
      -v, --version                     Display version information
      -h, --help                        Display this help
    Code:
    E:\Applications\lzturbo>e:\ProcProfile64.exe e:\Dev\sharc\sharc.exe -f e:\enwik9
    Compressed e:\enwik9 (1,000,000,000 bytes) to e:\enwik9.sharc (607,783,232 bytes) -> 60.8% (User time 0.703s -> 1422 MB/s)
    
    
    Process ID       : 116180
    Thread ID        : 116136
    Process Exit Code: 1
    Thread Exit Code : 1
    
    
    User Time        :           0.703s
    Kernel Time      :           0.468s
    Process Time     :           1.171s
    Clock Time       :           1.172s
    
    
    Working Set      :            3436 KB
    Paged Pool       :              26 KB
    Nonpaged Pool    :               3 KB
    Pagefile         :            2616 KB
    Page Fault Count : 901
    
    
    IO Read          :          976562 KB (in            1909 reads )
    IO Write         :          593538 KB (in            2334 writes)
    IO Other         :               1 KB (in              89 others)
    
    E:\Applications\lzturbo>e:\ProcProfile64.exe e:\Dev\sharc\sharc.exe -c2 -f e:\enwik9
    Compressed e:\enwik9 (1,000,000,000 bytes) to e:\enwik9.sharc (494,036,812 bytes) -> 49.4% (User time 1.531s -> 653 MB/s)
    
    
    Process ID       : 116672
    Thread ID        : 116064
    Process Exit Code: 1
    Thread Exit Code : 1
    
    
    User Time        :           1.531s
    Kernel Time      :           0.375s
    Process Time     :           1.906s
    Clock Time       :           1.989s
    
    
    Working Set      :            4468 KB
    Paged Pool       :              26 KB
    Nonpaged Pool    :               3 KB
    Pagefile         :            3128 KB
    Page Fault Count : 1159
    
    
    IO Read          :          976562 KB (in            1909 reads )
    IO Write         :          482457 KB (in            1901 writes)
    IO Other         :               1 KB (in              89 others)
    
    E:\Applications\lzturbo>e:\ProcProfile64.exe e:\Dev\sharc\sharc.exe -c3 -f e:\enwik9
    Compressed e:\enwik9 (1,000,000,000 bytes) to e:\enwik9.sharc (436,972,288 bytes) -> 43.7% (User time 2.719s -> 368 MB/s)
    
    
    Process ID       : 116288
    Thread ID        : 112960
    Process Exit Code: 1
    Thread Exit Code : 1
    
    
    User Time        :           2.718s
    Kernel Time      :           0.421s
    Process Time     :           3.139s
    Clock Time       :           3.142s
    
    
    Working Set      :            5460 KB
    Paged Pool       :              26 KB
    Nonpaged Pool    :               3 KB
    Pagefile         :            3636 KB
    Page Fault Count : 1410
    
    
    IO Read          :          976562 KB (in            1909 reads )
    IO Write         :          426730 KB (in            1794 writes)
    IO Other         :               1 KB (in              89 others)
    User times:
    sharc -c1: 0.703s, file size: 607,783,232 bytes
    sharc -c2: 1.531s, file size: 494,036,812 bytes
    sharc -c3: 2.718s, file size: 436,972,288 bytes

    I stopped testing there, as the lzt file size and timings were very different from your benchmark's.

  #15
    dnd (Member, Worldwide)
    In this in-memory benchmark I'm using tuning parameters and a block size other than the defaults of the uploaded "LzTurbo v1.2".
    Try using "lzturbo -10 -b1000 -D12 enwik9 .".

    This benchmark should be considered only as an indication because, as one can see, the compression ratio of LzTurbo is marginally better than "density 1" or chameleon.
    Additionally, I'm showing only compression modes "10" and "11".

    It would be nice if you could verify "density 2" (latest version) with these two files, because I'm getting a compare error.

  #16
    Jarek (Member, Kraków, Poland)
    gpnuma, as you target encoding speed, maybe try to go the ZSTD way and add a fast entropy coder as a compression option. E.g. FSE reaches 400 MB/s encoding and 530 MB/s decoding and should significantly improve the compression ratio:
    https://github.com/Cyan4973/FiniteStateEntropy
    SIMD rANS can be even faster, at the cost of a slightly worse ratio: https://github.com/rygorous/ryg_rans


  #17
    gpnuma (Member, France)
    Quote Originally Posted by Jarek:
    gpnuma, as you target encoding speed, maybe try to go the ZSTD way and add a fast entropy coder as a compression option. E.g. FSE reaches 400 MB/s encoding and 530 MB/s decoding and should significantly improve the compression ratio:
    https://github.com/Cyan4973/FiniteStateEntropy
    SIMD rANS can be even faster, at the cost of a slightly worse ratio: https://github.com/rygorous/ryg_rans
    Thanks. I have actually been considering an entropy coder to "finalize" stages that escape density's compressed-domain scope; however, so far nothing satisfies both of these requirements:

    * Streamable (in other words, no backwards reading - and if I'm not mistaken FSE requires backwards reading; you could split the data into blocks, but then you get the same streaming problem) at very low granularity (for now 512 bytes is the minimum work output buffer).
    * Fast (dynamic Huffman coding and arithmetic coding are streamable but not fast enough).

    And open-source, of course!

  #18
    Jarek (Member, Kraków, Poland)
    Indeed, tANS/FSE (all ANS variants) require encoding to run in the backward direction, but it can be done in blocks of practically any size - implementations typically use 10-30 kB blocks.
    Regarding adaptive changes of probability, generating a new tANS table is much cheaper than for Huffman - cheap linear initialization instead of sorting the symbols.
    rANS is more convenient for dynamically changing probability distributions, see e.g. LZA: http://encode.ru/threads/2079-nARANS...iant-of-ANS%29

    Regarding Huffman - the available implementations are essentially slower and give a worse compression ratio. Here are some benchmarks:
    http://encode.ru/threads/1920-In-mem...entropy-coders
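    To make the backward/forward point concrete, here is a minimal static binary rANS in the spirit of ryg_rans - a sketch of the technique, not that library's actual API. Each block's flags are encoded in reverse order, the decoder reads the stream strictly forward, and the only per-block overhead is the 32-bit flushed state:

    Code:
    #include <stddef.h>
    #include <stdint.h>

    #define SCALE_BITS 12
    #define SCALE      (1u << SCALE_BITS)  /* probabilities sum to 4096 */
    #define RANS_L     (1u << 23)          /* renormalization lower bound */

    /* Encode n binary flags in REVERSE order; p0 in [1, SCALE-1] is the
       scaled probability of a 0 flag. The stream grows downward from the
       end of buf; returns its size (the block is buf+cap-size .. buf+cap). */
    static size_t rans_encode_flags(const uint8_t *flags, size_t n,
                                    uint32_t p0, uint8_t *buf, size_t cap)
    {
        uint32_t x = RANS_L;
        size_t pos = cap;
        for (size_t i = n; i-- > 0; ) {
            uint32_t f = flags[i] ? SCALE - p0 : p0;   /* symbol frequency */
            uint32_t b = flags[i] ? p0 : 0;            /* cumulative start */
            while (x >= ((RANS_L >> SCALE_BITS) << 8) * f) {
                buf[--pos] = (uint8_t)x;               /* renormalize */
                x >>= 8;
            }
            x = ((x / f) << SCALE_BITS) + (x % f) + b;
        }
        for (int i = 0; i < 4; i++)                    /* flush final state */
            buf[--pos] = (uint8_t)(x >> (8 * i));
        return cap - pos;
    }

    /* Decode n flags, reading strictly FORWARD from the encoded block. */
    static void rans_decode_flags(const uint8_t *in, uint8_t *flags, size_t n,
                                  uint32_t p0)
    {
        uint32_t x = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
                   | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
        in += 4;
        for (size_t i = 0; i < n; i++) {
            uint32_t low = x & (SCALE - 1);
            uint32_t s   = low >= p0;                  /* recover the flag */
            uint32_t f   = s ? SCALE - p0 : p0;
            flags[i] = (uint8_t)s;
            x = f * (x >> SCALE_BITS) + low - (s ? p0 : 0);
            while (x < RANS_L) x = (x << 8) | *in++;   /* renormalize */
        }
    }

    Fixing the initial encoder state at RANS_L also gives the checksum property discussed further down in the thread: if the decoder's state does not return to RANS_L after the last flag, the block was corrupted.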

  #19
    gpnuma (Member, France)
    Quote Originally Posted by Jarek:
    Indeed, tANS/FSE (all ANS variants) require encoding to run in the backward direction, but it can be done in blocks of practically any size - implementations typically use 10-30 kB blocks.
    Quick question: does the compression ratio suffer if you use very small blocks (1-2 kB)?

    This is how I would use an entropy coder if a good one fits:

    Uncompressed data => 50%-95% taken care of by density's algorithms => entropy coder for symbols that escaped the previous processing => Compressed data

    The problem is, I don't want to break density's streaming capability, because I can see a lot of use cases for this feature.
    Since I would use an entropy coder to compress only a small subset of the data (the data that could not be compressed by the already available - and much faster - algorithms), feeding 10-30 kB to the entropy coder might mean, in some extreme cases (a highly compressible file, for example), that the other algorithms would have to process a few hundred kB or more first.
    In other words, a few hundred kB would have to be processed before our 10-30 kB block is filled and sent to the output... thereby breaking any streaming capability.

  #20
    Piotr Tarsa (Member, Kraków, Poland)
    If you model only the flags, then with the basic chameleon method all you need to transmit is a single probability per block. The overhead of an arithmetic coder / range coder / FSE coder is usually at most 32 or 16 bits - that's for the last flushed state of the range. For the medium-strength version you have 2-bit flags, so you need to transmit 3 probabilities (the 4th can be computed from the others), plus the same 32- or 16-bit last-step overhead.

    Is density suited for tiny blocks? I think it has a relatively high initialization overhead - it needs to initialize a few hundred kilobytes of memory. Thus, I think the smallest reasonable block sizes are maybe 100 kilobytes or so. If you have a 1-bit flag per 32 bits of input data, then you're left with about 3 kilobytes of flags. If you compress them to less than about 99% of the original data, you win some bytes (that 1% or less is used for the probabilities and the last-step overhead).
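    To make that arithmetic concrete, here is a back-of-the-envelope sketch (assuming the ~32-bit flush overhead mentioned above): one flag bit per 32-bit word means a 100 kB block carries 25,600 flags, about 3 kB raw, and entropy-coding them costs roughly H(p) bits per flag.

    Code:
    /* Back-of-the-envelope cost of entropy-coding per-word flags:
       n flags at 0-probability p cost about n*H(p) bits, plus a
       fixed ~32-bit flush for the coder's final state. */
    #include <math.h>
    #include <stdio.h>

    static double h_bits(double p)                 /* binary entropy, bits */
    {
        if (p <= 0.0 || p >= 1.0) return 0.0;
        return -p * log2(p) - (1.0 - p) * log2(1.0 - p);
    }

    int main(void)
    {
        const double block_bytes = 100.0 * 1024;   /* 100 kB block */
        const double n_flags = block_bytes / 4;    /* one flag per 32 bits */
        printf("raw flags: %.0f bytes\n", n_flags / 8);   /* 3200 bytes */
        for (double p = 0.9; p >= 0.49; p -= 0.2)
            printf("p=%.1f -> ~%.0f bytes coded\n",
                   p, n_flags * h_bits(p) / 8 + 4);
        return 0;
    }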


  #21
    gpnuma (Member, France)
    Hey Piotr! Nice to hear from you.

    Quote Originally Posted by Piotr Tarsa:
    If you model only the flags, then with the basic chameleon method all you need to transmit is a single probability per block.
    Yes, flag modelling usually works well, and it's already implemented in the deeper-ratio algorithm lion, but it has a serious cost in performance.
    After a lot of testing, I don't think lion's ratio would have been noticeably better if I had used perfect entropy coding for the flags, like arithmetic coding or ANS, but it would certainly have been slower. That's the reason I decided to use a self-made entropy coder based on quick symbol shifting/ranking, which is rougher than arithmetic coding or ANS but quicker.
    I've only been considering a good entropy coder to help out on the data escaping density's scope, because I have a feeling the loss in speed would be minimal (it's only a small subset of the total data processed) while the impact on ratio could be significant, as this data currently remains totally uncompressed. So the gain in ratio compared to the loss in speed could make this interesting, although it's yet to be tested.

    Quote Originally Posted by Piotr Tarsa:
    Is density suited for tiny blocks? I think it has a relatively high initialization overhead - it needs to initialize a few hundred kilobytes of memory.
    Yes and no. Of course, since the main objective is speed, you would think that tiny blocks are not very useful, but actually I'm inclined to think the opposite. Since the speed of the algorithms makes them suitable for real-time compression, they could be used for real-time data transmission as well, on networks for example. If you're doing any real-time activity and the network stalls, you need a way to compress, and on the other side decompress, small amounts of data to maintain responsiveness: the decompression side can't wait to receive 100 kB before it can interpret the data, because then you completely lose the real-time capability.

  #22
    Jarek (Member, Kraków, Poland)
    Quote Originally Posted by gpnuma:
    Quick question: does the compression ratio suffer if you use very small blocks (1-2 kB)?
    The issue here is the cost of storing the final state of the ANS coder: ~10 bits for tANS/FSE (less for smaller alphabets). However, this cost can be compensated by storing some information in the initial state - we are talking about ~1 bit of loss per block.
    Alternatively, we can use this state as a checksum - fix the initial encoder state, and if an error occurred, the final decoding state will most likely differ, indicating the error. This is probably the default setting of FSE. In this case there is a ~10-bit loss per block - about one per mille in your case (10 bits against the 8,000-16,000 bits of a 1-2 kB block).

