
Thread: qad, new experimental BWT(qlfc) based compressor

  1. #1
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts

    qad, new experimental BWT(qlfc) based compressor

    Based on the ideas from
    http://encode.ru/threads/1731-Improv...highlight=qlfc

    I am releasing a small demo; the program does not include the BWT transform yet (hopefully in some future version). There is also no RLE or context mixing yet, and the compressor does not use any filters.

    It may fail on some files due to lack of thorough testing.

    QLFC is computed via MTF, the result is transformed into a (currently static) single-level tree structure, and that is compressed with a bitwise entropy coder. The EC (arithmetic coding) is probably the slowest part; I should replace it with some faster range coder in the future.
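    For illustration, the MTF ranking step looks roughly like this (a plain move-to-front sketch, not qad's actual code):
    Code:
    #include <cstdint>
    #include <numeric>
    #include <vector>

    // Plain move-to-front: each byte is replaced by its current rank in a
    // recency list, then that byte is moved to the front of the list.
    // Illustrative only, not taken from qad.
    std::vector<uint8_t> mtf_encode(const std::vector<uint8_t>& in) {
        std::vector<uint8_t> table(256);
        std::iota(table.begin(), table.end(), 0);      // 0, 1, ..., 255
        std::vector<uint8_t> out;
        out.reserve(in.size());
        for (uint8_t c : in) {
            uint8_t rank = 0;
            while (table[rank] != c) ++rank;           // find the current rank of c
            out.push_back(rank);
            table.erase(table.begin() + rank);         // move c to the front
            table.insert(table.begin(), c);
        }
        return out;
    }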

    BTW, for some strange reason the Windows cross-compiled version seems almost twice as fast on Linux under Wine as the native Linux binary. (g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1 vs i686-w64-mingw32-g++ (GCC) 4.7.3 20121208, with the same code optimizations)
    Attached Files

  2. #2
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    144
    Thanks
    29
    Thanked 58 Times in 34 Posts
    The unBWT format is unknown.
    Where is the first index? How many bytes is it?
    Is it the sentinel version or the non-sentinel version?
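    For reference, in the index-based convention the decoder needs the row of the original string among the sorted rotations and no sentinel is stored; a minimal sketch of that kind of inversion (one common convention, not necessarily the one qad will use):
    Code:
    #include <cstddef>
    #include <string>
    #include <vector>

    // Invert a BWT given the last column L and a primary index p.
    // Assumption (one common convention): p is the row of the original
    // string among the lexicographically sorted rotations.
    std::string unbwt(const std::string& L, std::size_t p) {
        const std::size_t n = L.size();
        std::vector<std::size_t> count(256, 0), C(256, 0), occ(256, 0), lf(n);
        for (unsigned char c : L) count[c]++;
        for (int c = 1; c < 256; ++c) C[c] = C[c - 1] + count[c - 1];
        for (std::size_t i = 0; i < n; ++i) {          // LF-mapping
            unsigned char c = L[i];
            lf[i] = C[c] + occ[c]++;
        }
        std::string S(n, '\0');
        std::size_t i = p;
        for (std::size_t k = n; k-- > 0; ) {           // walk backwards through S
            S[k] = L[i];
            i = lf[i];
        }
        return S;                                      // unbwt("nnbaaa", 3) == "banana"
    }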
    Last edited by xezz; 29th May 2014 at 14:42.

  3. #3
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by xezz View Post
    The unBWT format is unknown.
    Where is the first index? How many bytes is it?
    Is it the sentinel version or the non-sentinel version?
    That's why you should use BWTS; see Yuta's version for the fastest. It's a purely bijective transform: a permutation that is reversible and unique, with no index or sentinel needed.
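    For background, BWTS is built on the Lyndon factorization of the input (the full transform additionally sorts the rotations of those factors). The factorization itself can be computed in linear time with Duval's algorithm; a minimal sketch, not taken from any particular BWTS implementation:
    Code:
    #include <string>
    #include <vector>

    // Duval's algorithm: the Chen-Fox-Lyndon factorization of s into a
    // non-increasing sequence of Lyndon words, in O(n) time.
    std::vector<std::string> lyndon_factorize(const std::string& s) {
        std::vector<std::string> factors;
        std::size_t n = s.size(), i = 0;
        while (i < n) {
            std::size_t j = i + 1, k = i;
            while (j < n && s[k] <= s[j]) {
                if (s[k] < s[j]) k = i;    // still a Lyndon prefix: restart comparison
                else ++k;                  // periodic continuation
                ++j;
            }
            while (i <= k) {               // emit factors of length j - k
                factors.push_back(s.substr(i, j - k));
                i += j - k;
            }
        }
        return factors;                    // e.g. "banana" -> "b", "an", "an", "a"
    }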

  4. #4
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    144
    Thanks
    29
    Thanked 58 Times in 34 Posts
    Decoding: all test files fail.
    That's why you should use BWTS; see Yuta's version for the fastest
    Please give me a reference.

  5. #5
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    Added a simple RLE. There is a slight boost in compression, though I think the speed-up is more noticeable. The memory requirements have increased, however, so in the end it may be slower on some small files. I am thinking of a more advanced RLE, but I will have to sort the memory out first.
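    For illustration, a classic Wheeler-style RLE0 over MTF ranks (the scheme bzip2 uses) looks like the sketch below; this shows the general idea only, not qad's RLE:
    Code:
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Wheeler-style RLE0: runs of zero ranks are written in bijective base-2
    // with two run symbols RUNA/RUNB, and every non-zero rank r is shifted up
    // to r + 1 so the run symbols do not collide with literals.
    // Illustrative sketch, not qad's code.
    enum : uint16_t { RUNA = 0, RUNB = 1 };

    static void emit_run(std::size_t run, std::vector<uint16_t>& out) {
        while (run > 0) {
            if (run & 1) { out.push_back(RUNA); run = (run - 1) / 2; }   // digit worth 1
            else         { out.push_back(RUNB); run = (run - 2) / 2; }   // digit worth 2
        }
    }

    std::vector<uint16_t> rle0_encode(const std::vector<uint8_t>& mtf) {
        std::vector<uint16_t> out;
        std::size_t run = 0;
        for (uint8_t r : mtf) {
            if (r == 0) { ++run; continue; }
            emit_run(run, out);                  // flush any pending zero run
            run = 0;
            out.push_back(uint16_t(r) + 1);      // shifted literal
        }
        emit_run(run, out);                      // trailing run
        return out;
    }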

    Quote Originally Posted by xezz View Post
    The unBWT format is unknown.
    Where is the first index? How many bytes is it?
    Is it the sentinel version or the non-sentinel version?
    There is no BWT transform involved as of yet, but hopefully there will be in the future. You can use BWTS or whatever suits you. Note that it's still experimental and I haven't had much time to test it deeply, so if decompression fails, please let me know.
    Attached Files
    Last edited by quadro; 30th May 2014 at 03:37.

  6. #6
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    144
    Thanks
    29
    Thanked 58 Times in 34 Posts
    input: a
    output size: 10

    input: nnbaaa
    output size: 13

    Decompression crashes.

  7. #7
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    OK, you can get BWTS here:

    http://encode.ru/threads/104-libBWT?p=8627#post8627

    The wiki article on the bijective BWT changes frequently, but unlike the BWT there is
    only one transformation, and Yuta's code is among the best for it.

    Here is one paper:

    http://arxiv.org/abs/1201.3077

    and here is another:

    http://arxiv.org/pdf/0908.0239.pdf

    The first is the first cut of a paper rejected by reviewers who may not have understood the basic idea;
    the second is the paper actually submitted to a string conference in Prague and is more polished.

  8. The Following User Says Thank You to biject.bwts For This Useful Post:

    xezz (31st May 2014)

  9. #8
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    @xezz this is probably due to a header parsing issue (a footer, in this case) when the input alphabet is too small. I will try to fix it.

    @biject.bwts thanks for the note; at this point my footer is not compressed and contains some redundancy, so it could be useful later on for squeezing out a few more bytes.
    Last edited by quadro; 30th May 2014 at 18:30.

  10. #9
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    Well, this should hopefully fix the footer parsing issue. Since the footer structure has changed and grown a bit larger, especially for small files, backward compatibility is not retained.

    Also, it seems a better RLE is possible, but it will require either more memory or more time. If I manage to resolve the memory allocation issue, it could run at about the same speed; in any case I'd like to add a new compression mode.
    Attached Files

  11. #10
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,465
    Thanks
    26
    Thanked 116 Times in 91 Posts
    For the curious, here are my results for qad 0.3:
    Code:
    piotrek@piotrek-desktop:~/Pulpit/enwik.test$ ll
    razem 3406436
    drwxrwxr-x  2 piotrek piotrek       4096 maj 31 19:13 ./
    drwxr-xr-x 10 piotrek piotrek       4096 maj 31 19:03 ../
    -rw-rw-r--  1 piotrek piotrek  100000008 maj 31 18:50 enwik8.bwt
    -rw-rw-r--  1 piotrek piotrek   21217569 maj 31 18:51 enwik8.bwt.qad
    -rw-rw-r--  1 piotrek piotrek  100000008 maj 31 18:51 enwik8.bwt.qad.dec
    -rw-rw-r--  1 piotrek piotrek  100000000 maj 31 18:52 enwik8.bwt.qad.dec.unbwt
    -rw-rw-r--  1 piotrek piotrek 1000000008 maj 31 19:04 enwik9.bwt
    -rw-rw-r--  1 piotrek piotrek  166913700 maj 31 19:06 enwik9.bwt.qad
    -rw-rw-r--  1 piotrek piotrek 1000000008 maj 31 19:12 enwik9.bwt.qad.dec
    -rw-rw-r--  1 piotrek piotrek 1000000000 maj 31 19:15 enwik9.bwt.qad.dec.unbwt
    piotrek@piotrek-desktop:~/Pulpit/enwik.test$ md5sum *
    0d21d5f1e5b08fd78893b461a22abcbe  enwik8.bwt
    55797914ab1b91209dad2170df1962dd  enwik8.bwt.qad
    0d21d5f1e5b08fd78893b461a22abcbe  enwik8.bwt.qad.dec
    a1fa5ffddb56f4953e226637dabbb36a  enwik8.bwt.qad.dec.unbwt
    173a69860b1bd538698a9740c7496fec  enwik9.bwt
    29113eb6d2c85b94b73f9dbfd1a4d5ae  enwik9.bwt.qad
    173a69860b1bd538698a9740c7496fec  enwik9.bwt.qad.dec
    e206c3450ac99950df65bf70ef61a12d  enwik9.bwt.qad.dec.unbwt
    piotrek@piotrek-desktop:~/Pulpit/enwik.test$
    qad and dec files are from qad; bwt and unbwt files are from the libdivsufsort examples.
    I've modified libdivsufsort to accept blocks above 1e9 bytes, so enwik9 is sorted in a single block.
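    For anyone who wants to reproduce the bwt/unbwt step, here is a minimal libdivsufsort example (if I read divsufsort.h correctly, the workspace argument may be NULL so the library allocates its own; link with -ldivsufsort):
    Code:
    #include <cstdio>
    #include <vector>
    #include <divsufsort.h>   // libdivsufsort; build/link with -ldivsufsort

    int main() {
        // Toy input; in the thread this would be enwik8/enwik9 read from disk.
        std::vector<sauchar_t> in = { 'b', 'a', 'n', 'a', 'n', 'a' };
        std::vector<sauchar_t> bwt(in.size()), rec(in.size());

        // divbwt() returns the primary index (>= 0) or a negative error code.
        // Passing NULL as the temporary array lets the library allocate it.
        saidx_t pidx = divbwt(in.data(), bwt.data(), nullptr, (saidx_t)in.size());
        if (pidx < 0) { std::fprintf(stderr, "divbwt failed: %d\n", (int)pidx); return 1; }
        std::printf("primary index = %d, BWT = %.*s\n",
                    (int)pidx, (int)bwt.size(), reinterpret_cast<const char*>(bwt.data()));

        // inverse_bw_transform() returns 0 on success.
        if (inverse_bw_transform(bwt.data(), rec.data(), nullptr,
                                 (saidx_t)in.size(), pidx) != 0) {
            std::fprintf(stderr, "inverse_bw_transform failed\n");
            return 1;
        }
        std::printf("restored      = %.*s\n",
                    (int)rec.size(), reinterpret_cast<const char*>(rec.data()));
        return 0;
    }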

  12. The Following User Says Thank You to Piotr Tarsa For This Useful Post:

    quadro (6th June 2014)

  13. #11
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    Thanks for testing, Piotr; I am quite surprised by the enwik9 result myself, I haven't tried it on such a big block.

    Here is a new version with slightly improved RLE (as the default compression mode). It is not yet optimized for speed or compression ratio, it's noticeably slower, and it may fail on some files. But from now on you can use the c/c2 switches to choose between the default and the faster mode. BTW there is no context mixing yet, but I hope to get to that one day too.
    Attached Files
    Last edited by quadro; 6th June 2014 at 00:50.

  14. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,483
    Thanks
    719
    Thanked 653 Times in 349 Posts
    Do you know about FSE and the other fast ANS entropy coders? Maybe they could successfully replace the RC in your program?
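    For reference, here is a minimal static rANS in the style of the public-domain ryg_rans (32-bit state, byte-wise renormalization, toy 4-symbol alphabet with hard-coded frequencies). It is only a sketch of the ANS family, not FSE and not qad's coder:
    Code:
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Static rANS sketch: probabilities are scaled to 1 << PROB_BITS and the
    // toy frequency table is hard-coded so it sums exactly to PROB_SCALE.
    constexpr uint32_t PROB_BITS  = 12;
    constexpr uint32_t PROB_SCALE = 1u << PROB_BITS;
    constexpr uint32_t RANS_L     = 1u << 23;                 // state lower bound
    constexpr uint32_t FREQ[4] = { 2048, 1024, 512, 512 };
    constexpr uint32_t CUM[5]  = { 0, 2048, 3072, 3584, 4096 };

    // Encode one symbol: renormalize (emit low bytes), then fold the symbol in.
    static void rans_put(uint32_t& x, std::vector<uint8_t>& out, int s) {
        uint32_t x_max = ((RANS_L >> PROB_BITS) << 8) * FREQ[s];
        while (x >= x_max) { out.push_back(uint8_t(x)); x >>= 8; }
        x = ((x / FREQ[s]) << PROB_BITS) + (x % FREQ[s]) + CUM[s];
    }

    // Decode one symbol and advance the state, refilling bytes as needed.
    static int rans_get(uint32_t& x, const std::vector<uint8_t>& in, std::size_t& pos) {
        uint32_t slot = x & (PROB_SCALE - 1);
        int s = int(std::upper_bound(CUM, CUM + 5, slot) - CUM) - 1;
        x = FREQ[s] * (x >> PROB_BITS) + slot - CUM[s];
        while (x < RANS_L) x = (x << 8) | in[pos++];
        return s;
    }

    int main() {
        std::vector<int> msg = { 0, 1, 0, 2, 3, 0, 1, 1, 0, 3 };

        // Encode in reverse so the decoder emits symbols in forward order.
        uint32_t x = RANS_L;
        std::vector<uint8_t> buf;
        for (std::size_t i = msg.size(); i-- > 0; ) rans_put(x, buf, msg[i]);
        for (int k = 0; k < 4; ++k) { buf.push_back(uint8_t(x)); x >>= 8; }  // flush state
        std::reverse(buf.begin(), buf.end());      // stream is read front-to-back

        // Decode and verify the round trip.
        std::size_t pos = 0;
        uint32_t y = 0;
        for (int k = 0; k < 4; ++k) y = (y << 8) | buf[pos++];
        bool ok = true;
        for (std::size_t i = 0; i < msg.size(); ++i)
            ok &= (rans_get(y, buf, pos) == msg[i]);
        std::printf("%s, %zu symbols -> %zu bytes\n",
                    ok ? "roundtrip ok" : "MISMATCH", msg.size(), buf.size());
        return 0;
    }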

  15. #13
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    Thanks for the tips. No, I've only heard of ANS; FSE I'm hearing about for the first time. So I'm not sure how it performs, but I may try to use it if it compares well.

  16. #14
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,483
    Thanks
    719
    Thanked 653 Times in 349 Posts
    It's a huge area. You can find the FSE topic here; Charles Bloom and Cyan have also written a lot about ANS codecs in their blogs. There are also binary ANS codecs, though I'm not sure whether they are better than the AC ones.

  17. #15
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    I read that FSE is block-based, so it's possibly not the best fit for adaptive compression; it may be possible to make an adaptive ANS. But for now, here is a version which uses a modification of the Subbotin RC (there is no source included yet, because I need to clean up the code a bit).
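    For reference, the classic Subbotin carry-less range coder skeleton is shown below (encoder side only; the decoder mirrors it with a code register initialized from the first four bytes). qad's "Subbotin RC mod" is not published in this post, so treat this purely as background:
    Code:
    #include <cstdint>
    #include <vector>

    // Classic Subbotin carry-less range coder (encoder). Renormalization emits
    // whole bytes; the (range = -low & (BOT - 1)) trick shrinks the range
    // instead of propagating a carry. Shown for reference, not qad's code.
    struct SubbotinEncoder {
        static constexpr uint32_t TOP = 1u << 24, BOT = 1u << 16;
        uint32_t low = 0, range = 0xFFFFFFFFu;
        std::vector<uint8_t> out;

        void encode(uint32_t cumFreq, uint32_t freq, uint32_t totFreq) {
            low  += cumFreq * (range /= totFreq);
            range *= freq;
            normalize();
        }
        void flush() {                       // emit the remaining 4 bytes of low
            for (int i = 0; i < 4; ++i) { out.push_back(uint8_t(low >> 24)); low <<= 8; }
        }

    private:
        void normalize() {
            while ((low ^ (low + range)) < TOP ||
                   (range < BOT && ((range = -low & (BOT - 1)), true))) {
                out.push_back(uint8_t(low >> 24));
                low   <<= 8;
                range <<= 8;
            }
        }
    };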

    Here is a comparison on enwik8.bwt:
    Code:
    Entropy coder                 output alignment   def. mode   c.time      fast mode   c.time
    30-bit binary adaptive AC     bit-aligned        21182969    1m15.219s   21215914    0m59.563s
    64-bit Subbotin RC mod        byte-aligned       21183433    1m5.895s    21216348    0m49.900s
    32-bit Subbotin RC mod        bit-aligned        21184310    0m55.034s   21217145    0m40.326s
    32-bit Subbotin RC mod        nibble-aligned     21186080    0m50.673s   21218792    0m36.456s

    The 64-bit RC's slowness is caused by the fact that it's not actually a 64-bit executable, only an emulation; a true 64-bit binary should be faster.

    Also, it seems the default mode could run at about the same speed as the fast mode with only a slight penalty in compression ratio, so I might get rid of the fast mode in the future. I have one RLE improvement in mind right now, but I still need to write the decoder for it.
    Attached Files

  18. #16
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    Alright, the new version is up and the licence has changed to the GNU GPL, so the source is included.

    It slightly improves the RLE again (well, not by much), and at compile time you can choose to build a 32-bit or 64-bit range coder via the Build32bit / Build64bit macros. In the same way you can turn the latest two RLE enhancements on or off; for the older one I still have to create a switch.

    There is still some room for improvement, because the RLE improves e.g. with recursion, but I think it may be more interesting to try it with some CM.

    To sum things up: I am using a static cyclic tree to encode the MTF/QLFC output, then I compress the tree branches and states (for example, when a symbol's frequency increases, symbols are less likely to repeat, and so on; not really in a CM manner though) using RLE, and the final stage is entropy coding.

    The source is not very clear or descriptive yet, I suppose; it's still one big block of code, sorry about that.

    Also, I don't have a 64-bit Linux yet, so the 64-bit binary is just an emulation; I don't know if it will compile at all.
    Attached Files

  19. #17
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,058
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by quadro View Post
    BTW, for some strange reason the Windows cross-compiled version seems almost twice as fast on Linux under Wine as the native Linux binary. (g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1 vs i686-w64-mingw32-g++ (GCC) 4.7.3 20121208, with the same code optimizations)
    Code:
    /*
    
    Linux compile example:
    g++ -c qad.cpp && g++ -g -O3 qad.o && time ./a.out c bwt bwt.qad
    
    /opt/mingw32/bin/i686-w64-mingw32-g++ -O3 -s -o qad.exe qad.cpp  && time wine qad.exe c enwik9bwt qad
    
    */
    If that's the line you used to compile the Linux binary, I can tell you why it is slow. When you invoke g++ the first time, it compiles the C++, and that's when you need to pass -O3. The second invocation just links, so -O3 does nothing there.
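    For reference, the same one-liner with the optimization flag moved to (or repeated at) the compile step, naming the output qad instead of a.out, would be:
    Code:
    g++ -O3 -c qad.cpp && g++ -O3 -o qad qad.o && time ./qad c bwt bwt.qad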

    Here's a simple Makefile that works in Linux. It's not doing much, but it allows you to build with a simple and standard `make`, which requires no reading and almost no thought.

    Code:
    ~/build/qad0.6$ cat Makefile 
    CXX = g++
    CXXFLAGS = -O2 -DNDEBUG
    
    all: qad
    
    qad: qad.cpp
    	$(CXX) $(CXXFLAGS) -o $@ $<
    
    clean:
    	$(RM) qad
    You could incorporate the testing command line into the Makefile, which is common and is usually put under a target "test," so you can `make test`. I might edit this message later and add it for you.
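    A sketch of what such a target might look like, reusing the enwik8.bwt test file names from earlier in the thread. Note that the "d" decompression switch is a placeholder I haven't verified against qad's actual command syntax, and recipe lines must be indented with tabs:
    Code:
    # "d" is assumed to be qad's decompression switch; substitute the real one.
    test: qad
    	./qad c enwik8.bwt enwik8.bwt.qad
    	./qad d enwik8.bwt.qad enwik8.bwt.dec
    	cmp enwik8.bwt enwik8.bwt.dec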

    By the way, I could not get it to decompress successfully. It gave me back a zero-length file. It's possible that I'm doing something wrong, but it's hard to see what that might be. It's late. I might try again tomorrow.

    PS. I noticed that I used -O2 in the Makefile. The last time I compared them, -O2 was consistently faster than -O3, so that's what I always use, but if you have found that -O3 is actually faster here, please change that line and write a post about it. I'd be curious.
    Last edited by nburns; 14th June 2014 at 14:38.

  20. #18
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    Quote Originally Posted by nburns View Post
    If that's the line you used to compile the Linux binary, I can tell you why it is slow. When you invoke g++ the first time, it compiles the C++, and that's when you need to pass -O3. The second invocation just links, so -O3 does nothing there.
    Yes, exactly; thanks, I will include it. I came across it just as I was trying to attach divsufsort.

    And as I just tested, -O2 really does seem slightly faster. It's not very speed-oriented yet, it's a single-threaded version, but before this the 64-bit emulation was crashing the Linux binary; now it works and is even faster than the Windows one, though that may be because of the Wine emulation.

    The latest version bundles divsufsort 2.0.1, so it may be easier to use, even though the memory requirements have grown; it has some pros and cons. I was trying to keep memory usage low, but at least now I have more memory to play with. Meanwhile I was trying to do some weighting of tree branches, then realised it's more like fitting to a particular situation (small files versus large ones), so I will probably leave this for a possible later CM stage.

    BTW I have not tried any preprocessors yet; all the ones I tried, like DRT, crash on Linux, maybe because of some wrong character encoding.
    Attached Files
    Last edited by quadro; 15th June 2014 at 08:37.

  21. #19
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    A new version is up. I have removed one constant from the rank-to-RLE context selection in the default mode (before, this worked only for ranks whose value was at or above the static tree size). Now all ranks are used for selection, except for the first 3, which are themselves selected the other way around (from the RLE). Skipping the first 3 ranks may hurt compression on small files a bit, but the change should help overall.

    Right now I am thinking about how to make better predictions for the final context encoding. At the moment there is only probability halving from time to time, plus one hash used to trigger a cumulative-probability rescale for the most-used context.
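    For illustration, the "probability halving" idea applied to a generic order-0 count table looks like this (a sketch, not qad's model; the increment step and limit are arbitrary tuning choices):
    Code:
    #include <cstdint>

    // Generic frequency-halving sketch (not qad's actual model): an order-0
    // table of counts serves as the coder's probabilities; once the total
    // crosses a limit, every count is halved (kept at least 1), which both
    // bounds the total and lets the statistics adapt to newer data.
    struct Order0Counts {
        static constexpr uint32_t LIMIT = 1u << 16;   // tuning choice
        uint32_t freq[256];
        uint32_t total = 256;
        Order0Counts() { for (auto& f : freq) f = 1; }

        void update(uint8_t sym) {
            freq[sym] += 32;                          // increment step: a tuning choice
            total += 32;
            if (total >= LIMIT) rescale();
        }
        void rescale() {
            total = 0;
            for (auto& f : freq) { f = (f + 1) >> 1; total += f; }  // halve, keep >= 1
        }
    };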
    Attached Files
    Last edited by quadro; 18th June 2014 at 18:02.

  22. #20
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    It seems the change from the previous version turned out to be highly sensitive to the input data. Because of the very rough frequency-distribution approximation I make, I stripped it from the default compression mode and added it (slightly modified) as an experimental filter under the cf archive switch/option. Archives compressed in default mode with version 0.8 should still be compatible with this version; fast-mode archives are compatible from 0.7 onward.
    Attached Files
    Last edited by quadro; 20th June 2014 at 12:08.

  23. #21
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    860
    Thanks
    441
    Thanked 169 Times in 80 Posts
    I would like to test qad. Why is this version still lacking the BWT transform?

  24. #22
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    Hello, the BWT transform is already there from version 0.7 onward; sorry if the info was wrong. I will probably add a switch in the future to skip this transform, so that people could try a different type of sorting.
    Also, I don't know how it happened, but I left some debugging output in the decoder in 0.9; this should now be fixed.

  25. #23
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    860
    Thanks
    441
    Thanked 169 Times in 80 Posts
    Thanks for the info.
    When I run "qad c app.tar app.qad", both the 32-bit and 64-bit Windows builds generate an archive of 0 bytes.
    This also happens with other input files.
    The OS is Win 8.1 x64.

  26. #24
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    Quote Originally Posted by Stephan Busch View Post
    Thanks for the info.
    When I run "qad c app.tar app.qad", both the 32-bit and 64-bit Windows builds generate an archive of 0 bytes.
    This also happens with other input files.
    The OS is Win 8.1 x64.
    This also happens in 32-bit Vista and in 64-bit Ubuntu under Wine 1.6. The 64-bit Linux executable complains about a missing libdivsufsort.so.3.

  27. #25
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    Interesting. If that's the case, I may try to include the compiled library in the next release as well. I don't know how to do that for Windows, though; I didn't have much success compiling a libdivsufsort DLL on Linux. But thanks for testing, I will try to get a 64-bit OS too.

    BTW, I just found an annoying memory management bug on the decoder side, which I probably introduced when I started to play with the filter in version 0.8. So the next version will decode a bit slower and, in default mode, will probably use more memory for safety reasons. I will try to optimize it later.

  28. #26
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    So version 1.0 is out. It now has slightly slower decompression with higher memory consumption; I was trying to simplify the previous version a bit. I have used a completely new algorithm for the previous filter, though, which I think is more stable and works better in general.

    At first I didn't want to move it directly into the default mode, but since it changes the workflow quite a bit, I did, and again backward compatibility is not maintained.

    Now, under the default mode, c is the new compression mode and cf is a slightly improved version of it.

    The default one removes the lower limit for context selection for the first three ranks according to the RLE, so there is no context selection for ranks coming from the RLE at all, which speeds up the encoder and decoder a little (in both the default c and cf modes), and the compression gains seem better as well.

    The default mode actually still has one constant value left, namely the upper limit for context selection, but as I found out, I can use the previous filter to set it. The problem is that the old algorithm does not always give good or proper estimates, so the limit is cut off so as not to exceed the size of the tree, to ease decoding, but that can affect the compression ratio negatively for some files.

    The difference is quite small:

    enwik8, old filter (= the cf option from 0.9):
    21165185

    New filter, constant top limit for context selection according to the size of the tree and the rank:
    21147089

    New filter, with the top limit according to the (not really good) frequency estimation of the previous algorithm (this is what's under the cf flag):
    21140073
    Attached Files
    Last edited by quadro; 21st June 2014 at 18:42.

  29. #27
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,058
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by quadro View Post
    Interesting. If that's the case, I may try to include the compiled library in the next release as well. I don't know how to do that for Windows, though; I didn't have much success compiling a libdivsufsort DLL on Linux. But thanks for testing, I will try to get a 64-bit OS too.

    BTW, I just found an annoying memory management bug on the decoder side, which I probably introduced when I started to play with the filter in version 0.8. So the next version will decode a bit slower and, in default mode, will probably use more memory for safety reasons. I will try to optimize it later.
    I was able to compile and statically link libdivsufsort for Windows when I posted some code here before. I must still have the code somewhere.

  30. #28
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    Well, thanks. I tested it in 32-bit Windows (in vmplayer) and it seems I had already linked it there. So the sudden program exit might be connected to not having enough free memory, though I am not sure. I will add a notification for that and will also try to reduce memory usage, because the program should already have enough memory to compute the BWT.
    Last edited by quadro; 22nd June 2014 at 17:53.

  31. #29
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    I was thinking about how to improve the range coder's predictions, e.g. for the RLE (or for the tree branch and leaf contexts).

    At present the RC maintains mostly indirect statistics: while the RLE feeds the range coder bit-codes of length 3, the RC keeps only isolated statistics for those length-3 bit-codes. What is actually missing is a higher order, or even a simple order-0 model.

    I am not sure it's a good idea to combine these, though, because even order 0 could be quite a big list, e.g. if the RLE emits some big integers, and the trade-off might not be worth it.

  32. #30
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,254
    Thanks
    305
    Thanked 774 Times in 484 Posts
    qad 1.0 still gives me a 0-byte compressed file in both 32-bit Windows and 64-bit Linux under Wine 1.6. The 32- and 64-bit Linux versions still complain that they can't find libdivsufsort.so.3. I tried:

    qad cf enwik8 enwik8.qad
    qad c enwik8 enwik8.qad
    qad c2 enwik8 enwik8.qad
