Results 1 to 11 of 11

Thread: Basic BWT implementation pointer

  1. #1
    Member
    Join Date
    May 2009
    Location
    CA USA
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Basic BWT implementation pointer

    I'm looking for a basic implementation of the BWT and inverse. There are many implementations online (here's just one list) but I'm looking for a research friendly library. Most of the implementations are full compression implementations with encoders as well, but I'm looking for pure BWT mostly for experimenting with full text index searching and such.

    Ideally I'd just love some C functions with an API sort of like:

    size_t BWTransform(const unsigned char *inputData, size_t bytecount, unsigned char *output);

    where the return value is a index to the first byte of output (as opposed to inserting a sentenel).
    Similarly, I'd like a transform/untransform pair for a bitwise BWT.

    I'm more interested in clear code than superfast performance, but to start with I'll be using it as a black box anyway.

    I'd appreciate a recommendation of where to get such libraries, or which compressor would be most appropriate for me to dig into and try to isolate the BWT routines myself.
    I've already started doing that using bzip2 but that's only a bytewise version, and I may be reinventing the wheel anyway.

    Thanks!

  2. #2
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Talking

    If you look at this site you can find the c lib of Yuta that has the BWT if you want the index. He also has the BWTS which is a BWT that does not use an index and is bijective. You can change a byte of ASCII to 8 bits of one and zero just pass that to the subroutines. Check out his OPENBWT library for these and several followers to use after the BWT.

  3. #3
    Member
    Join Date
    May 2009
    Location
    CA USA
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for the reference! I'll look at Yuta first.

    The idea of computing a bitwise BWT by simply calling the bytewise routine with just bytes 0 and 1 never occurred to me... that's a great idea for simplicity. Thanks!

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Simplest version, based on stdlib qsort:
    http://ctxmodel.net/files/BWT.cpp

    An upgrade of above, with custom qsort and inverse transform:
    http://ctxmodel.net/files/mix_test/mix_test_v9.rar /BWT

    And as to bitwise BWT, here's a dummy implementation: http://nishi.dreamhosters.com/u/binbwt_v0.rar
    But unless you have some pure unaligned bitwise data (eg. 2-color bitmaps), it doesn't make any sense for compression.

  5. #5
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    simplest implementation i have seen was in haskell and just one line or so

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    And it would be able to transform enwik8?

  7. #7
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    yes, using about 10^17 bytes of memory

  8. #8
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    http://mattmahoney.net/dc/dce.html#Section_55 has an introduction to BWT but you probably know most of this already. LibDivSufSort by Yuta Mori has a simple interface like what you are looking for.

  9. #9
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Bulat, could you post it here? Sounds interesting.

  10. #10
    Member
    Join Date
    May 2009
    Location
    CA USA
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts
    LibDivSufSort worked great, nearly exactly what I was looking for. But the other pointers (especially http://ctxmodel.net/, which has all kinds of other interesting support files too) are also great.

    Thanks!

  11. #11
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    import Control.Arrow
    import Data.List
    import Data.Function

    main = interact (show.bwt.head.lines)

    bwt = index >>> sortBy (compare 'on' snd) >>> map fst
    index s = map (\n -> (n, drop n (s++s))) [0..length s-1]



    ps: this is optimized version that needs only ~50*filesize memory to proceed
    Last edited by Bulat Ziganshin; 30th October 2010 at 02:17.

Similar Threads

  1. Move-to-Front Implementation
    By Cyan in forum Data Compression
    Replies: 34
    Last Post: 8th August 2010, 01:11
  2. Implementation of JPEG2000 LLC based on LZMA
    By Raymond_NGhM in forum Data Compression
    Replies: 0
    Last Post: 19th March 2010, 01:14
  3. Help me!!! Visual Basic programmers needed
    By moisesmcardona in forum The Off-Topic Lounge
    Replies: 1
    Last Post: 27th June 2009, 20:46
  4. Statistical implementation of Ziv-Lempel
    By thomas in forum Data Compression
    Replies: 3
    Last Post: 10th February 2009, 20:13
  5. BWT - how to use?
    By m^2 in forum Data Compression
    Replies: 29
    Last Post: 6th November 2008, 03:01

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •