
Thread: Sub-linear time BWT construction

  1. #1
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts

    Sub-linear time BWT construction

    https://arxiv.org/abs/1904.04228

    Note: I haven't read the paper and have to admit I'm baffled how it can be sublinear given you need to store everything in there. It even claims sub-linear space too. Isn't that "magic", or am I just misunderstanding what they mean here?

    I'm hoping someone closer to the field of BWTs can cast some light, but colour me sceptical!

    Edit: I suppose you could have something that takes n + 10^12·n/log(n) time and claim that for all practical sizes of n the sub-linear part dominates, but it's still really O(n) at heart. Or alternatively it's sub-linear on a specific type of data, such as sorted strings. I guess I'll just have to read it!

  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Given a binary string of length n occupying O(n / log n) machine words of space
    They assume that a machine word holds O(log n) input symbols.
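    A minimal sketch of that word-RAM assumption, just to make it concrete (my own illustration, not code from the paper): pack the binary input so that one 64-bit word stands in for Θ(log n) symbols, giving O(n / log n) words of storage.
    Code:
    #include <stdint.h>
    #include <stddef.h>

    /* Pack a binary string (one 0/1 symbol per byte) into 64-bit words.
       With w = 64 playing the role of Theta(log n) bits per word, the n
       input symbols occupy ceil(n / 64) = O(n / log n) machine words. */
    void pack_bits(const uint8_t *sym, size_t n, uint64_t *words)
    {
        size_t nwords = (n + 63) / 64;
        for (size_t i = 0; i < nwords; i++)
            words[i] = 0;
        for (size_t i = 0; i < n; i++)
            if (sym[i])
                words[i / 64] |= (uint64_t)1 << (i % 64);
    }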

  3. #3
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    Yes, I came to that same conclusion:
    https://twitter.com/BonfieldJames/st...89552848642048

    I think this therefore also explains the sub-linear time. It is sub-linear in n because the machine word size is O(log n), which can remove a log n factor from the complexity. There may still be something interesting in there, but it's not what it appears to be, I think. (Hardly surprising, given the bold claims.)

  4. #4
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    68
    Thanks
    31
    Thanked 22 Times in 15 Posts
    They assume a binary alphabet and a log n word size. They perform operations on log n bits in O(1) time. This is how they get the sublinear bound.
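    A toy illustration of "operations on log n bits in O(1) time" (my own sketch, not the paper's code; it assumes a 64-bit word stands in for Θ(log n) bits and uses the GCC/Clang builtin __builtin_popcountll):
    Code:
    #include <stdint.h>
    #include <stddef.h>

    /* Count the 1-symbols in a packed binary string of n bits.  Each 64-bit
       word covers ~log n input symbols and is handled by a single popcount,
       so the loop runs O(n / 64) = O(n / log n) times. */
    size_t count_ones(const uint64_t *words, size_t n)
    {
        size_t total = 0, full = n / 64, rem = n % 64;
        for (size_t i = 0; i < full; i++)
            total += (size_t)__builtin_popcountll(words[i]);
        if (rem)
            total += (size_t)__builtin_popcountll(words[full] &
                                                  (((uint64_t)1 << rem) - 1));
        return total;
    }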

    EDIT: Oh JamesB you answered faster than me

  5. #5
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    Quote Originally Posted by algorithm View Post
    EDIT: Oh JamesB you answered faster than me
    I should have noticed that comment in the abstract earlier

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Are there any works that take into account memory latency in algorithm complexity estimations?
    I've seen one paper with sort benchmarks on a {L1, L2, L3, RAM, server rack, datacenter, distributed} ladder,
    which approximated it as O(sqrt(N)), but it was in Russian.


  7. #7
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Are there any works that take into account memory latency in algorithm complexity estimations?
    It doesn't seem so, at least for the algorithm presented in this topic. "String Synchronizing Sets: Sublinear-Time BWT Construction and Optimal LCE Data Structure" by Dominik Kempa and Tomasz Kociumaka mentions that:
    Each symbol of T′ takes Θ(τ log σ) = Θ(log n) bits. Since the suffix array of a string over such a large alphabet can be computed in linear time [24], we can sort the suffixes T[s_i..n] in O(|S|) = O(n/τ) time.
    (...)
    [24] Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. Linear work suffix array construction. Journal of the ACM, 53(6):918–936, 2006. doi:10.1145/1217856.1217858.
    So they use "Linear Work Suffix Array Construction" by Juha Kärkkäinen, Peter Sanders and Stefan Burkhardt as a subroutine, but that paper contains statements such as:
    Cache oblivious: These algorithms are similar to external algorithms with a single disk but they are not allowed to make explicit use of the block size B or the internal memory size M. This is a serious restriction here since no cache-oblivious integer sorting with O((n/B) log_{M/B}(n/B)) cache faults and o(n log n) work is known. Hence, we can as well go to the comparison based alphabet model. The result is then an immediate corollary of the optimal comparison based sorting algorithm
    As for the original skepticism - it seems to me that the algorithm is legit, provided that you assume log(n) machine word size as normal and ignore data locality (which in practice is often paramount to performance).

    The KS algorithm is able to sort O(n / lg n) suffixes with O(lg n)-bit symbols in O(n / lg n) time if the machine word size is O(lg n) bits. That would simply translate to a BWT output over O(n / lg n) suffixes. The algorithm presented in this topic does some magic to transform that sorted suffix array into a BWT output with O(1) symbol size and O(n) symbols in O(n / lg n) time, provided you can operate on an O(lg n)-bit machine word in O(1) time.
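    If I read that right, the KS phase boils down to regrouping the bit string into Θ(lg n)-bit metasymbols so that only O(n / lg n) suffixes get sorted; each metasymbol is extracted with a constant number of shifts and masks, so building the shorter string costs O(n / lg n) word operations. A rough sketch of that regrouping step (illustrative only; this is not the paper's string-synchronizing-set construction):
    Code:
    #include <stdint.h>
    #include <stddef.h>

    /* Regroup a packed bit string into tau-bit metasymbols, tau ~ Theta(lg n),
       1 <= tau <= 64.  The result has only ceil(n / tau) symbols over an
       alphabet of size 2^tau, so a linear-time suffix sorter run on it deals
       with O(n / lg n) suffixes instead of n.  The packed input must be
       zero-padded and allocated with one spare word at the end. */
    size_t make_metasymbols(const uint64_t *bits, size_t n, unsigned tau,
                            uint64_t *meta)
    {
        size_t m = (n + tau - 1) / tau;          /* number of metasymbols */
        for (size_t j = 0; j < m; j++) {
            size_t p = j * tau, w = p / 64, off = p % 64;
            uint64_t v = bits[w] >> off;         /* field spans <= 2 words */
            if (off + tau > 64)
                v |= bits[w + 1] << (64 - off);
            if (tau < 64)
                v &= ((uint64_t)1 << tau) - 1;
            meta[j] = v;
        }
        return m;
    }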

  8. The Following 3 Users Say Thank You to Piotr Tarsa For This Useful Post:

    Bulat Ziganshin (14th April 2019),JamesB (14th April 2019),Shelwien (14th April 2019)

  9. #8
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    The KS algorithm is able to sort O(n / lg n) suffixes with O(lg n)-bit symbols in O(n / lg n) time if the machine word size is O(lg n) bits. That would simply translate to a BWT output over O(n / lg n) suffixes. The algorithm presented in this topic does some magic to transform that sorted suffix array into a BWT output with O(1) symbol size and O(n) symbols in O(n / lg n) time, provided you can operate on an O(lg n)-bit machine word in O(1) time.
    I think the algorithm is legit, but perhaps the complexity analysis is misleading.

    If we assume we're on a 64-bit machine, then a machine word of size O(log n) is easy to obtain for realistic sizes of n. However, it *varies* with n. This means that as we scale up n, the word size also scales up, meaning fewer machine instructions per input symbol. While doable (approximately, at 8, 16, 32, 64 bits), it changes the complexity in subtle ways by factoring out a log(n) component. No programmer I know would do this! We'd just use the largest word size for smaller n too, so the speed-up is flat across all n sizes (barring cache issues).

    Secondly, by using a linear suffix array construction the whole algorithm can be no faster than O(n), unless they're only counting the sorting time. I still don't understand where their grand sub-linear claim comes from unless they take a very specific view of what BWT construction means (e.g. transforming a suffix array into a BWT). I'm not versed in the papers of this field, so maybe that's normal.

  10. #9
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Secondly, by using a linear suffix array construction the whole algorithm can be no faster than O(n), unless they're only counting the sorting time. I still don't understand where their grand sub-linear claim comes from unless they take a very specific view of what BWT construction means (e.g. transforming a suffix array into a BWT). I'm not versed in the papers of this field, so maybe that's normal.
    In the KS phase they assume that there are O(n / lg n) symbols where each symbol is (lg n) bits in size. This leads to time complexity O(n / lg n) because they assume a machine can handle one such symbol in O(1) time (and that's actually true). After that they convert the output of KS to match the assumption that a symbol has size 1 (i.e. individual bits are symbols).

    Their speedup claim is:
    Given a binary string of length n, our procedure builds the Burrows-Wheeler transform in O(n / sqrt(log n)) time
    So the speedup factor is sqrt(log n). IIUC, log n here equals the number of symbols that fit in one machine word, so the speedup factor is sqrt(number of symbols fitting in one machine word). If the symbols are bits and the machine word size is 64 bits, the speedup is sqrt(64) = 8. That's the optimistic scenario. When BWT-ing bytes, a machine word can contain 8 symbols (bytes), so the theoretical speedup factor is sqrt(8) ≈ 2.8. In practice divsufsort (and msufsort, if Michael finishes it) would probably destroy it in performance metrics because of much lower constant factors, I think.
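    The same arithmetic in a few lines (nothing from the paper, just the estimate above):
    Code:
    #include <math.h>
    #include <stdio.h>

    /* Theoretical speedup sqrt(symbols per machine word) for a 64-bit word:
       1-bit symbols -> sqrt(64) = 8, 8-bit symbols -> sqrt(8) ~ 2.83. */
    int main(void)
    {
        const double word_bits = 64.0;
        for (int symbol_bits = 1; symbol_bits <= 8; symbol_bits *= 8) {
            double per_word = word_bits / symbol_bits;
            printf("%d-bit symbols: %2.0f per word, speedup ~ %.2f\n",
                   symbol_bits, per_word, sqrt(per_word));
        }
        return 0;
    }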

    They do not consider SIMD vectors to be a machine word, probably because they need to be able to do memory addressing with a whole machine word.

    PS: I'm not 100% sure I did the analysis correctly. Input from authors or someone else that understands the paper fully would help.

  11. #10
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    In the KS phase they assume that there are O(n / lg n) symbols where each symbol is (lg n) bits in size. This leads to time complexity O(n / lg n) because they assume a machine can handle one such symbol in O(1) time (and that's actually true). After that they convert the output of KS to match the assumption that a symbol has size 1 (i.e. individual bits are symbols).
    That's the bit I simply don't like. It feels like cheating.

    Let's think about data copy. If you have a 64-bit word size then yes, you can copy n bits in O(n/64). That's fine. We can copy 64 billion bits in 1 billion ops. Easy.

    If n is 65536, I'd still say you can use a 64-bit word and do 65536/64 operations. They, however, would say that n needs a 16-bit word size, so it's 65536/16 ops. It's just illogical! Their approach claims that copying n bits takes n/log(n) ops purely because they're being less efficient with small n, which will just leave most programmers scratching their heads.
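    To put numbers on that (my own illustration): with a fixed 64-bit word the copy is always n/64 ops, whereas charging only a log2(n)-bit word turns the same copy into n/log2(n) ops.
    Code:
    #include <math.h>
    #include <stdio.h>

    /* Ops to "copy" n bits: fixed 64-bit words vs. a word size that scales
       as log2(n), which is where the n / log n accounting comes from. */
    int main(void)
    {
        for (double n = 65536.0; n <= 1e12; n *= 1000.0)
            printf("n = %12.0f: n/64 = %12.0f   n/log2(n) = %12.0f\n",
                   n, n / 64.0, n / log2(n));
        return 0;
    }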

    There's an argument to be made that long term, over decades, as memory size has increased so has word size, but it's a pretty weak argument when describing complexity.

  12. #11
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Let's look at it another way. If (like the authors of the paper) you do suffix sorting over bit strings (i.e. there are as many suffixes as individual bits in the input), then the suffix array will have size O(n lg n) bits: (lg n) bits are needed to describe a suffix index in an input of length n, and there are n such indexes in a full suffix array. Isn't it cheating to say we have linear-space (and/or linear-time) suffix array construction algorithms at all?
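    A quick sanity check on those sizes (my arithmetic, assuming plain binary suffix indexes of ceil(lg n) bits each):
    Code:
    #include <math.h>
    #include <stdio.h>

    /* A full suffix array over a bit string of n bits stores n indexes of
       ceil(log2(n)) bits each, i.e. Theta(n log n) bits vs. n bits of input. */
    int main(void)
    {
        for (double n = 1e6; n <= 1e12; n *= 1e3) {
            double index_bits = ceil(log2(n));
            printf("n = %.0e input bits: SA = n * %.0f bits, i.e. %.0fx the input\n",
                   n, index_bits, index_bits);
        }
        return 0;
    }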

  13. #12
    Member
    Join Date
    Dec 2014
    Location
    Berlin
    Posts
    29
    Thanks
    35
    Thanked 26 Times in 12 Posts
    An implementation and a comparison to other algorithms' implementations would be needed now.
