# Thread: Sub-linear time BWT construction

1. ## Sub-linear time BWT construction

https://arxiv.org/abs/1904.04228

Note: I haven't read the paper yet, and I have to admit I'm baffled as to how it can be sub-linear in time given you need to touch the whole input. It even claims sub-linear space too. Isn't that "magic", or am I just misunderstanding what they mean here?

I'm hoping someone closer to the field of BWTs can cast some light, but colour me sceptical!

Edit: I suppose you could have something that takes n + 10^12 · n/log(n) time and claim that for all practical sizes of n the sub-linear part dominates, but it's still O(n) at heart. Or alternatively it's sub-linear only on a specific type of data, such as sorted strings. I guess I'll just have to read it!
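A toy check of the edit's point, with 10^12 as the post's hypothetical constant: a bound like n + C·n/log2(n) is dominated by the "sub-linear" term at every practical n, yet is still Ω(n).

```python
import math

C = 10**12  # the post's hypothetical huge constant

def bound(n):
    """Hypothetical running time n + C * n / log2(n)."""
    return n + C * n / math.log2(n)

# Even at n = 2^60, the n/log n term still dwarfs the linear term,
# so the bound *looks* sub-linear in practice despite being Omega(n).
n = 2**60
assert C * n / math.log2(n) > n
```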

2. Given a binary string of length n occupying O(n / log n) machine words of space
They assume that a machine word contains O(log n) input symbols.
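To make the word-packing assumption concrete, here is an illustrative sketch (not the paper's algorithm): once O(log n) bits are packed into one machine word, any per-word primitive (popcount here, as a stand-in for "one unit of work") processes log n symbols per O(1) operation.

```python
# Sketch of the word-RAM trick: pack bits into words, then do one
# operation per word instead of one per bit.

WORD_BITS = 64  # machine word size, assumed to be Theta(log n)

def pack(bits, word_bits=WORD_BITS):
    """Pack a list of 0/1 symbols into machine words, LSB first."""
    words = []
    for i in range(0, len(bits), word_bits):
        w = 0
        for j, b in enumerate(bits[i:i + word_bits]):
            w |= b << j
        words.append(w)
    return words

def popcount_per_bit(bits):
    """Touch each symbol individually: Theta(n) operations."""
    return sum(bits)

def popcount_per_word(words):
    """Touch one word (= WORD_BITS symbols) per step: Theta(n / log n)."""
    return sum(bin(w).count("1") for w in words)

bits = [1, 0, 1, 1] * 64            # 256 input symbols
words = pack(bits)                   # only 4 machine words
assert popcount_per_word(words) == popcount_per_bit(bits) == 192
```

The per-word loop does 4 iterations where the per-bit loop does 256, which is exactly the log n factor the paper's model buys.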

3. Yes, I came to that same conclusion:

I think this also explains the sub-linear time. It is sub-linear in n because the machine word size is O(log n), which can remove a log n factor from the complexity. There may still be something interesting in there, but it's not what it appears to be, I think. (Hardly surprising given the bold claims.)

4. They assume a binary alphabet and a log n word size, and perform operations on log n bits in O(1) time. This is how they get sub-linearity.

EDIT: Oh JamesB you answered faster than me

5. Originally Posted by algorithm
EDIT: Oh JamesB you answered faster than me
I should have noticed that comment in the abstract earlier

6. Are there any works that take memory latency into account in algorithmic complexity estimates?
I've seen one paper with sort benchmarks across the {L1, L2, L3, RAM, server rack, datacenter, distributed} hierarchy,
which approximated latency as O(sqrt(N)), but it was in Russian.

7. Are there any works that take into account memory latency in algorithm complexity estimations?
It doesn't seem so, at least for the algorithm presented in this topic. "String Synchronizing Sets: Sublinear-Time BWT Construction and Optimal LCE Data Structure" by Dominik Kempa and Tomasz Kociumaka mentions that:
Each symbol of T′ takes Θ(τ log σ) = Θ(log n) bits. Since the suffix array of a string over such a large alphabet can be computed in linear time [24], we can sort the suffixes T[s_i..n] in O(|S|) = O(n/τ) time.
(...)
[24] Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. Linear work suffix array construction. Journal of the ACM, 53(6):918–936, 2006. doi:10.1145/1217856.1217858.
So they use "Linear Work Suffix Array Construction" by Juha Kärkkäinen, Peter Sanders and Stefan Burkhardt as a subroutine, but that paper contains statements such as:
Cache oblivious: These algorithms are similar to external algorithms with a single disk, but they are not allowed to make explicit use of the block size B or the internal memory size M. This is a serious restriction here, since no cache-oblivious integer sorting with O((n/B) log_{M/B}(n/B)) cache faults and o(n log n) work is known. Hence, we can as well go to the comparison-based alphabet model. The result is then an immediate corollary of the optimal comparison-based sorting algorithm.
As for the original skepticism: the algorithm seems legit to me, provided you treat a log(n)-bit machine word as normal and ignore data locality (which in practice is often critical to performance).

The KS algorithm can sort O(n / lg n) suffixes with O(lg n)-bit symbols in O(n / lg n) time if the machine word size is O(lg n). That translates directly to BWT output over O(n / lg n) suffixes. The algorithm presented in this topic does some magic to transform that sorted suffix array into BWT output with O(1)-bit symbols and O(n) symbols in O(n / lg n) time, provided you can operate on an O(lg n)-bit machine word in O(1) time.
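For readers unfamiliar with the pipeline: once a sorted suffix array exists, reading off the BWT is a single linear scan. A minimal sketch (naive comparison sorting stands in here for KS; the paper's contribution is doing the whole pipeline in o(n) time on packed words):

```python
def suffix_array(t):
    """Naive O(n^2 log n) suffix array; placeholder for KS."""
    return sorted(range(len(t)), key=lambda i: t[i:])

def bwt_from_sa(t, sa):
    """BWT[i] is the symbol cyclically preceding the i-th smallest suffix."""
    return "".join(t[i - 1] for i in sa)

t = "banana$"                # '$' is a unique smallest sentinel
sa = suffix_array(t)
assert sa == [6, 5, 3, 1, 0, 4, 2]
assert bwt_from_sa(t, sa) == "annb$aa"
```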

8. ## The Following 3 Users Say Thank You to Piotr Tarsa For This Useful Post:

Bulat Ziganshin (14th April 2019),JamesB (14th April 2019),Shelwien (14th April 2019)

9. Originally Posted by Piotr Tarsa
The KS algorithm can sort O(n / lg n) suffixes with O(lg n)-bit symbols in O(n / lg n) time if the machine word size is O(lg n). That translates directly to BWT output over O(n / lg n) suffixes. The algorithm presented in this topic does some magic to transform that sorted suffix array into BWT output with O(1)-bit symbols and O(n) symbols in O(n / lg n) time, provided you can operate on an O(lg n)-bit machine word in O(1) time.
I think the algorithm is legit, but perhaps the complexity analysis is misleading.

If we assume we're on a 64-bit machine, then a machine word of size O(log n) is easy to obtain for realistic sizes of n. However, it *varies* with n. This means that as we scale up n, the word size also scales up, meaning fewer machine instructions. While doable (approximately, at 8, 16, 32, 64 bits), it changes the complexity in subtle ways by factoring out a log(n) component. No programmer I know would do this! We'd just use the largest word size for smaller n too, so the speed-up is flat across all sizes of n (barring cache issues).

Secondly, by using a linear suffix array construction the whole algorithm can be no faster than O(n) complexity, unless they're only counting the sorting time. I still don't understand where their grand sub-linear claim comes from, unless they are taking a very specific view of what BWT construction means (e.g. transforming a suffix array into a BWT). I'm not versed in the papers of this field, so maybe that's normal.

10. Secondly, by using a linear suffix array construction the whole algorithm can be no faster than O(n) complexity, unless they're only counting the sorting time. I still don't understand where their grand sub-linear claim comes from, unless they are taking a very specific view of what BWT construction means (e.g. transforming a suffix array into a BWT). I'm not versed in the papers of this field, so maybe that's normal.
In the KS phase they assume that there are O(n / lg n) symbols, where each symbol is lg n bits in size. This leads to time complexity O(n / lg n), because they assume a machine can handle one such symbol in O(1) time (and that's actually true). After that, they convert the output of KS to match the assumption that a symbol has size 1 (i.e. individual bits are symbols).

Their speedup claim is:
Given a binary string of length n, our procedure builds the Burrows-Wheeler transform in O(n / sqrt(log n)) time
So the speedup factor is sqrt(log n). IIUC, log n here equals the number of symbols that fit in one machine word, so the speedup factor is sqrt(number of symbols fitting in one machine word). If symbols are bits and the machine word size is 64 bits, then the speedup is sqrt(64) = 8. That's the optimistic scenario. When BWT-ing bytes, a machine word can contain 8 symbols (bytes), so the theoretical speedup factor is sqrt(8) ≈ 2.8. In practice, divsufsort (and msufsort, if Michael finishes it) would probably destroy it in performance metrics because of much lower constant factors, I think.
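The arithmetic above as a quick sanity check, under the (assumed) reading that "log n" means the number of symbols fitting in one machine word:

```python
import math

WORD_BITS = 64  # assumed machine word size

def speedup(symbol_bits):
    """Theoretical speedup factor sqrt(symbols per machine word)."""
    symbols_per_word = WORD_BITS // symbol_bits
    return math.sqrt(symbols_per_word)

assert speedup(1) == 8.0                    # bit symbols: sqrt(64) = 8
assert abs(speedup(8) - 2.828) < 0.001      # byte symbols: sqrt(8) ~ 2.83
```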

They do not consider SIMD vectors to be a machine word, probably because they need to be able to do memory addressing with a whole machine word.

PS: I'm not 100% sure I did the analysis correctly. Input from authors or someone else that understands the paper fully would help.

11. Originally Posted by Piotr Tarsa
In the KS phase they assume that there are O(n / lg n) symbols, where each symbol is lg n bits in size. This leads to time complexity O(n / lg n), because they assume a machine can handle one such symbol in O(1) time (and that's actually true). After that, they convert the output of KS to match the assumption that a symbol has size 1 (i.e. individual bits are symbols).
That's the bit I simply don't like. It feels like cheating.

Let's think about data copying. If you have a 64-bit word size, then yes, you can copy n bits in O(n/64) operations. That's fine. We can copy 64 billion bits in 1 billion ops. Easy.

If n is 65536, I'd still say you can use a 64-bit word and do 65536/64 operations. They, however, would say that needs a 16-bit word size, so it's 65536/16 ops. It's just illogical! Their approach claims that copying n bits takes n/log(n) ops purely because they're being less efficient with small n, which will just leave most programmers scratching their heads.
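The copy example above in numbers, comparing a fixed 64-bit word against the model's word of ceil(log2 n) bits:

```python
import math

def ops_fixed(n, word_bits=64):
    """Word operations to copy n bits with a fixed 64-bit word."""
    return -(-n // word_bits)        # ceiling division

def ops_word_ram(n):
    """Word operations under the model's word size of ceil(log2 n) bits."""
    w = max(1, math.ceil(math.log2(n)))
    return -(-n // w)

n = 65536
assert ops_fixed(n) == 1024          # 65536 / 64
assert ops_word_ram(n) == 4096       # 65536 / 16: 4x more ops for the same copy
```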

There's an argument to be made that, long term, over decades, word size has grown along with memory size, but it's a pretty weak argument when describing complexity.

12. Let's look at it another way. If (like the authors of the paper) you do suffix sorting over bitstrings (i.e. there are as many suffixes as individual bits in the input), then the suffix array has size O(n lg n) bits: lg n bits are needed to describe a suffix index in an input of length n, and there are n such indexes in a full suffix array. Isn't it then also cheating to say we have linear-space (and/or linear-time) suffix array construction algorithms at all?
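That blow-up is easy to quantify: the full suffix array of an n-bit input stores n indexes of ceil(log2 n) bits each, a log n factor larger than the O(n)-bit input itself.

```python
import math

def suffix_array_bits(n):
    """Bits needed for a full suffix array over an n-bit input:
    n indexes, each ceil(log2 n) bits wide."""
    return n * math.ceil(math.log2(n))

n = 1 << 20                              # a 1 Mbit input
assert suffix_array_bits(n) == n * 20    # 20x larger than the input
assert suffix_array_bits(n) // n == 20   # the log n blow-up factor
```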

13. An implementation and a comparison to other algorithms' implementations would be needed now.
