# Thread: Space-Efficient Construction of Compressed Indexes in Deterministic Linear Time

1. ## Space-Efficient Construction of Compressed Indexes in Deterministic Linear Time

I bumped into this 2016 paper on HN: https://arxiv.org/abs/1607.04346

We show that the compressed suffix array and the compressed suffix tree of a string T can be built in O(n) deterministic time using O(nlogσ) bits of space, where n is the string length and σ is the alphabet size. Previously described deterministic algorithms either run in time that depends on the alphabet size or need ω(nlogσ) bits of working space. Our result has immediate applications to other problems, such as yielding the first linear-time LZ77 and LZ78 parsing algorithms that use O(nlogσ) bits.
(emphasis mine)

2. I stumpled upon that paper last year when doing research for my SACA (I implemented it as second stage improvement for libdivsufsort). Gave up on it relativly fast because it uses 3*N memory (if I understand it corectly) and rank queries (which are usally quite slow due to cache misses) while normal SACAs already use 4 * N and there's already the paper 'A Linear-Time Burrows-Wheeler Transformusing Induced Sorting' which does the Burrows-Wheeler Transform in O(n) time using O(n * log σ * log log_σ(n)) (around 2,2 * n in practice) which is almost as fast as SAIS. I don't think that this can be reduced much but maybe I simply misunderstood it.

Anyway last year was a very interesting year for SACAs: "Linear-time Suffix SortingA new approach for suffix array construction" Uwe Baier gave the first non recursive linear time SACA. I turned it into a O(n * log(n)) SACA that passes only twice over the suffix array (and therefore is quite cache friendly). It turned out to be around 50% faster with enwik9 in libdivsufsort than the original second stage. You can find it here: https://github.com/akamiru/sort. An implementation of his paper is available here: https://github.com/waYne1337/gsaca.

Moreover I could show that "Linear-time Suffix Sorting A new approach for suffix array construction" also gave the first independant algorithm for generating the lyndon array without using global data structures. Prof. William F. Smyth gave a speak about this (and other things) at the London Stringology Days that I could share on request.

"An Efficient Algorithm for Suffix Sorting" Z Peng, Y Wang sounded quite interessting, too. I guess it incorporates some of the ideas I used for daware but sadly its not available on the internet.

3. ## The Following 5 Users Say Thank You to Christoph Diegelmann For This Useful Post:

Bulat Ziganshin (15th May 2017),hexagone (15th May 2017),jibz (15th May 2017),Shelwien (16th May 2017),willvarfar (16th May 2017)

4. It turns out dsufsort ("An Efficient Algorithm for Suffix Sorting") works exactly like I thought from the abstract:

Normal prefix doubling sorts each group of depth d by 2d by additonally sorting by ISA[SA[i] + d]. Therefore after each round the already sorted depth doubles (and therfore the parameter d) resulting in a maximum of log(n) rounds. Dsufsort uses an additional array D to store the actual sorting depth for each group and updating the depth to D[i] + D[ISA[SA[i] + D[i]]]. This means that when the first groups are already sorted, some of the next SA may be sorted by d + 2d = 3d. This means the average group depth may grow even faster than twice each round.

Daware uses the trick of GSACA (Uwe Baier's new algo) to improve upon that. It works from the lexicographically highest group to the lowest and sorts them completly until their next smaller suffix (hence the sorting depth equals the value in the lyndon array). We only read every ISA value only once (GSACA is linear) after sorting (this is the only O(n*log n) part here) it. Therefore we can override it after the next sort with the new depth (so we save the space for D). In a second pass we sort from the lowest to the highest group finishing our sort.

It's rather hard to explain in short so I recommend interested ones to read the GSACA paper for a formal prove or take a look at the pseudocode in my repo: https://github.com/akamiru/sort/wiki...tion-of-DAware.

5. ## The Following User Says Thank You to Christoph Diegelmann For This Useful Post:

Bulat Ziganshin (16th May 2017)

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•