Results 1 to 2 of 2

Thread: Even (somewhat) faster Suffix Sorting

  1. #1
    Member
    Join Date
    Mar 2015
    Location
    Germany
    Posts
    57
    Thanks
    34
    Thanked 38 Times in 15 Posts

    Even (somewhat) faster Suffix Sorting

    Hello,

    I've played around a little and replaced libdivsufsort's trsort with basically radixSA
    (I actually use boost's spreadsort, std::sort and insertion sort instead of radix sort).
    Memory requirements should stay the same.

    Runtimes reported by suftest on my machine (i7 4770K, 16 GB RAM):
    Code:
    File   | orginal    | modified
    enwik8 |   8.1600 s |  7.8960 s
    enwik9 | 105.0930 s | 95.5340 s
    This is without OpenMP. OpenMP speeds up the unmodified sssort so the changes are a bit more significant:
    Code:
    File   | orginal    | modified
    enwik8 |   6.1880 s |  5.9630 s
    enwik9 |  83.1580 s | 74.6800 s
    Source code can be found at https://github.com/akamiru/libdivsufsort
    Main changes are in lib/spsort.cpp.

  2. The Following 2 Users Say Thank You to Christoph Diegelmann For This Useful Post:

    Bulat Ziganshin (19th September 2016),Cyan (19th September 2016)

  3. #2
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    109
    Thanks
    7
    Thanked 80 Times in 25 Posts
    Quote Originally Posted by Christoph Diegelmann View Post
    Hello,

    I've played around a little and replaced libdivsufsort's trsort with basically radixSA
    (I actually use boost's spreadsort, std::sort and insertion sort instead of radix sort).
    The 'tr' sort is designed for detection and specialized sorting of tandem repeats. Yuta's implementation suffers performance wise due to his choice of substring sort as the primary sort routine. The best way to speed up tandem repeat sort is by placing it in the main sort by switching from substring sort to multikey quicksort.

    Enwik8/9 are not good for testing tandem repeat sort in general because they don't really contain many tandem repeats

    Try your changes against the files in the gauntlet test set. The corpus is designed to test for perfornance on heavy tandem repeats as well as other hard cases for SACAs.

    http://www.michael-maniscalco.com/do...auntlet.tar.gz

Similar Threads

  1. Suffix tree transformation?
    By Kennon Conrad in forum Data Compression
    Replies: 92
    Last Post: 11th May 2015, 10:48
  2. Suffix Tree's internal representation
    By Piotr Tarsa in forum Data Compression
    Replies: 4
    Last Post: 18th December 2011, 08:37
  3. Suffix Trees, Suffix Arrays, or What?
    By incblitz in forum Data Compression
    Replies: 47
    Last Post: 12th July 2011, 21:54
  4. Little question about Suffix Sorting techniques
    By Piotr Tarsa in forum Data Compression
    Replies: 0
    Last Post: 23rd May 2011, 20:01
  5. GCC 4.6 faster than previous GCC, faster than LLVM
    By willvarfar in forum The Off-Topic Lounge
    Replies: 2
    Last Post: 15th November 2010, 16:09

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •