I have (I hope) implemented ST5 in OpenCL. For now it does not exploit CKE (which I've written about here: http://encode.ru/threads/1274-Concur...mplementations ). I am not sure whether I implemented it correctly.
I'd be glad if you could point me to a paper describing ST5 in detail, or write out the algorithm yourself. If you know a good toolchain that performs the various stages of BWT-based compression, please post it here.
The newest version of the program is here: http://www63.zippyshare.com/v/93264665/file.html
If you give it one parameter, it generates random data, computes ST5, and writes the output to a file, treating that parameter as the output file name.
If you give it two parameters, the first is the input file name and the second is the output file name.
In either case the algorithm processes exactly 16 MiB of data, so you will get the best results with a 16 MiB input file.
For now it's slow because it's based on Bitonic Sort. Bitonic Sort is not well suited to STx transforms because it is unstable, so I had to append the original index to each key, i.e. a key consists of 5 bytes of the input buffer followed by 3 bytes of input buffer position (3 bytes address exactly 2^24 positions, which matches the 16 MiB block size). With the input position appended, every key is unique, so Bitonic Sort produces the same order a stable sort would. I'm planning to extend my program to compute the full BWT, where stable sorting is not required.
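
To make the key layout concrete, here is a minimal CPU sketch in plain C, not the OpenCL code from the program. It assumes the usual reading of ST5: rotations are ordered by their first 5 bytes (wrapping cyclically), ties keep input order, and the output is the byte preceding each sorted position. The names make_key, cmp_key and st5_reference are hypothetical, and qsort stands in for Bitonic Sort. Since qsort is also not guaranteed to be stable, it shows the same trick: the position in the low 24 bits breaks all ties.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: pack the 5 context bytes starting at position i
   (wrapping cyclically) into the top 40 bits of a 64-bit key, and the
   position itself into the low 24 bits. */
static uint64_t make_key(const uint8_t *in, uint32_t n, uint32_t i)
{
    uint64_t key = 0;
    for (uint32_t k = 0; k < 5; k++)
        key = (key << 8) | in[(i + k) % n];
    return (key << 24) | i;            /* valid only while i < 2^24 */
}

/* Plain unsigned comparison; keys are unique, so there are no ties. */
static int cmp_key(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Reference ST5 under the assumptions stated above.
   Returns 0 on success, -1 on bad size or allocation failure. */
static int st5_reference(const uint8_t *in, uint8_t *out, uint32_t n)
{
    if (n == 0 || n > (1u << 24))      /* position must fit in 3 bytes */
        return -1;

    uint64_t *keys = malloc((size_t)n * sizeof *keys);
    if (keys == NULL)
        return -1;

    for (uint32_t i = 0; i < n; i++)
        keys[i] = make_key(in, n, i);

    /* Any comparison sort works here, stable or not. */
    qsort(keys, n, sizeof *keys, cmp_key);

    for (uint32_t j = 0; j < n; j++) {
        uint32_t pos = (uint32_t)(keys[j] & 0xFFFFFFu);
        out[j] = in[(pos + n - 1) % n];   /* byte preceding the context */
    }

    free(keys);
    return 0;
}
```

Comparing the output of something like this against the GPU output on small inputs would be one way to check the OpenCL version, provided both use the same context direction and wrap-around rule.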
I hope the algorithm is good and provides reasonable results.