Eugene Shelwien has further optimized his vectorizable constant-speed MTF implementation. In my tests it now runs at a constant 114 MiB/s with AVX2 and 93 MiB/s with SSE2. The best implementation so far, the one from BSC, runs at 70 MiB/s on enwik9 and only 20 MiB/s on random data. All measurements were taken on my Haswell i7-4770, with a single thread running at 3.9 GHz.
encoding: mtf_gc70_SSE2.exe c infile outfile
decoding: mtf_gc70_SSE2.exe d infile outfile
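For readers new to the idea, here is a minimal scalar sketch of how an MTF coder can run at a speed that does not depend on the data. This is not Eugene's code and I am not claiming it matches his method; it is just one well-known formulation, and the names mtf_encode and mtf_decode are mine. Instead of keeping the list itself, it keeps the rank of every symbol and updates all 256 ranks for each input byte with a branch-free compare-and-add, which is the kind of loop a compiler or hand-written SSE2/AVX2 code can process 16 or 32 lanes at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode n bytes with move-to-front.
   rank[s] holds the current list position of symbol s. */
void mtf_encode(const uint8_t *in, uint8_t *out, size_t n)
{
    uint8_t rank[256];
    for (int s = 0; s < 256; s++) rank[s] = (uint8_t)s;

    for (size_t i = 0; i < n; i++) {
        uint8_t c = in[i];
        uint8_t r = rank[c];
        out[i] = r;
        /* Every symbol that was in front of c moves back by one.
           Always 256 steps, no data-dependent branch. */
        for (int s = 0; s < 256; s++)
            rank[s] = (uint8_t)(rank[s] + (rank[s] < r));
        rank[c] = 0;                      /* c moves to the front */
    }
}

/* Decode n ranks back into bytes using the same rank table. */
void mtf_decode(const uint8_t *in, uint8_t *out, size_t n)
{
    uint8_t rank[256];
    for (int s = 0; s < 256; s++) rank[s] = (uint8_t)s;

    for (size_t i = 0; i < n; i++) {
        uint8_t r = in[i];
        uint8_t c = 0;
        /* Exactly one symbol has rank r; select it with a mask
           instead of an early-exit search. */
        for (int s = 0; s < 256; s++)
            c |= (uint8_t)(s & -(rank[s] == r));
        out[i] = c;
        for (int s = 0; s < 256; s++)
            rank[s] = (uint8_t)(rank[s] + (rank[s] < r));
        rank[c] = 0;
    }
}
```

The cost is 256 simple operations per byte no matter what the input looks like, so enwik9 and random data take the same time. A classic list-based MTF is faster on compressible data because it stops searching as soon as it finds the symbol, which is exactly why it slows down on random input.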
Older MTF-related threads:
Meanwhile, I am working on a GPU implementation. Everyone who wants to compete with us is invited to this thread!