1. BWT suffix arrays
cbloom seems happy about it, but he keeps talking about 128k windows,
and at that level plain hash chains are imho the best solution.
Anyway, I got disappointed after noticing that computing the BWT
of the whole file is not a solution, and that linking up the SAs of smaller
blocks would require additional pointers, so it would end up as 8N.
Btw, I don't really understand cbloom's ideas, but my approach is
to compute the BWT of overlapped blocks
(0..2*winsize, winsize..3*winsize, ...), so that after reaching
offset 2*winsize, we'd compute the BWT of the winsize..3*winsize block.
That would work and allow finding matches within a window, but it's still 8N.
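A minimal sketch of that block scheduling (build_sa stands for any SA
constructor, e.g. divsufsort; the naive one below is only there to keep the
sketch self-contained, and all names/sizes are made up):

// Overlapped double-window scheme: block k covers [k*W, (k+2)*W), so the
// current block's suffix array always holds at least W bytes of history.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

static const size_t W = 1 << 17;          // winsize (assumed)

struct SABlock {
    size_t base = 0;                      // block covers buf[base .. base+2*W)
    std::vector<int32_t> sa;              // 4N for the block's suffix array
};

// naive O(n^2 log n) stand-in for a real SA constructor like divsufsort
void build_sa(const uint8_t* p, size_t n, std::vector<int32_t>& sa) {
    sa.resize(n);
    for (size_t i = 0; i < n; i++) sa[i] = (int32_t)i;
    std::sort(sa.begin(), sa.end(), [&](int32_t a, int32_t b) {
        return std::lexicographical_compare(p + a, p + n, p + b, p + n);
    });
}

// call once per position; assumes at least 2*W bytes exist past blk.base
void maintain_window(const uint8_t* buf, size_t pos, SABlock& blk) {
    if (blk.sa.empty())                   // first block: 0 .. 2*winsize
        build_sa(buf, 2 * W, blk.sa);
    else if (pos >= blk.base + 2 * W) {   // crossed the block end
        blk.base += W;                    // next block: winsize .. 3*winsize, etc
        build_sa(buf + blk.base, 2 * W, blk.sa);
    }
}
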
2. Refinement of NMF ( http://encode.ru/threads/1278-Nibble-MMC-matchfinder )
Basically it's just a real tree, unlike MMC where collisions are allowed.
Merging the text buffer with the tree turned out slow; also, afair Cyan(?)
posted good ideas about nibble/byte symbol switching depending on alphabet
size, and about lazy insertion (add strings to a hash chain first, only sort
them into the tree on lookup).
But at best this would be more stable than bt4 in redundant cases; we
can't expect any significant speed difference on normal files.
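A rough sketch of my reading of the lazy-insertion part (the per-bucket
"tree" is faked with a suffix-sorted vector just to keep this self-contained;
hash4 and the sizes are made up):

// Lazy insertion: a new position only goes onto its bucket's chain;
// the pending positions are sorted into the bucket's search structure
// the first time a lookup touches that bucket.
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

struct LazyMF {
    const uint8_t* buf; uint32_t size;
    struct Bucket { std::vector<uint32_t> pending, sorted; };
    std::vector<Bucket> ht;

    LazyMF(const uint8_t* b, uint32_t n) : buf(b), size(n), ht(1 << 16) {}

    uint32_t hash4(uint32_t pos) const {              // made-up order-4 hash
        uint32_t h; std::memcpy(&h, buf + pos, 4);
        return (h * 2654435761u) >> 16;
    }
    bool sufless(uint32_t a, uint32_t b) const {      // compare suffixes at a,b
        return std::lexicographical_compare(buf + a, buf + size, buf + b, buf + size);
    }

    // insertion is cheap: the position just goes onto the bucket's chain
    void insert(uint32_t pos) { ht[hash4(pos)].pending.push_back(pos); }

    // a lookup pays the deferred sorting cost, but only for its own bucket
    const std::vector<uint32_t>& lookup(uint32_t pos) {
        Bucket& b = ht[hash4(pos)];
        for (uint32_t p : b.pending) {
            auto it = std::lower_bound(b.sorted.begin(), b.sorted.end(), p,
                          [&](uint32_t x, uint32_t y) { return sufless(x, y); });
            b.sorted.insert(it, p);
        }
        b.pending.clear();
        return b.sorted;  // suffix-sorted candidates sharing the order-4 hash
    }
};
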
3. Suffix trees
Suffix pointers are really nice, and there's the idea about bitmask compression
of tree nodes ( http://encode.ru/threads/1173-New-da...for-bitwise-CM ),
but managing memory allocation gets very hard, and overall consumption is much
higher than 8N, so is there really a reason to implement this?
(For LZ, that is; for CM/PPM there are fewer alternatives, because we have to
store stats somewhere.)
Another problem is window maintenance - surely it's possible to delete nodes
at the far end of the window, but that requires adding precise context counts
to nodes (memory!) and makes parsing 2x slower.
So working with overlapping double windows seems like a more convenient
solution here too, but then BWT really looks nicer.
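For reference, a node would have to look roughly like this to support that
kind of deletion (a made-up layout, just to show where the extra count field
and its memory cost come from):

#include <cstdint>

struct STNode {
    uint32_t child;    // first child (could be bitmask-compressed as above)
    uint32_t next;     // next sibling
    uint32_t suffix;   // suffix pointer
    uint32_t count;    // how many window positions pass through this node;
                       // decremented when the oldest position leaves the
                       // window, node freed when it reaches zero
};
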
4. Compressed match history
The LZMA matchfinder finds the nearest substring occurrence per match length,
so if we simply dumped its output, we'd get something like the table below.
The idea is to directly compress (dedup) this table and link it up.
offset: match offset array (nearest occurrence per match length)
1F03: 1E35 1E35
1F04: 1E36 1E36 1E36
1F05: 1E37 1E37 1E37 1E37
1F06: 1E38 1E38 1E38 1E38 1E38
1F07: 1E39 1E39 1E39 1E39 1E39 1E39
1F08: 1E3A 1E3A 1E3A 1E3A 1E3A 1E3A 1E3A
1F09: 1E3B 1E3B 1E3B 1E3B 1E3B 1E3B 1E3B 1E3B
1F0A: 1E3C 1E3C 1E3C 1E3C 1E3C 1E3C 1E3C 1E3C 1E3C
1F0B: 1E3D 1E3D 1E3D 1E3D 1E3D 1E3D 1E3D 1E3D 1E3D 1E3D
1F0C: 1E55 1E3E 1E3E 1E3E 1E3E 1E3E 1E3E 1E3E 1E3E 1E3E 1E3E
1F0D: 1EC5 09E4
2702: 1F05 1F05
2703: 2649 1F06 1F06
2704: 1F07 1F07 1F07 1F07
2705: 1F08 1F08 1F08 1F08 1F08
2706: 1F09 1F09 1F09 1F09 1F09 1F09
2707: 1F0A 1F0A 1F0A 1F0A 1F0A 1F0A 1F0A
2708: 1F0B 1F0B 1F0B 1F0B 1F0B 1F0B 1F0B 1F0B
2709: 2600 16B9 16B9 16B9 16B9
270A: 2601 2601 16BA 16BA 16BA 16BA
270B: 2602 2602 2602 16BB 16BB 16BB 16BB
270C: 1C52 1C52 1C52
270D: 265A 1C53 1C53 1C53
270E: 2552 23F9 1C54 1C54 1C54
270F: 2519 2519 23FA 1C55 1C55 1C55
One benefit is that such a matchfinder would be able to provide the main
engine with (compressed) distance arrays for whole blocks
(while working in its own thread).
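To illustrate just the dedup part (the row layout and the linking details are
my assumptions): consecutive rows mostly repeat the previous row's offsets +1,
so even something this naive removes most of the redundancy:

// Each row is stored as a bitmask of entries that are simply the previous
// row's entry at the same index, +1, plus the remaining offsets verbatim.
// Assumes at most 64 match lengths per row.
#include <cstddef>
#include <cstdint>
#include <vector>

struct PackedRow {
    uint16_t count;                 // number of match offsets in this row
    uint64_t carry_mask;            // bit i set -> offset[i] == prev[i] + 1
    std::vector<uint16_t> raw;      // offsets that must be stored explicitly
};

PackedRow pack_row(const std::vector<uint16_t>& prev,
                   const std::vector<uint16_t>& cur)
{
    PackedRow r{(uint16_t)cur.size(), 0, {}};
    for (size_t i = 0; i < cur.size(); i++) {
        if (i < prev.size() && cur[i] == (uint16_t)(prev[i] + 1))
            r.carry_mask |= 1ull << i;      // derivable from the previous row
        else
            r.raw.push_back(cur[i]);        // e.g. 1E55 in the 1F0C row above
    }
    return r;
}

std::vector<uint16_t> unpack_row(const std::vector<uint16_t>& prev,
                                 const PackedRow& r)
{
    std::vector<uint16_t> cur(r.count);
    size_t k = 0;
    for (size_t i = 0; i < r.count; i++)
        cur[i] = (r.carry_mask >> i & 1) ? (uint16_t)(prev[i] + 1) : r.raw[k++];
    return cur;
}
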
5. Direct hashing and bloom filters ( http://en.wikipedia.org/wiki/Bloom_filter )
Recently I tested uncached RAM read speed again after a long while,
and was really impressed by the results (600-700 clocks).
Btw, 600 clocks at 3.52GHz is ~170ns, which is actually not that much,
taking into account page directory lookups on a TLB miss.
Anyway, basically it doesn't matter how much computing we do,
as long as we don't access any uncached memory.
Hence the following idea: just implement direct hashtable lookups
not only for o2/o3/o4, but all the way up to, say, o12.
Plain direct lookups would obviously be slow, though, so here's the
trick: use small bloom-filter hashtables to exclude missing contexts,
which would be frequent for the higher orders.
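Roughly like this (sizes and hash are made up; the point is only that the
read of the big table gets skipped when the small in-cache filter says the
context hasn't occurred yet):

// One big direct-mapped hashtable per order, plus a small bloom filter
// that should stay in L1/L2; the uncached read is only done when the
// filter says the context may have been seen before.
#include <cstddef>
#include <cstdint>
#include <vector>

struct OrderTable {
    std::vector<uint32_t> slots;   // big direct hashtable: hash -> last position
    std::vector<uint64_t> bloom;   // small bit array

    OrderTable(size_t slot_bits, size_t bloom_bits)
        : slots(size_t(1) << slot_bits), bloom((size_t(1) << bloom_bits) / 64) {}

    // test two derived bits, then set them (so the context counts as seen)
    bool bloom_test_and_set(uint64_t h) {
        uint64_t nbits = bloom.size() * 64;
        uint64_t i1 = h % nbits, i2 = (h * 0x9E3779B97F4A7C15ull) % nbits;
        bool seen = (bloom[i1 / 64] >> (i1 % 64) & 1) && (bloom[i2 / 64] >> (i2 % 64) & 1);
        bloom[i1 / 64] |= 1ull << (i1 % 64);
        bloom[i2 / 64] |= 1ull << (i2 % 64);
        return seen;
    }

    // returns a previous position that probably had the same context, or 0
    uint32_t update(uint64_t ctx_hash, uint32_t pos) {
        size_t i = (size_t)(ctx_hash % slots.size());
        uint32_t prev = 0;
        if (bloom_test_and_set(ctx_hash))
            prev = slots[i];           // the expensive uncached read, skipped otherwise
        slots[i] = pos;                // still insert, so later occurrences find us
        return prev;                   // caller has to verify the match bytes anyway
    }
};

Something like OrderTable(26, 20) per order would mean a 256MB table but only
a 128KB filter, so the filter checks would mostly stay cached.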