I'm looking for a good compressor that can take input in an alphabet larger than 256 symbols, maybe 1024, so that I can do preprocessing experiments without the weirdness of escape codes.
(I'm NOT trying to compress actual wide-character text data, and I know most strong compressors do pretty well on wide characters even if they treat them as strings of bytes.)
Something that takes 16-bit symbols as input would be fine, provided there isn't much loss when I only actually use 1024 of them. (E.g., a reasonably good LZ variant with Huffman-coded literals and offsets would probably be fine.)
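To make the idea concrete, here's a toy sketch of my own (not from any particular library) showing that Huffman coding itself doesn't care whether the alphabet has 256 or 1024 symbols; it's the byte-oriented front ends of real compressors that get in the way. The frequency distribution below is made up for illustration:

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code_lengths(freqs):
    """Return {symbol: code_length} for a Huffman code over an
    arbitrary alphabet -- symbols can be ints of any size."""
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(f, next(tiebreak), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate one-symbol alphabet: give it a 1-bit code.
        return {s: 1 for s in heap[0][2]}
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf one level deeper.
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (fa + fb, next(tiebreak), merged))
    return heap[0][2]

# Toy demo: a skewed distribution over a 1024-symbol alphabet.
freqs = Counter({sym: 1 for sym in range(1024)})
freqs[0] = 5000  # make symbol 0 dominate
lengths = huffman_code_lengths(freqs)
```

Since symbol 0 outweighs all the others combined, it ends up with a 1-bit code and the remaining 1023 symbols share the other subtree, exactly as you'd expect with a byte alphabet.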
Does anybody else do this sort of thing, or does everybody's preprocessor just detect unused byte values and use those as escapes, or what?