
Thread: Super-Scalar RAM-CPU Cache Compression

  1. #1
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts

    Super-Scalar RAM-CPU Cache Compression

    Hi folks! I've found a document I think is worth sharing here. It's about three codecs that reach *VERY* high speed in both compression and decompression while maintaining a good ratio.
    One can talk a lot about the details, but the document itself is pretty concise. Just to explain why it caught my attention, please see this table:
    [Attached image: Comparison.jpg, a comparison table of compression/decompression speed and ratio]

    What I can't say is whether they would serve as general-purpose algorithms. What do you think?

    Here is the source:
    Attached Files

  2. #2
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts

    Integer Compression

    These compression schemes (PFor, PForDelta, ...) are very popular in databases and especially in the information retrieval field.
    You can search for "integer compression". These schemes are best suited to compressing lists of integers (integer arrays); see the sketch after the lists below.
    1 - Papers:
    - Decoding billions of integers per second through vectorization: http://arxiv.org/abs/1209.2137
    - SIMD Compression and the Intersection of Sorted Integers: http://arxiv.org/abs/1401.6399


    2 - Libraries:
    - TurboPFOR (written by me) https://github.com/powturbo/TurboPFor (C/C++)
    - FastPFor: https://github.com/lemire/FastPFOR (C++)
    - JavaFastPFor: https://github.com/lemire/JavaFastPFOR (Java)
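
    To make the idea concrete, here is a minimal C++ sketch (my own illustration, not TurboPFor's or FastPFor's actual API) of the core of these schemes: delta-encode a sorted integer list, then bit-pack the small deltas at a fixed width. The exception patching that gives PFor its name is omitted for brevity.

    Code:
    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Bit width needed to represent v (0 is given 1 bit for simplicity).
    static int bit_width(uint32_t v) {
        int b = 1;
        while (v >>= 1) ++b;
        return b;
    }

    // Delta-encode a sorted array, then pack each delta into `width` bits.
    std::vector<uint64_t> pack_sorted(const std::vector<uint32_t>& sorted, int& width) {
        std::vector<uint32_t> deltas(sorted.size());
        uint32_t prev = 0, maxd = 0;
        for (size_t i = 0; i < sorted.size(); ++i) {
            deltas[i] = sorted[i] - prev;
            prev = sorted[i];
            if (deltas[i] > maxd) maxd = deltas[i];
        }
        width = bit_width(maxd);
        std::vector<uint64_t> out((sorted.size() * width + 63) / 64, 0);
        for (size_t i = 0; i < deltas.size(); ++i) {
            size_t bit = i * width;
            out[bit / 64] |= (uint64_t)deltas[i] << (bit % 64);
            if (bit % 64 + width > 64)  // value straddles a 64-bit word boundary
                out[bit / 64 + 1] |= (uint64_t)deltas[i] >> (64 - bit % 64);
        }
        return out;
    }

    int main() {
        std::vector<uint32_t> postings = {3, 7, 11, 120, 121, 126, 500};
        int width = 0;
        std::vector<uint64_t> packed = pack_sorted(postings, width);
        // 7 values at 9 bits each instead of 7 x 32 bits.
        std::cout << "bit width: " << width
                  << ", packed words: " << packed.size() << "\n";
    }

    The real libraries above do this per block of integers with SIMD, and patch the few large "exception" deltas separately so one outlier doesn't inflate the bit width for the whole block.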

  3. The Following User Says Thank You to dnd For This Useful Post:

    Gonzalo (1st February 2015)

  4. #3
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Oh, thank you man! I hadn't noticed that before. But I have a somewhat naive question: since we can consider any file as a succession of integers from 0 to 255, why don't these algorithms show up in end-user programs? What's the difference? Thanks in advance!

  5. #4
    Member
    Join Date
    Jan 2015
    Location
    Chennai, Tamil Nadu, India
    Posts
    19
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Thanks for the info

    I have seen the code for integer compression, but I want the code for PDICT, as I am trying to compress strings. PDICT is a faster and better algorithm for string compression; please tell me if anyone finds an implementation of PDICT.

  6. #5
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    456
    Thanks
    46
    Thanked 164 Times in 118 Posts
    "Integer compression" is not for general purpose data compression,
    is depending on the type of integer data, and is better applicable for integer lists where
    most integers are small. Ex. compressing the deltas of a sorted integer array.
    PDICT is simply compressing the indexes (as integer lists) of low cardinality string dictionaries (ex. job category in databases ).
    The dictionary is stored separately from the compressed string indexes.
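
    To illustrate, here is a minimal C++ sketch of that dictionary-encoding step (the names are invented for illustration, not the paper's code): a low-cardinality string column becomes a small dictionary plus a list of small integer indexes, which can then be compressed like any other integer array.

    Code:
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct DictEncoded {
        std::vector<std::string> dictionary;  // stored once, separately
        std::vector<uint32_t> indexes;        // one small integer per row
    };

    // Replace each string with its index in a dictionary of distinct values.
    DictEncoded dict_encode(const std::vector<std::string>& column) {
        DictEncoded enc;
        std::unordered_map<std::string, uint32_t> seen;
        for (const std::string& s : column) {
            auto it = seen.find(s);
            if (it == seen.end()) {
                it = seen.emplace(s, (uint32_t)enc.dictionary.size()).first;
                enc.dictionary.push_back(s);
            }
            enc.indexes.push_back(it->second);
        }
        return enc;
    }

    int main() {
        // A "job category" column: many rows, few distinct values.
        std::vector<std::string> col = {"engineer", "nurse", "engineer",
                                        "teacher", "nurse", "engineer"};
        DictEncoded enc = dict_encode(col);
        std::cout << enc.dictionary.size() << " dictionary entries, "
                  << enc.indexes.size() << " indexes\n";  // 3 entries, 6 indexes
    }

    With only 3 distinct values the indexes fit in 2 bits each, which is exactly the kind of small-integer list the schemes above compress well.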

  7. #6
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    Quote Originally Posted by dnd
    "Integer compression" is not for general purpose data compression,
    is depending on the type of integer data, and is better applicable for integer lists where
    most integers are small. Ex. compressing the deltas of a sorted integer array.
    PDICT is simply compressing the indexes (as integer lists) of low cardinality string dictionaries (ex. job category in databases ).
    The dictionary is stored separately from the compressed string indexes.
    In-memory data representations are often more similar to integer data than to text, and integer-oriented compression often works very well for "general purpose" data as they're actually represented in memory. (As opposed to how they're written to files.)

    The WKdm algorithm (used in Mac OS X's "compressed memory") is basically integer-oriented, assuming that data in memory are at least 4-byte aligned and that aligned 4-byte chunks of data are often numerically similar (or identical) to other very-recently-seen 4-byte chunks of data. It's simple and fast, and works surprisingly well for various kinds of data, but not well at all for text or x86 machine code. (It works pretty well for RISC code, especially if you tweak it a little bit as Apple did for the Newton's ARM RISC code.)

    One reason this works is that pointers are just integers, really, and tend to be numerically similar to other nearby pointers. The compression algorithm doesn't have to know which numbers are "really" integers and which ones are really pointers or array offsets or whatever. It also doesn't have to know when it's looking at 8-byte data---the top halves of 8-byte integers and pointers are often very compressible and looking at 8-byte values as two 4-byte values doesn't hurt much. (Which is why Apple didn't need to change WKdm for 64-bit machines.)

    https://www.usenix.org/legacy/event/...son/wilson.pdf

    All WKdm really does is keep a 16-element hash table of recently-seen 4-byte values, and notice whether each 4-byte value it sees is an exact or partial match. (High 22 bits of 32.) It's really crude and stupid, if I do say so myself, and what I think is most interesting about it is that it works anyway---in-memory data representations are usually VERY sparse and exhibit VERY strong locality properties. (Analogous to spatial locality exploited by memory hierarchies.)
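
    For readers who want to see the shape of it, here is a rough C++ sketch of the classification step just described (the slot hash and other details are my assumption, not Apple's actual source):

    Code:
    #include <cstdint>
    #include <cstdio>

    enum Tag { ZERO, EXACT, PARTIAL, MISS };

    // Map a word's high bits to one of 16 dictionary slots (hash is assumed).
    static unsigned slot(uint32_t w) { return (w >> 10) % 16; }

    // Tag one aligned 32-bit word against the table of recently seen words.
    Tag classify(uint32_t w, uint32_t table[16]) {
        if (w == 0) return ZERO;           // zero words get their own tag
        unsigned s = slot(w);
        uint32_t old = table[s];
        table[s] = w;                      // always remember the newest word
        if (old == w) return EXACT;        // whole word matched: emit slot index
        if ((old >> 10) == (w >> 10))      // top 22 of 32 bits matched:
            return PARTIAL;                //   emit slot index + low 10 bits
        return MISS;                       // emit the full 32-bit word
    }

    int main() {
        uint32_t table[16] = {0};
        // Nearby pointers are numerically similar, so they mostly hit PARTIAL.
        uint32_t words[] = {0x10008000, 0x10008010, 0x10008010, 0, 0xDEADBEEF};
        for (uint32_t w : words)
            printf("%08x -> %d\n", (unsigned)w, (int)classify(w, table));
    }

    The encoder then packs the tags, slot indexes, low bits, and full-word misses into separate arrays, which is part of what makes it so fast.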

    An interesting variant of WKdm called X-Match hashes on the first and third bytes, skipping the byte in between and the least significant byte. That lets it get more compression on 16-bit data as well. (E.g. 16-bit Unicode characters, where the high bytes of aligned byte pairs are usually identical, as well as actual 16-bit integers like uncompressed audio, where the high bytes of aligned byte pairs have VERY strong locality.)
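
    As a sketch of that variant's hash (my reading of the description above, with invented constants, not X-Match's actual source), indexing on the first and third bytes means the hash depends only on the high byte of each aligned 16-bit half:

    Code:
    #include <cstdint>

    // Hash on the first and third bytes of an aligned 32-bit word (counting
    // from the most significant byte), skipping the byte in between and the
    // least significant byte.
    static unsigned slot_xmatch(uint32_t w) {
        unsigned b0 = (w >> 24) & 0xFF;  // first byte: high byte of upper half
        unsigned b2 = (w >> 8)  & 0xFF;  // third byte: high byte of lower half
        return (b0 * 31 + b2) % 16;      // 16-entry table, as in WKdm
    }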
