Results 1 to 11 of 11

Thread: Fast CRC table construction and rolling CRC hash calculation

  1. #1
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts

    Fast CRC table construction and rolling CRC hash calculation

    Inspired by Eugene Shelwien's fma-diff rolling CRC implementation, i've developed public domain implementation of the same idea, based on Igor Pavlov 7-zip code. This includes rolling CRC hash and fast generation of tables for plain and rolling CRC hash. Plain CRC table constructed 30-100x faster than with original algorithm, and rolling CRC table construction is 250-500 times faster. Hopefully it will be useful for your programs.

    /* crc.c -- Fast CRC table construction and rolling CRC hash calculation
    2009-11-23 : Igor Pavlov : Public domain
    2013-03-27 : Bulat.Ziganshin@gmail.com : Public domain */

    #include <stdio.h>

    #define kCrcPoly 0xEDB88320
    #define CRC_INIT_VAL 0 /* 0xFFFFFFFF for zip/rar/7-zip quasi-CRC */

    // For rolling CRC hash demo
    #define WINSIZE 100
    #define TESTSIZE 200

    // Fast CRC table construction algorithm
    void FastTableBuild (unsigned CRCTable[256], unsigned seed)
    {
    unsigned i, j, r;

    CRCTable[0] = 0;
    CRCTable[128] = r = seed;
    for (i=64; i; i/=2)
    CRCTable[i] = r = (r >> 1) ^ (kCrcPoly & ~((r & 1) - 1));

    for (i=2; i<256; i*=2)
    for (j=1; j<i; j++)
    CRCTable[i+j] = CRCTable[i] ^ CRCTable[j];
    }


    #define init_CRC() CRC_INIT_VAL
    #define update_CRC(crc,CRCTable,c) (CRCTable[((crc)^(c)) & 0xFF] ^ ((crc)>>8))
    #define finish_CRC(crc) ((crc) ^ CRC_INIT_VAL)

    unsigned calcCRC (unsigned char *buffer, unsigned len, unsigned CRCTable[256])
    {
    unsigned crc = init_CRC(), i;
    for (i=0; i<len; i++)
    crc = update_CRC(crc,CRCTable,buffer[i]);
    return finish_CRC(crc);
    }


    int main()
    {
    unsigned i, j, r, CRCTab[256], FastCRCTab[256], RollingCRCTab[256], crc1, crc2;

    // Fast CRC table construction
    FastTableBuild (FastCRCTab, kCrcPoly);

    // Classic CRC table construction algorithm
    for (i=0; i<256; i++)
    {
    r = i;
    for (j = 0; j < 8; j++)
    r = (r >> 1) ^ (kCrcPoly & ~((r & 1) - 1));
    CRCTab[i] = r;
    if (FastCRCTab[i] != CRCTab[i]) // unit testing :)
    printf("%02x %08x %08x\n", i, r, (r & 1) ? (r>>1)^kCrcPoly : (r>>1));
    }


    // Rolling CRC table construction
    for (i=0; i<256; i++)
    {
    unsigned crc = init_CRC();
    crc = update_CRC(crc,CRCTab,i);
    for (j=0; j<WINSIZE; j++)
    crc = update_CRC(crc,CRCTab,0);
    RollingCRCTab[i] = finish_CRC(crc);
    // printf("+%02x %08x %08x\n",i,r, (r & 1) ? (r>>1)^kCrcPoly : (r>>1));
    }

    // Fast table construction for rolling CRC
    FastTableBuild (FastCRCTab, RollingCRCTab[128]);
    for (i=0; i<256; i++)
    if (FastCRCTab[i] != RollingCRCTab[i]) // unit testing :)
    printf("*%02x %08x %08x\n",i,FastCRCTab[i],RollingCRCTab[i]);

    // Example of rolling CRC calculation and unit test simultaneously
    unsigned char buffer[WINSIZE+TESTSIZE];
    for (i=0; i<WINSIZE+TESTSIZE; i++)
    buffer[i] = 11 + i*31 + i/17; // random :)

    // Let's calc CRC(buffer+TESTSIZE,WINSIZE) in two ways
    crc1 = calcCRC(buffer+TESTSIZE,WINSIZE,CRCTab);
    crc2 = calcCRC(buffer,WINSIZE,CRCTab);
    for (i=0; i<TESTSIZE; i++)
    crc2 = update_CRC(crc2,CRCTab,buffer[WINSIZE+i]) ^ RollingCRCTab[buffer[i]];
    printf("%08x and %08x %s\n", crc1, crc2, crc1==crc2? "are equal":"ARE NOT EQUAL!");
    return 0;
    }


    Literature:
    [1] http://www.xmailserver.org/rabin.pdf
    [2] http://citeseer.ist.psu.edu/viewdoc/...10.1.1.53.6172
    [3] http://www.drdobbs.com/parallel/fast...sing/229401411
    Last edited by Bulat Ziganshin; 7th February 2015 at 02:22.

  2. The Following 3 Users Say Thank You to Bulat Ziganshin For This Useful Post:

    schnaader (27th May 2018),Simorq (27th May 2018),SolidComp (30th May 2018)

  3. #2
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    At last, a good reason to use the new "twitter button" for mass spreading

    By the way, here is the message we get when using it :
    Fast CRC table construction and rolling CRC hash calculation http://encode.ru/showpost.php?p=32727&postcount=1 via @YOUR-TWITTER-USERNAME
    Maybe there is one parameter to change in order to replace @YOUR-TWITTER-USERNAME ....

  4. #3
    Member
    Join Date
    Sep 2014
    Location
    Mem0ry
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Nice piece of work Bulat!! I am currently experimenting with rolling hashes and tried your implementation. Unfortunately it only worked with the #defined windows size of 100 and I was assuming it could be changed by simply modifying the preprocessor define.

    My solution was to use another function to calculate the RollingCRCTab table like this:

    Code:
    void FastTableBuild2 (unsigned CRCTable[256], unsigned FastCRCTable[256]) 
    { 
        unsigned i,c,x,y; 
        for(c=0;c<256;c++) 
        { 
            x=0; 
            y=0; 
            x=CRCTable[c]^(x>>8); 
            for(i=0;i<WINSIZE;i++) 
            { 
                x=CRCTable[(unsigned char) x]^(x>>8); 
                y=CRCTable[(unsigned char) y]^(y>>8); 
            } 
            FastCRCTable[c]=x^y; 
        } 
    }
    Don't know if it is fast but it works. Maybe you can speed it up?

  5. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    what you mean? the original code works ok for you? and then ou change only single line with WINSIZE define and it stops working? i ask you because i'm using this code in SREP with WINSIZE = 48 and trust me - it works

    PS: it seems that your CRC_INIT_VAL isn't 0. it's why my code doesn't work for you. for a rolling hash you don't need to use non-zero CRC_INIT_VAL value. the only meaning of non-zero here is to differentiate between byte sequence XYZ and byte sequence 000XYZ that isn't required when all byte sequences are of the same size
    Last edited by Bulat Ziganshin; 10th September 2014 at 07:44.

  6. #5
    Member
    Join Date
    Sep 2014
    Location
    Mem0ry
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    what you mean? the original code works ok for you? and then ou change only single line with WINSIZE define and it stops working? i ask you because i'm using this code in SREP with WINSIZE = 48 and trust me - it works

    PS: it seems that your CRC_INIT_VAL isn't 0. it's why my code doesn't work for you. for a rolling hash you don't need to use non-zero CRC_INIT_VAL value. the only meaning of non-zero here is to differentiate between byte sequence XYZ and byte sequence 000XYZ that isn't required when all byte sequences are of the same size
    you are right. the original code works with windows sizes other than 100. i made the mistake of removing the "conventional" way of calculating the CRC and the rolling CRC tables. I thought they were only there for testing and comparing, but the call to FastTableBuild to build the rolling CRC table the fast way, requires that the rolling table is already existent:
    Code:
    FastTableBuild (FastCRCTab, RollingCRCTab[128]);
    this is the code i ended up with:
    Code:
    /* crc.c -- Fast CRC table construction and rolling CRC hash calculation
    2009-11-23 : Igor Pavlov : Public domain
    2013-03-27 : Bulat.Ziganshin@gmail.com : Public domain */
     
     
    #include <stdio.h>
     
    #define kCrcPoly     0xEDB88320
    #define CRC_INIT_VAL 0 /* 0xFFFFFFFF for zip/rar/7-zip quasi-CRC */
     
      // For rolling CRC hash demo
    #define WINSIZE      48
    #define TESTSIZE     3
     
    // Fast CRC table construction algorithm
    void FastTableBuild (unsigned CRCTable[256], unsigned seed)
    {
      unsigned i, j, r;
     
     
      CRCTable[0]   = 0;
      CRCTable[128] = r = seed;
      for (i=64; i; i/=2)
        CRCTable[i] = r = (r >> 1) ^ (kCrcPoly & ~((r & 1) - 1));
     
     
      for (i=2; i<256; i*=2)
        for (j=1; j<i; j++)
          CRCTable[i+j] = CRCTable[i] ^ CRCTable[j];
    }
    
    #define BYTE (unsigned char)
    
    void FastTableBuild2 (unsigned CRCTable[256], unsigned FastCRCTable[256])
    {
        unsigned i,c,x,y;
    
        for(c=0;c<256;c++)
        {
            x=0;
            y=0;
            x=CRCTable[c]^(x>>8);
            for(i=0;i<WINSIZE;i++)
            {
                x=CRCTable[(unsigned char) x]^(x>>8);
                y=CRCTable[(unsigned char) y]^(y>>8);
            }
            FastCRCTable[c]=x^y;
        }
    }
    
    #define init_CRC()                  CRC_INIT_VAL
    #define update_CRC(crc,CRCTable,c)  (CRCTable[((crc)^(c)) & 0xFF] ^ ((crc)>>8))
    #define finish_CRC(crc)             ((crc) ^ CRC_INIT_VAL)
     
    unsigned calcCRC (unsigned char *buffer, unsigned len, unsigned CRCTable[256])
    {
      unsigned crc = init_CRC(), i;
      for (i=0; i<len; i++)
        crc = update_CRC(crc,CRCTable,buffer[i]);
      return finish_CRC(crc);
    }
     
    
    int main()
    {
      unsigned i, CRCTab[256], RollingCRCTab[256], crc1, crc2;
     
     
      // Fast CRC table construction
      FastTableBuild (CRCTab, kCrcPoly);
     
      // Fast table construction for rolling CRC
      FastTableBuild2 (CRCTab, RollingCRCTab);
     
      // Example of rolling CRC calculation and unit test simultaneously
      unsigned char buffer[WINSIZE+TESTSIZE];
      for (i=0; i<WINSIZE+TESTSIZE; i++)
        buffer[i] = 11 + i*31 + i/17;   // random :)
      
      // Let's calc CRC(buffer+TESTSIZE,WINSIZE) in two ways
      crc1 = calcCRC(buffer+TESTSIZE,WINSIZE,CRCTab);
      crc2 = calcCRC(buffer,WINSIZE,CRCTab);
      for (i=0; i<TESTSIZE; i++)
        crc2 = update_CRC(crc2,CRCTab,buffer[WINSIZE+i]) ^ RollingCRCTab[buffer[i]];
      printf("%08x and %08x %s\n", crc1, crc2, crc1==crc2? "are equal":"ARE NOT EQUAL!");
    
      return 0;
    }
    maybe in your real world code, you already know the value of RollingCRCTab[128] for your specific window size. doing it first the slow way and then the fast way doesn't really make sense, right?

    Most likely the following code from your original posting would be even faster than my FastTableBuild2 and correctly use the WINDOW_SIZE:
    Code:
    for (i=0; i<256; i++)  {
    
        unsigned crc = init_CRC();
    
        crc = update_CRC(crc,CRCTab,i);
    
        for (j=0; j<WINSIZE; j++)
    
          crc = update_CRC(crc,CRCTab,0);
    
        RollingCRCTab[i] = finish_CRC(crc);
    
    //    printf("+%02x %08x %08x\n",i,r, (r & 1) ? (r>>1)^kCrcPoly : (r>>1));
    
      }
    but your code provided a good example and the table creation shouldn't be a performance problem anyways as it only happens once usually.
    Last edited by Default; 10th September 2014 at 13:42.

  7. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    it's performed in the following way:


    // Fast CRC table construction
    FastTableBuild (CRCTab, kCrcPoly);

    // Rolling CRC table construction
    i=128;
    {
    unsigned crc = init_CRC();
    crc = update_CRC(crc,CRCTab,i);
    for (j=0; j<WINSIZE; j++)
    crc = update_CRC(crc,CRCTab,0);
    RollingCRCTab[i] = finish_CRC(crc);
    }

    // Fast table construction for rolling CRC
    FastTableBuild (RollingCRCTab, RollingCRCTab[128]);

    you need only one value FastCRCTab[128] aka RollingCRCTab[128] and remaining values are calculated based on it. it's why my code is 256 times faster than your - the first loop is the same (y is always 0 in your code), but i execute it only once with i=128. the first cycle can also be made faster using fast power algorithm (it's just kCrcPoly**WINSIZE in the Galois field). there is google crc library that makes a lot of amazing things with those crcs

    i'm sorry - this code is really hard to follow. you may download real working SREP code, that includes CrcRollingHash class, at http://freearc.org/research/SREP39.htm
    Last edited by Bulat Ziganshin; 10th September 2014 at 14:11.

  8. #7
    Member
    Join Date
    Sep 2014
    Location
    Mem0ry
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    thx for the enlightenment. this way you don't have to calc the entire rolling CRC table but instead only the element needed for your fast algorithm. cool!

  9. #8
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    500% faster?!?!?!!!?!
    Nice job, then!

  10. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    i got the question how to build rolling CRC tables for non-zero CRC_INIT_VAL. unfortunately, it can't use fast algorithm. instead use slow one:
    Code:
    void BuildRollingCRCTable (const unsigned CRCTable[256], unsigned RollingCRCTable[256]) 
    { 
        unsigned i,c,x,y; 
        for(c=0;c<256;c++) 
        { 
            x = init_CRC(); 
            y = init_CRC(); 
            x = update_CRC(x,CRCTable,c);
            for(i=0;i<WINSIZE;i++) 
            { 
                x = update_CRC(x,CRCTable,0);
                y = update_CRC(y,CRCTable,0);
            } 
            RollingCRCTable[c] = finish_CRC(x) ^ finish_CRC(y); 
        } 
    }
    of course, "y" can be calculated just once. if you have large WINSIZE (>=512) you can speedup calculation of "x" using power_in_galois_field function, implemented f.e. in the google crc library


    Alternatively, you can use IncWinTab_Init from Eugene Shelwien's fma-diff rolling CRC implementation
    Last edited by Bulat Ziganshin; 6th February 2015 at 18:41.

  11. #10
    Member
    Join Date
    Feb 2015
    Location
    Czech Republic
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Thumbs up

    Quote Originally Posted by Bulat Ziganshin View Post
    i got the question how to build rolling CRC tables for non-zero CRC_INIT_VAL
    Thank you so much! It works perfectly.

  12. #11
    Member
    Join Date
    May 2018
    Location
    Portland, OR USA
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I did some debugging on the code here to get it to work for the CRC_INIT_VAL != 0 case. http://github.com/BartMassey/rolling-crc . Thanks huge for providing this codebase, it has been really helpful.

Similar Threads

  1. Hash / Checksum algorithm considerations
    By Cyan in forum Data Compression
    Replies: 61
    Last Post: 16th June 2017, 00:28
  2. "Implementing ppmc with hash tables"
    By RichSelian in forum Data Compression
    Replies: 8
    Last Post: 11th March 2013, 01:37
  3. Directory hash as one string
    By FatBit in forum The Off-Topic Lounge
    Replies: 6
    Last Post: 17th January 2012, 00:29
  4. Fastest non-secure hash function!?
    By Sanmayce in forum Data Compression
    Replies: 13
    Last Post: 20th November 2010, 20:54
  5. Hash Zip
    By Black_Fox1 in forum Forum Archive
    Replies: 6
    Last Post: 4th March 2007, 18:12

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •