Page 1 of 3 123 LastLast
Results 1 to 30 of 67

Thread: FARSH: hashing 30 GB/s!

  1. #1
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts

    FARSH: hashing 30 GB/s!

    FARSH stands for fast and reliable (but not secure) hash, and I just released version 0.1

    I'm interested in results of running benchmark executables (farsh*.exe) on your computers and, of course, in your versions with better bit mixing.

  2. The Following 2 Users Say Thank You to Bulat Ziganshin For This Useful Post:

    Cyan (29th May 2015),VoLT (29th May 2015)

  3. #2
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    I'm working on my side on an AVX2 version of xxHash.
    It's not published yet because, as farsh 0.1, its quality is not yet good enough.

    xxh_avx is quite fast, but not as fast as farsh numbers.
    Plus, I like its uhash-like ability to generate any key size up to 1024 bits.
    So it's a pretty good start.

  4. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    The universal hashing formula used here (and derived from UMAC) is as simple as

    for (i=0; i < elements; i+=2)
    sum += (data[i] + key[i]) * (ULONG)(data[i+1] + key[i+1]);
    Universal hashing mentions shifting result to the right.

    I've made an experiment and made histograms of multiplication results after shifting to the right. Here's Scala code:
    Code:
    object Main {
      val factorSize = 8
      val factorsNum = 1 << factorSize
      val factors = 0 until factorsNum
      val shifts = 0 to factorSize
      val resultSize = factorSize
      val resultsNum = 1 << resultSize
      val resultMask = (1 << resultSize) - 1
    
      val countsForShifts = Array.ofDim[Int](resultsNum, shifts.length)
    
      val ideal = factorsNum * factorsNum / resultsNum
    
      def tabS(s: String): String = f"$s%12s "
      def tabI(i: Int): String = tabS(i.toString)
      def countErrorsSum(shift: Int): Int = {
        countsForShifts.map(row => math.abs(ideal - row(shift))).sum
      }
    
      def main(args: Array[String]): Unit = {
        for (factor1 <- factors; factor2 <- factors; shift <- shifts) {
          val result = ((factor1 * factor2) >> shift) & resultMask
          countsForShifts(result)(shift) += 1
        }
        println(tabS("Value") + tabS("Ideal") + shifts.map(tabI).mkString)
        println("Rows:")
        for ((countsRow, i) <- countsForShifts.zipWithIndex) {
          println(tabI(i) + tabI(ideal) + countsRow.map(tabI).mkString)
        }
        println("Total error sums:")
        println(tabS("") + tabI(0) + shifts.map(shift =>
          tabI(countErrorsSum(shift))).mkString)
        val bestShift = shifts.minBy(countErrorsSum)
        val timesBetter = countErrorsSum(0).toDouble / countErrorsSum(bestShift)
        println(s"Best shift is $bestShift by $timesBetter times than default one")
      }
    }
    The output is:
    Code:
           Value        Ideal            0            1            2            3            4            5            6            7            8 
    Rows:
               0          256         1280          894          765          730          729          767          905         1226         1968 
               1          256          128          208          224          249          276          366          528          910         1301 
               2          256          256          268          279          288          318          366          549          713         1173 
               3          256          128          186          226          251          305          388          566          757         1100 
               4          256          384          334          291          268          271          364          441          659         1032 
               5          256          128          204          244          282          328          421          495          693          976 
               6          256          256          272          277          287          315          377          458          622          937 
               7          256          128          206          240          284          330          420          489          640          909 
               8          256          512          362          303          283          308          336          440          593          870 
               9          256          128          184          218          233          277          329          441          614          841 
              10          256          256          268          279          309          339          355          437          562          821 
              11          256          128          190          248          277          330          360          448          575          784 
              12          256          384          330          299          276          299          348          412          551          772 
              13          256          128          196          250          282          308          339          411          544          750 
              14          256          256          256          249          271          318          355          421          530          734 
              15          256          128          186          246          289          357          377          445          552          732 
              16          256          640          458          365          350          296          328          377          499          696 
              17          256          128          184          204          214          271          320          396          522          682 
              18          256          256          260          263          252          249          318          407          504          674 
              19          256          128          214          248          295          321          357          417          499          654 
              20          256          384          334          305          288          285          326          366          472          646 
              21          256          128          204          256          307          322          337          397          510          636 
              22          256          256          264          271          284          302          361          396          463          623 
              23          256          128          206          232          266          290          303          366          466          602 
              24          256          512          378          319          303          313          338          393          481          609 
              25          256          128          184          204          252          273          298          361          449          580 
              26          256          256          252          257          276          278          322          375          444          584 
              27          256          128          214          250          274          289          320          367          457          565 
              28          256          384          306          273          296          331          329          375          445          562 
              29          256          128          188          226          232          242          306          346          434          549 
              30          256          256          264          271          314          336          353          392          439          550 
              31          256          128          214          266          307          292          322          361          447          549 
              32          256          768          518          397          292          303          307          340          404          513 
              33          256          128          200          256          260          266          283          348          431          528 
              34          256          256          252          253          281          305          318          356          407          497 
              35          256          128          190          218          228          246          288          352          420          517 
              36          256          384          326          301          304          337          337          349          405          496 
              37          256          128          196          210          216          236          312          355          413          493 
              38          256          256          256          265          273          278          301          336          388          475 
              39          256          128          198          240          253          297          326          349          407          478 
              40          256          512          390          315          287          283          294          334          391          473 
              41          256          128          192          236          251          272          284          323          388          467 
              42          256          256          252          265          267          265          290          339          387          455 
              43          256          128          194          244          276          294          320          359          385          456 
              44          256          384          318          293          292          306          293          303          361          437 
              45          256          128          212          268          289          326          336          354          405          457 
              46          256          256          256          263          273          268          294          313          359          429 
              47          256          128          178          226          252          262          291          337          370          433 
              48          256          640          450          369          310          280          290          328          379          425 
              49          256          128          192          210          229          259          283          319          362          420 
              50          256          256          252          267          276          269          276          306          354          415 
              51          256          128          194          238          256          286          312          335          359          405 
              52          256          384          314          301          290          299          316          314          351          405 
              53          256          128          188          232          248          258          287          306          360          399 
              54          256          256          264          271          270          274          292          338          352          399 
              55          256          128          198          232          251          254          273          298          339          381 
              56          256          512          382          323          315          301          297          311          340          387 
              57          256          128          200          250          265          271          293          318          351          383 
              58          256          256          268          267          268          295          292          303          328          375 
              59          256          128          186          212          212          249          265          303          344          373 
              60          256          384          334          317          294          305          318          311          331          371 
              61          256          128          196          242          278          280          290          310          335          367 
              62          256          256          264          277          254          254          274          305          335          365 
              63          256          128          198          262          258          278          281          311          335          365 
              64          256          896          582          347          269          250          264          284          315          339 
              65          256          128          200          246          303          308          306          309          321          345 
              66          256          256          260          249          239          252          278          301          315          345 
              67          256          128          206          212          236          250          264          288          323          351 
              68          256          384          314          289          282          277          292          280          308          323 
              69          256          128          204          248          275          272          278          302          306          327 
              70          256          256          264          251          243          267          288          300          323          339 
              71          256          128          190          220          237          242          266          282          304          331 
              72          256          512          390          329          314          302          273          290          300          311 
              73          256          128          200          242          270          268          280          293          317          327 
              74          256          256          252          241          240          264          280          283          296          313 
              75          256          128          186          210          239          270          286          298          306          313 
              76          256          384          322          301          282          265          250          276          285          305 
              77          256          128          188          214          246          269          292          279          298          301 
              78          256          256          256          267          275          289          279          295          302          305 
              79          256          128          194          208          245          264          270          274          279          299 
              80          256          640          434          327          289          250          253          277          292          294 
              81          256          128          192          220          244          262          281          294          301          289 
              82          256          256          276          269          278          268          266          263          280          295 
              83          256          128          190          226          231          240          272          293          286          281 
              84          256          384          322          295          282          265          263          256          277          295 
              85          256          128          188          214          236          253          257          271          275          267 
              86          256          256          256          249          257          269          282          291          291          285 
              87          256          128          198          240          242          269          285          266          265          271 
              88          256          512          390          333          310          259          250          265          275          268 
              89          256          128          192          238          250          253          234          256          267          265 
              90          256          256          260          263          287          303          300          290          272          267 
              91          256          128          198          228          259          269          282          282          286          271 
              92          256          384          318          283          268          260          259          247          253          253 
              93          256          128          188          220          227          240          235          260          270          257 
              94          256          256          264          257          278          283          283          278          265          260 
              95          256          128          198          232          235          226          249          256          261          249 
              96          256          768          534          399          283          264          256          260          267          247 
              97          256          128          200          220          239          252          272          273          252          247 
              98          256          256          252          243          257          273          270          256          258          241 
              99          256          128          178          220          245          250          251          260          257          244 
             100          256          384          342          299          263          250          247          247          246          231 
             101          256          128          188          228          234          246          254          266          264          249 
             102          256          256          248          251          254          275          271          256          238          225 
             103          256          128          186          204          226          238          255          250          253          231 
             104          256          512          398          349          320          295          263          250          238          224 
             105          256          128          200          216          229          240          255          266          256          233 
             106          256          256          268          279          280          281          260          269          259          231 
             107          256          128          206          234          237          243          232          231          222          209 
             108          256          384          342          319          287          256          270          256          258          227 
             109          256          128          196          228          255          276          275          256          230          206 
             110          256          256          248          253          253          235          238          237          233          215 
             111          256          128          202          220          227          252          245          252          239          211 
             112          256          640          442          339          277          242          243          241          233          223 
             113          256          128          192          248          259          255          264          245          232          198 
             114          256          256          260          253          250          248          240          245          235          207 
             115          256          128          202          238          260          264          259          249          227          197 
             116          256          384          318          279          265          246          270          258          237          205 
             117          256          128          204          236          255          269          231          217          218          192 
             118          256          256          248          225          223          235          251          249          236          207 
             119          256          128          170          196          214          222          232          242          216          181 
             120          256          512          398          349          306          289          263          252          237          204 
             121          256          128          208          216          248          262          244          229          212          181 
             122          256          256          268          269          241          239          251          244          218          189 
             123          256          128          206          268          276          275          250          237          230          187 
             124          256          384          314          259          255          252          259          238          211          182 
             125          256          128          204          218          248          251          242          239          227          187 
             126          256          256          272          263          270          256          238          235          207          179 
             127          256          128          190          192          212          207          243          242          228          187 
             128          256         1024          514          337          259          250          238          228          207          168 
             129          256          128          176          230          256          263          257          226          205          169 
             130          256          256          244          259          247          232          233          236          223          179 
             131          256          128          198          252          266          266          257          231          190          162 
             132          256          384          306          255          253          245          221          224          222          175 
             133          256          128          180          214          245          258          248          220          192          161 
             134          256          256          240          249          262          243          230          229          201          158 
             135          256          128          178          202          211          230          255          243          226          173 
             136          256          512          406          339          286          256          231          208          180          151 
             137          256          128          200          256          278          270          241          234          210          162 
             138          256          256          244          259          252          256          252          226          201          161 
             139          256          128          194          222          226          220          232          216          187          145 
             140          256          384          310          263          247          256          238          224          203          158 
             141          256          128          188          216          225          242          228          231          198          151 
             142          256          256          256          245          228          210          218          214          191          145 
             143          256          128          198          232          248          264          251          230          201          154 
             144          256          640          438          353          303          256          241          213          189          139 
             145          256          128          200          238          257          238          231          210          186          145 
             146          256          256          252          259          259          271          255          242          200          144 
             147          256          128          170          206          210          205          218          210          186          141 
             148          256          384          306          277          263          253          229          212          185          133 
             149          256          128          180          190          202          221          222          213          188          136 
             150          256          256          248          259          277          279          271          233          194          143 
             151          256          128          178          202          223          235          227          219          184          127 
             152          256          512          390          323          276          226          201          180          166          132 
             153          256          128          200          242          253          259          257          227          192          133 
             154          256          256          260          245          257          260          235          228          176          127 
             155          256          128          170          212          229          232          208          198          179          126 
             156          256          384          334          309          277          249          243          215          184          129 
             157          256          128          196          228          247          238          238          211          172          123 
             158          256          256          248          259          245          236          232          216          182          116 
             159          256          128          170          196          232          220          210          194          169          121 
             160          256          768          506          385          301          269          251          230          189          123 
             161          256          128          184          210          201          209          219          194          158          110 
             162          256          256          260          265          256          252          231          214          175          117 
             163          256          128          194          236          243          234          221          202          162          110 
             164          256          384          314          289          267          247          233          204          178          121 
             165          256          128          188          228          235          243          222          211          168          109 
             166          256          256          256          245          242          250          233          206          167          108 
             167          256          128          186          210          222          223          227          200          162          103 
             168          256          512          378          311          260          250          218          200          180          120 
             169          256          128          192          230          244          237          224          215          168           99 
             170          256          256          260          245          242          253          228          181          140           97 
             171          256          128          190          210          213          205          218          208          172          100 
             172          256          384          322          289          279          254          219          181          160          105 
             173          256          128          172          190          202          220          236          228          167          101 
             174          256          256          256          243          230          227          206          181          148           94 
             175          256          128          206          236          237          251          218          195          164           93 
             176          256          640          446          365          309          242          226          200          154           94 
             177          256          128          192          220          234          243          240          202          154           95 
             178          256          256          260          263          251          228          220          195          159           95 
             179          256          128          190          196          199          194          201          191          147           86 
             180          256          384          326          297          275          250          219          204          165           89 
             181          256          128          196          242          265          259          237          186          140           82 
             182          256          256          248          267          277          268          240          207          160           89 
             183          256          128          186          218          222          226          214          185          150           91 
             184          256          512          386          311          272          235          200          196          146           78 
             185          256          128          184          212          226          239          219          181          143           79 
             186          256          256          244          255          247          225          223          198          147           80 
             187          256          128          198          206          209          188          192          180          147           79 
             188          256          384          306          289          271          249          233          196          150           78 
             189          256          128          188          232          259          237          224          191          146           81 
             190          256          256          248          241          223          249          229          196          142           73 
             191          256          128          186          200          197          204          182          170          142           70 
             192          256          896          570          343          298          265          249          211          139           73 
             193          256          128          184          196          208          205          192          181          146           68 
             194          256          256          252          237          230          246          225          168          129           69 
             195          256          128          178          206          215          223          209          197          149           76 
             196          256          384          326          317          313          287          234          192          134           67 
             197          256          128          180          190          206          206          198          182          136           61 
             198          256          256          248          247          236          205          210          166          132           62 
             199          256          128          194          234          240          234          208          187          140           61 
             200          256          512          378          309          277          258          221          182          123           64 
             201          256          128          184          180          195          201          196          180          135           59 
             202          256          256          260          245          227          201          194          168          129           60 
             203          256          128          198          216          222          230          228          200          146           65 
             204          256          384          318          289          283          264          235          182          122           53 
             205          256          128          196          216          211          217          200          164          128           54 
             206          256          256          256          263          262          235          208          186          124           51 
             207          256          128          190          210          210          213          186          168          131           52 
             208          256          640          462          363          306          252          225          193          132           53 
             209          256          128          192          234          257          237          205          158          115           48 
             210          256          256          236          233          239          234          224          192          135           51 
             211          256          128          194          216          224          224          201          167          120           48 
             212          256          384          318          275          259          239          224          186          138           53 
             213          256          128          196          236          219          202          201          175          114           45 
             214          256          256          256          245          222          212          192          164          118           42 
             215          256          128          186          222          237          198          193          158          111           41 
             216          256          512          378          305          275          284          235          201          127           40 
             217          256          128          192          212          209          197          188          165          119           41 
             218          256          256          252          259          252          237          209          166          118           38 
             219          256          128          186          206          226          230          212          172          106           39 
             220          256          384          322          287          255          238          215          172          124           36 
             221          256          128          196          222          234          216          178          155          108           37 
             222          256          256          248          237          243          247          203          179          122           36 
             223          256          128          186          202          206          194          204          155          105           35 
             224          256          768          490          355          256          242          214          181          125           38 
             225          256          128          184          210          204          193          197          161          113           31 
             226          256          256          260          263          250          246          223          176          116           29 
             227          256          128          206          222          236          226          188          158           96           28 
             228          256          384          298          263          226          222          212          170          116           27 
             229          256          128          196          230          227          203          177          156          104           26 
             230          256          256          264          263          251          219          210          179          108           25 
             231          256          128          198          242          247          234          193          144          101           24 
             232          256          512          370          305          277          268          226          180          113           23 
             233          256          128          184          214          220          223          203          159          103           22 
             234          256          256          244          235          231          197          181          156          102           21 
             235          256          128          178          208          230          216          178          154          100           20 
             236          256          384          298          251          230          228          211          167          108           19 
             237          256          128          188          210          210          202          192          165          108           18 
             238          256          256          264          265          256          254          216          156           94           17 
             239          256          128          182          214          212          197          196          155           95           16 
             240          256          640          454          335          288          218          192          170          109           15 
             241          256          128          192          218          226          231          208          166          102           14 
             242          256          256          252          241          243          205          185          144           91           13 
             243          256          128          182          224          245          234          194          154           96           12 
             244          256          384          322          275          254          251          221          171          102           11 
             245          256          128          180          186          188          192          182          142           92           10 
             246          256          256          264          261          238          205          179          154           96            9 
             247          256          128          214          250          265          246          219          170           95            8 
             248          256          512          370          297          247          225          186          148           99            7 
             249          256          128          176          218          217          226          195          150           86            6 
             250          256          256          244          233          220          203          192          159           95            5 
             251          256          128          178          210          235          228          201          161           94            4 
             252          256          384          326          287          260          212          192          143           87            3 
             253          256          128          180          204          195          204          189          154           93            2 
             254          256          256          240          243          221          199          178          146           90            1 
             255          256          128          194          242          273          287          243          179           97            0 
    Total error sums:
                            0        32768        16832         8970         6498         7128        10694        17826        29832        48550 
    Best shift is 3 by 5.042782394582948 times than default one
    Each column is for different shift. Shift 0 is the default one and it's the shift that Bulat used (shift 0 is no shift).


    If I change factorSize and comment out printing of rows, ie:
    Code:
    object Main {
      val factorSize = 15
      val factorsNum = 1 << factorSize
      val factors = 0 until factorsNum
      val shifts = 0 to factorSize
      val resultSize = factorSize
      val resultsNum = 1 << resultSize
      val resultMask = (1 << resultSize) - 1
    
      val countsForShifts = Array.ofDim[Int](resultsNum, shifts.length)
    
      val ideal = factorsNum * factorsNum / resultsNum
    
      def tabS(s: String): String = f"$s%12s "
      def tabI(i: Int): String = tabS(i.toString)
      def countErrorsSum(shift: Int): Int = {
        countsForShifts.map(row => math.abs(ideal - row(shift))).sum
      }
    
      def main(args: Array[String]): Unit = {
        for (factor1 <- factors; factor2 <- factors; shift <- shifts) {
          val result = ((factor1 * factor2) >> shift) & resultMask
          countsForShifts(result)(shift) += 1
        }
        println(tabS("Value") + tabS("Ideal") + shifts.map(tabI).mkString)
    //    println("Rows:")
    //    for ((countsRow, i) <- countsForShifts.zipWithIndex) {
    //      println(tabI(i) + tabI(ideal) + countsRow.map(tabI).mkString)
    //    }
        println("Total error sums:")
        println(tabS("") + tabI(0) + shifts.map(shift =>
          tabI(countErrorsSum(shift))).mkString)
        val bestShift = shifts.minBy(countErrorsSum)
        val timesBetter = countErrorsSum(0).toDouble / countErrorsSum(bestShift)
        println(s"Best shift is $bestShift by $timesBetter times than default one")
      }
    }
    Then I get the following:
    Code:
           Value        Ideal            0            1            2            3            4            5            6            7            8            9           10           11           12           13           14           15 
    Total error sums:
                            0    536870912    269083040    135153166     68433600     35241912     19274406     13546862     13617772     18553286     30496378     53392724     94561504    166028768    286628836    483389300    790058434 
    Best shift is 6 by 39.63064745178625 times than default one
    With growing factor size there are growing differences. They certainly will cause problems when there's only one pair of inputs to hash and the shift is bad. I don't know wheter the spread would be bad if there are many pairs to be hashed, but there's SMHasher and its reports.

  5. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Piotr: yes, the hash value produced directly by this formula, should be shifted by 32 for correct result, since lower bits aren't 100% random. but since we are using this value only as entropy source, saving entire 64 bits can't make results worser. from the practical grounds, my first version saved only higher 32 bits and i've quickly discovered that similar short source data (such as {1} and {2}) leads to the similar hashes

    uhash saves entire 64 bit results for 1024-byte blocks, concatenates them and then runs more complex polynomial hashing over these data. i just tried to calculate crc32 of the same data (sequence of 64-bit block hashes) and it made much better smhasher results - now only Avalanche and Zeroes tests are fail

  6. #5
    Member
    Join Date
    Aug 2007
    Location
    Rzeczpospolita
    Posts
    10
    Thanks
    0
    Thanked 2 Times in 1 Post
    i7-4770/16Gb/W7C x64
    farsh-x86.exe | Hashing 100 GiB (b004ee2e): 16754.775 milliseconds = 6.409 GB/s = 5.968 GiB/s
    farsh-x86-sse2.exe | Hashing 100 GiB (b004ee2e): 3448.710 milliseconds = 31.135 GB/s = 28.996 GiB/s
    farsh-x86-avx2.exe | Hashing 100 GiB (b004ee2e): 1853.249 milliseconds = 57.938 GB/s = 53.959 GiB/s
    farsh-x64.exe | Hashing 100 GiB (b004ee2e): 3960.820 milliseconds = 27.109 GB/s = 25.247 GiB/s
    farsh-x64-avx2.exe | Hashing 100 GiB (b004ee2e): 1891.681 milliseconds = 56.761 GB/s = 52.863 GiB/s

  7. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    hey, 404, i have exactly the same cpu! i would like to see other cpu results but it seems that only haswellers are brave enough to run these executables

  8. #7
    Member PAQer's Avatar
    Join Date
    Jan 2010
    Location
    Russia
    Posts
    22
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    i would like to see other cpu results
    How about old school Conroe?

    Code:
    farsh-x86.exe
    Hashing 100 GiB (b004ee2e): 32914.037 milliseconds = 3.262 GB/s = 3.038 GiB/s
    
    farsh-x86-sse2.exe
    Hashing 100 GiB (b004ee2e): 15302.369 milliseconds = 7.017 GB/s = 6.535 GiB/s
    
    farsh-x64.exe
    Hashing 100 GiB (b004ee2e): 15822.045 milliseconds = 6.786 GB/s = 6.320 GiB/s
    C2D E6550 @2.33 GHz, 6GB DDR2, W7x64

  9. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    PAQer, thanks, you are reading my thoughts! i expected good perfromance on Core2 since it got 128-bit SIMD engines, but your results are discouraging. i looked into the problem and it seems that CPU spends half of the time on unaligned loads. so i prepared executables with all data aligned and, optionally, aligned loads are used. can you please run at least all 4 farsh-x86-sse2.exe programs from the attached archive. even better - run all non-avx executables and give me a console output (i can format it myself )

    these executables should tell a lot for any pre-haswell cpu, since only haswell finally get rid of unaligned penalty. if it will show the difference, i will probably need to provide 2-4 variants of SSE2 code with dynamic selection on the alignment of key+data
    Attached Files Attached Files
    Last edited by Bulat Ziganshin; 30th May 2015 at 01:08.

  10. #9
    Member
    Join Date
    May 2015
    Location
    ~
    Posts
    7
    Thanks
    1
    Thanked 1 Time in 1 Post
    Phenom II 945, 3 GHz, W7x64, 4 GB RAM

    farsh0.1\benchmark>farsh-x86.exe
    Hashing 100 GiB (b004ee2e): 24995.086 milliseconds = 4.296 GB/s = 4.001 GiB/s


    farsh0.1\benchmark>farsh-x64.exe
    Hashing 100 GiB (b004ee2e): 7903.949 milliseconds = 13.585 GB/s = 12.652 GiB/s


    farsh0.1\benchmark>farsh-x86-sse2.exe
    Hashing 100 GiB (b004ee2e): 7244.024 milliseconds = 14.822 GB/s = 13.804 GiB/s


    ---


    farsh-alignment\align-both>farsh-x86.exe
    Hashing 100 GiB (fe82e924): 30476.906 milliseconds = 3.523 GB/s = 3.281 GiB/s


    farsh-alignment\align-both>farsh-x64.exe
    Hashing 100 GiB (fe82e924): 8754.846 milliseconds = 12.265 GB/s = 11.422 GiB/s


    farsh-alignment\align-both>farsh-x86-sse2.exe
    Hashing 100 GiB (fe82e924): 7384.217 milliseconds = 14.541 GB/s = 13.542 GiB/s


    ---


    farsh-alignment\align-data>farsh-x86.exe
    Hashing 100 GiB (fe82e924): 30467.067 milliseconds = 3.524 GB/s = 3.282 GiB/s


    farsh-alignment\align-data>farsh-x64.exe
    Hashing 100 GiB (fe82e924): 8760.540 milliseconds = 12.257 GB/s = 11.415 GiB/s


    farsh-alignment\align-data>farsh-x86-sse2.exe
    Hashing 100 GiB (fe82e924): 7394.061 milliseconds = 14.522 GB/s = 13.524 GiB/s


    ---


    farsh-alignment\align-key>farsh-x86.exe
    ashing 100 GiB (fe82e924): 30472.935 milliseconds = 3.524 GB/s = 3.282 GiB/s


    farsh-alignment\align-key>farsh-x64.exe
    Hashing 100 GiB (fe82e924): 8758.215 milliseconds = 12.260 GB/s = 11.418 GiB/s


    farsh-alignment\align-key>farsh-x86-sse2.exe
    Hashing 100 GiB (fe82e924): 7382.260 milliseconds = 14.545 GB/s = 13.546 GiB/s


    ---


    farsh-alignment\no-align>farsh-x86.exe
    Hashing 100 GiB (fe82e924): 30452.065 milliseconds = 3.526 GB/s = 3.284 GiB/s


    farsh-alignment\no-align>farsh-x64.exe
    Hashing 100 GiB (fe82e924): 8966.305 milliseconds = 11.975 GB/s = 11.153 GiB/s


    farsh-alignment\no-align>farsh-x86-sse2.exe
    Hashing 100 GiB (fe82e924): 7545.100 milliseconds = 14.231 GB/s = 13.254 GiB/s

  11. #10
    Member
    Join Date
    Aug 2007
    Location
    Rzeczpospolita
    Posts
    10
    Thanks
    0
    Thanked 2 Times in 1 Post
    Code:
    align-both.farsh-x64-avx2.exe | Hashing 100 GiB (fe82e924):  2037.786 milliseconds = 52.692 GB/s = 49.073 GiB/s
    align-both.farsh-x64.exe      | Hashing 100 GiB (fe82e924):  3565.271 milliseconds = 30.117 GB/s = 28.048 GiB/s
    align-both.farsh-x86-avx2.exe | Hashing 100 GiB (fe82e924):  2227.479 milliseconds = 48.204 GB/s = 44.894 GiB/s
    align-both.farsh-x86-sse2.exe | Hashing 100 GiB (fe82e924):  3522.907 milliseconds = 30.479 GB/s = 28.386 GiB/s
    align-both.farsh-x86.exe      | Hashing 100 GiB (fe82e924): 18804.064 milliseconds =  5.710 GB/s =  5.318 GiB/s
    align-data.farsh-x64-avx2.exe | Hashing 100 GiB (fe82e924):  2040.561 milliseconds = 52.620 GB/s = 49.006 GiB/s
    align-data.farsh-x64.exe      | Hashing 100 GiB (fe82e924):  3577.289 milliseconds = 30.016 GB/s = 27.954 GiB/s
    align-data.farsh-x86-avx2.exe | Hashing 100 GiB (fe82e924):  2245.868 milliseconds = 47.810 GB/s = 44.526 GiB/s
    align-data.farsh-x86-sse2.exe | Hashing 100 GiB (fe82e924):  3559.210 milliseconds = 30.168 GB/s = 28.096 GiB/s
    align-data.farsh-x86.exe      | Hashing 100 GiB (fe82e924): 18721.620 milliseconds =  5.735 GB/s =  5.341 GiB/s
    align-key.farsh-x64-avx2.exe  | Hashing 100 GiB (fe82e924):  2030.425 milliseconds = 52.883 GB/s = 49.251 GiB/s
    align-key.farsh-x64.exe       | Hashing 100 GiB (fe82e924):  3581.340 milliseconds = 29.982 GB/s = 27.923 GiB/s
    align-key.farsh-x86-avx2.exe  | Hashing 100 GiB (fe82e924):  2230.221 milliseconds = 48.145 GB/s = 44.839 GiB/s
    align-key.farsh-x86-sse2.exe  | Hashing 100 GiB (fe82e924):  3510.804 milliseconds = 30.584 GB/s = 28.484 GiB/s
    align-key.farsh-x86.exe       | Hashing 100 GiB (fe82e924): 18792.595 milliseconds =  5.714 GB/s =  5.321 GiB/s
    no-align.farsh-x64-avx2.exe   | Hashing 100 GiB (fe82e924):  2034.917 milliseconds = 52.766 GB/s = 49.142 GiB/s
    no-align.farsh-x64.exe        | Hashing 100 GiB (fe82e924):  3997.399 milliseconds = 26.861 GB/s = 25.016 GiB/s
    no-align.farsh-x86-avx2.exe   | Hashing 100 GiB (fe82e924):  2217.646 milliseconds = 48.418 GB/s = 45.093 GiB/s
    no-align.farsh-x86-sse2.exe   | Hashing 100 GiB (fe82e924):  3588.777 milliseconds = 29.919 GB/s = 27.865 GiB/s
    no-align.farsh-x86.exe        | Hashing 100 GiB (fe82e924): 18802.944 milliseconds =  5.710 GB/s =  5.318 GiB/s

  12. #11
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Intel i7 4960X OC 4.5GHz, 32GB DDR3 2000MHz, Win 8.1 64bit:

    farsh-x64.exe
    Hashing 100 GiB (b004ee2e): 3311.381 milliseconds = 32.426 GB/s = 30.199 GiB/s

    benchmark>farsh-x64-avx2.exe
    Crash

    farsh-x86.exe
    Hashing 100 GiB (b004ee2e): 14734.266 milliseconds = 7.287 GB/s = 6.787 GiB/s

    farsh-x86-sse2.exe
    Hashing 100 GiB (b004ee2e): 3302.627 milliseconds = 32.512 GB/s = 30.279 GiB/s

    farsh-x86-avx2.exe
    Crash

  13. #12
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Intel i7 4960X is an Ivy Bridge and it doesn't support AVX2 so the crash is rather expected although I think printing error message would be a lot better than just plainly crashing :]

  14. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by choochootrain View Post
    Phenom II 945, 3 GHz, W7x64, 4 GB RAM
    thank you, that's interesting - seems that phenom2 doesn't have any problems with misaligned data

  15. #14
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    i've fixed Zeroes - it was just a bug in my code. now i have two versions that fails only on Avalanche test:
    - version 1 with report
    - version 2 with report

  16. #15
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Intel Pentium M 1,5 GHz + Win XP SP3 32 bit Czech + 512 MB RAM

    align-both\farsh-x32.exe Hashing 100 GiB (fe82e924): 74240.206 milliseconds = 1.446 GB/s = 1.347 GiB/s
    align-both\farsh-x32-avx2.exe Not supported
    align-both\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 40271.890 milliseconds = 2.666 GB/s = 2.483 GiB/s

    align-data\farsh-x32.exe Hashing 100 GiB (fe82e924): 73956.563 milliseconds = 1.452 GB/s = 1.352 GiB/s
    align-data\farsh-x32-avx2.exe Not supported
    align-data\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 46726.115 milliseconds = 2.298 GB/s = 2.140 GiB/s

    align-key\farsh-x32.exe Hashing 100 GiB (fe82e924): 74018.630 milliseconds = 1.451 GB/s = 1.351 GiB/s
    align-key\farsh-x32-avx2.exe Not supported
    align-key\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 46018.345 milliseconds = 2.333 GB/s = 2.173 GiB/s

    no-align\farsh-x32.exe Hashing 100 GiB (fe82e924): 73991.978 milliseconds = 1.451 GB/s = 1.351 GiB/s
    no-align\farsh-x32-avx2.exe Not supported
    no-align\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 51969.406 milliseconds = 2.066 GB/s = 1.924 GiB/s

  17. #16
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    thanks. Pentium M was the last Intel CPU with 64-bit SSE engine, so it shows less speed and less difference between aligned and non-aligned versions. but aligned is still 30% faster

    i feel that on older CPUs xxHash may be faster than any Farsh variants, just because SSE was too slow and pure x86 version is faster in xxHash

  18. #17
    Member PAQer's Avatar
    Join Date
    Jan 2010
    Location
    Russia
    Posts
    22
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    PAQer, thanks, you are reading my thoughts! i expected good perfromance on Core2 since it got 128-bit SIMD engines, but your results are discouraging. i looked into the problem and it seems that CPU spends half of the time on unaligned loads.
    Yes, aligned loads almost 1.6 times faster here compared to unaligned.
    BTW: Conroe has slow shuffle unit (Penryn doubled shuffle perfomance).

    Code:
    T:\farsh-alignment\align-both>farsh-x86.exe
    Hashing 100 GiB (fe82e924): 43673.399 milliseconds = 2.459 GB/s = 2.290 GiB/s
    
    T:\farsh-alignment\align-both>farsh-x86-sse2.exe
    Hashing 100 GiB (fe82e924): 10475.474 milliseconds = 10.250 GB/s = 9.546 GiB/s
    
    T:\farsh-alignment\align-both>farsh-x64.exe
    Hashing 100 GiB (fe82e924): 12140.137 milliseconds = 8.845 GB/s = 8.237 GiB/s
    
    
    
    T:\farsh-alignment\align-data>farsh-x86.exe
    Hashing 100 GiB (fe82e924): 43610.016 milliseconds = 2.462 GB/s = 2.293 GiB/s
    
    T:\farsh-alignment\align-data>farsh-x86-sse2.exe
    Hashing 100 GiB (fe82e924): 13461.910 milliseconds = 7.976 GB/s = 7.428 GiB/s
    
    T:\farsh-alignment\align-data>farsh-x64.exe
    Hashing 100 GiB (fe82e924): 15137.458 milliseconds = 7.093 GB/s = 6.606 GiB/s
    
    
    
    T:\farsh-alignment\align-key>farsh-x86.exe
    Hashing 100 GiB (fe82e924): 43140.574 milliseconds = 2.489 GB/s = 2.318 GiB/s
    
    T:\farsh-alignment\align-key>farsh-x86-sse2.exe
    Hashing 100 GiB (fe82e924): 14095.930 milliseconds = 7.617 GB/s = 7.094 GiB/s
    
    T:\farsh-alignment\align-key>farsh-x64.exe
    Hashing 100 GiB (fe82e924): 15810.221 milliseconds = 6.791 GB/s = 6.325 GiB/s
    
    
    
    T:\farsh-alignment\no-align>farsh-x86.exe
    Hashing 100 GiB (fe82e924): 43529.906 milliseconds = 2.467 GB/s = 2.297 GiB/s
    
    T:\farsh-alignment\no-align>farsh-x86-sse2.exe
    Hashing 100 GiB (fe82e924): 17283.675 milliseconds = 6.212 GB/s = 5.786 GiB/s
    
    T:\farsh-alignment\no-align>farsh-x64.exe
    Hashing 100 GiB (fe82e924): 17802.779 milliseconds = 6.031 GB/s = 5.617 GiB/s

  19. #18
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by PAQer View Post
    BTW: Conroe has slow shuffle unit (Penryn doubled shuffle perfomance).
    UMAC employs PSRLDQ instead of PSHUFD, that is 2x faster on early CPUs. On newer ones, though, it uses the same port as PMUL, so it may limit performance to 2 cycles per one SIMD word. Probably, interleaving them will provide better perfromance for SandyBridge+ CPUs (which can load 2 SIMD words per cycle), while PSRLDQ-only cycle is optimum for older Intel CPUs

    although in the case of Core2, the speed may be limited by the code size - Core2 can't decode more than 16 instruction bytes per cycle and has only 64-byte loop buffer. At least, it may be the reason why x64 version is slower than x86-sse2 one. And it may be the reason why aligned version is faster - it should have smaller code
    Last edited by Bulat Ziganshin; 30th May 2015 at 18:41.

  20. #19
    Member zody's Avatar
    Join Date
    Aug 2009
    Location
    Germany
    Posts
    90
    Thanks
    0
    Thanked 1 Time in 1 Post
    Impressive work !

    I've run all test on my i7-4700MQ (running @ ~3.2 GHz).

    Order is
    farsh-x64
    farsh-x64-avx2
    farsh-x86
    farsh-x86-avx2
    farsh-x86-sse2
    for all tests.
    Code:
    align-both
    Hashing 100 GiB (fe82e924): 4081.477 milliseconds = 26.308 GB/s = 24.501 GiB/s
    Hashing 100 GiB (fe82e924): 2304.709 milliseconds = 46.589 GB/s = 43.389 GiB/s
    Hashing 100 GiB (fe82e924): 21166.664 milliseconds = 5.073 GB/s = 4.724 GiB/s
    Hashing 100 GiB (fe82e924): 2501.702 milliseconds = 42.920 GB/s = 39.973 GiB/s
    Hashing 100 GiB (fe82e924): 3968.424 milliseconds = 27.057 GB/s = 25.199 GiB/s
    
    align-data
    Hashing 100 GiB (fe82e924): 4070.672 milliseconds = 26.378 GB/s = 24.566 GiB/s
    Hashing 100 GiB (fe82e924): 2298.015 milliseconds = 46.725 GB/s = 43.516 GiB/s
    Hashing 100 GiB (fe82e924): 21197.086 milliseconds = 5.066 GB/s = 4.718 GiB/s
    Hashing 100 GiB (fe82e924): 2515.694 milliseconds = 42.682 GB/s = 39.750 GiB/s
    Hashing 100 GiB (fe82e924): 3994.229 milliseconds = 26.882 GB/s = 25.036 GiB/s
    
    align-key
    Hashing 100 GiB (fe82e924): 4070.239 milliseconds = 26.380 GB/s = 24.569 GiB/s
    Hashing 100 GiB (fe82e924): 2300.552 milliseconds = 46.673 GB/s = 43.468 GiB/s
    Hashing 100 GiB (fe82e924): 21193.883 milliseconds = 5.066 GB/s = 4.718 GiB/s
    Hashing 100 GiB (fe82e924): 2499.901 milliseconds = 42.951 GB/s = 40.002 GiB/s
    Hashing 100 GiB (fe82e924): 3969.799 milliseconds = 27.048 GB/s = 25.190 GiB/s
    
    no-align
    Hashing 100 GiB (fe82e924): 4563.363 milliseconds = 23.530 GB/s = 21.914 GiB/s
    Hashing 100 GiB (fe82e924): 2315.089 milliseconds = 46.380 GB/s = 43.195 GiB/s
    Hashing 100 GiB (fe82e924): 21247.594 milliseconds = 5.053 GB/s = 4.706 GiB/s
    Hashing 100 GiB (fe82e924): 2503.921 milliseconds = 42.882 GB/s = 39.937 GiB/s
    Hashing 100 GiB (fe82e924): 4055.867 milliseconds = 26.474 GB/s = 24.656 GiB/s
    I could run tests on a i7-4500U ULV processor as well. Sadly, I don't have a non-Haswell CPU anymore..
    Last edited by zody; 30th May 2015 at 19:58.

  21. #20
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Intel Core2 Duo CPU T7100 1,8 GHz + Win XP SP3 32 bit Czech + 1024 MB RAM

    C:\Data\x>.\align-both\farsh-x32.exe Hashing 100 GiB (fe82e924): 48604.204 milliseconds = 2.209 GB/s = 2.057 GiB/s
    C:\Data\x>Rem .\align-both\farsh-x32-avx2.exe Not supported
    C:\Data\x>.\align-both\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 11917.074 milliseconds = 9.010 GB/s = 8.391 GiB/s
    C:\Data\x>.\align-data\farsh-x32.exe Hashing 100 GiB (fe82e924): 48586.506 milliseconds = 2.210 GB/s = 2.058 GiB/s
    C:\Data\x>Rem .\align-data\farsh-x32-avx2.exe Not supported
    C:\Data\x>.\align-data\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 15099.395 milliseconds = 7.111 GB/s = 6.623 GiB/s
    C:\Data\x>.\align-key\farsh-x32.exe Hashing 100 GiB (fe82e924): 48682.345 milliseconds = 2.206 GB/s = 2.054 GiB/s
    C:\Data\x>Rem .\align-key\farsh-x32-avx2.exe Not supported
    C:\Data\x>.\align-key\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 15657.168 milliseconds = 6.858 GB/s = 6.387 GiB/s
    C:\Data\x>.\no-align\farsh-x32.exe Hashing 100 GiB (fe82e924): 48682.790 milliseconds = 2.206 GB/s = 2.054 GiB/s
    C:\Data\x>Rem .\no-align\farsh-x32-avx2.exe Not supported
    C:\Data\x>.\no-align\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 19191.652 milliseconds = 5.595 GB/s = 5.211 GiB/s

  22. #21
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    1. i've found a way to make keys always aligned at 16 byte boundary - FARSH_KEYS by itself is aligned and each next (32-bit) hash will use key material at +16 offset compared to previous one. this means that overall key size for 4*n bytes of hash will become 1024+(n-1)*16 bytes instead of 1024+(n-1)*4 in the first version. in particular, 1024-bit hash result will require 1520 bytes of key material instead of 1148 bytes

    2. and algnment of data being hashed will be checked in farsh_fast(), so the price will be one always-taken branch - very low for any CPU with branch-prediction capabilities

    3. now i translate UHASH higher-level hashes (i.e bit mixers) into my code. so the whole FARSH becomes UHASH with prebuilt key material. The UHASH mixers are ~100 lines of code, so it may be not much price for guaranteed quality

    UPDATE: i copied UHASH bit mixing algorithm for blocks up to 1024 bytes: code & report. But either my copy is wrong or it doesn't make SMHasher happy. Now i need to add vanilla UHASH to my SMHasher tests to find on what side is problem
    Last edited by Bulat Ziganshin; 31st May 2015 at 15:00.

  23. #22
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    189
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Intel Core i3-3220 CPU 3,3 GHz + Win 7 32 bit Czech + 4 GB RAM
    .\align-both\farsh-x32.exe Hashing 100 GiB (fe82e924): 25963.731 milliseconds = 4.136 GB/s = 3.852 GiB/s
    .\align-both\farsh-x32-avx2.exe Not supported
    .\align-both\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 4696.721 milliseconds = 22.862 GB/s = 21.291 GiB/s
    .\align-data\farsh-x32.exe Hashing 100 GiB (fe82e924): 25863.286 milliseconds = 4.152 GB/s = 3.866 GiB/s
    .\align-data\farsh-x32-avx2.exe Not supported
    .\align-data\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 4941.686 milliseconds = 21.728 GB/s = 20.236 GiB/s
    .\align-key\farsh-x32.exe Hashing 100 GiB (fe82e924): 25962.321 milliseconds = 4.136 GB/s = 3.852 GiB/s
    .\align-key\farsh-x32-avx2.exe Not supported
    .\align-key\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 4883.320 milliseconds = 21.988 GB/s = 20.478 GiB/s
    .\no-align\farsh-x32.exe Hashing 100 GiB (fe82e924): 25708.534 milliseconds = 4.177 GB/s = 3.890 GiB/s
    .\no-align\farsh-x32-avx2.exe Not supported
    .\no-align\farsh-x32-sse2.exe Hashing 100 GiB (fe82e924): 4999.023 milliseconds = 21.479 GB/s = 20.004 GiB/s

  24. #23
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    meanwhile, i've found interesting suite of PRNGs. among other things, their xorshift128+ may be used to initialize the FARSH_KEYS[]

  25. #24
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    i added uhash, vhash and poly-1305 algos to my SMHasher suite. of those, only VHASH succesfully passed all tests

    i also extended information printed by the AvalancheTest and made conclusion that block hashes in the Farsh return enough entropy, only their mixing need to be improved. so i tried mixing from xxHash64 which, finally, made all SMHasher tests happy. also i added 64-bit 'seed' in the same way as it is implemented in xxHash64, in order to satisfy both SMHasher and most demanding users

    the mixing algo kidnapped from xxHash64 made computations up to 40% slower, so i will look further into balancing speed&quality. but at least Farsh idea works, now passing all SMHasher tests while being faster than any other hashes. although numeric quality measures made by SMHasher tests are slightly worse than those for murmur3a and probably xxHash too
    Last edited by Bulat Ziganshin; 7th June 2015 at 09:57.

  26. The Following User Says Thank You to Bulat Ziganshin For This Useful Post:

    Cyan (8th June 2015)

  27. #25
    Member
    Join Date
    Apr 2010
    Location
    CZ
    Posts
    81
    Thanks
    5
    Thanked 7 Times in 5 Posts
    Timings from my old netbook, it may seem a bit derogatory, but I am surprised it got so fast there!

    (Netbook Aspire One Intel(R) Atom(TM) CPU N2600 @ 1.60GHz, 2 cores Ubuntu 64-bit )
    Code:
    ~/farsh/farsh-alignment/no-align $ wine farsh-x64.exe
     Hashing 100 GiB (fe82e924): 99102.602 milliseconds = 1.083 GB/s = 1.009 GiB/s
    ~/farsh/farsh-alignment/no-align $ wine farsh-x86.exe 
    Hashing 100 GiB (fe82e924): 132066.573 milliseconds = 0.813 GB/s = 0.757 GiB/s
    ~/farsh/farsh-alignment/no-align $ wine farsh-x86-sse2.exe 
    Hashing 100 GiB (fe82e924): 87109.251 milliseconds = 1.233 GB/s = 1.148 GiB/s
    
    ~/farsh/farsh-alignment/align-key $ wine farsh-x64.exe 
    Hashing 100 GiB (fe82e924): 65907.654 milliseconds = 1.629 GB/s = 1.517 GiB/s
    ~/farsh/farsh-alignment/align-key $ wine farsh-x86.exe 
    Hashing 100 GiB (fe82e924): 131712.031 milliseconds = 0.815 GB/s = 0.759 GiB/s
    ~/farsh/farsh-alignment/align-key $ wine farsh-x86-sse2.exe 
    Hashing 100 GiB (fe82e924): 61781.707 milliseconds = 1.738 GB/s = 1.619 GiB/s
    
    ~/farsh/farsh-alignment/align-data $ wine farsh-x64.exe 
    Hashing 100 GiB (fe82e924): 65907.564 milliseconds = 1.629 GB/s = 1.517 GiB/s
    ~/farsh/farsh-alignment/align-data $ wine farsh-x86.exe 
    Hashing 100 GiB (fe82e924): 131792.882 milliseconds = 0.815 GB/s = 0.759 GiB/s
    ~/farsh/farsh-alignment/align-data $ wine farsh-x86-sse2.exe 
    Hashing 100 GiB (fe82e924): 61750.769 milliseconds = 1.739 GB/s = 1.619 GiB/s
    
    ~/farsh/farsh-alignment/align-both $ wine farsh-x64.exe 
    Hashing 100 GiB (fe82e924): 43861.363 milliseconds = 2.448 GB/s = 2.280 GiB/s
    ~/farsh/farsh-alignment/align-both $ wine farsh-x86.exe 
    Hashing 100 GiB (fe82e924): 131727.429 milliseconds = 0.815 GB/s = 0.759 GiB/s
    ~/farsh/farsh-alignment/align-both $ wine farsh-x86-sse2.exe 
    Hashing 100 GiB (fe82e924): 46215.390 milliseconds = 2.323 GB/s = 2.164 GiB/s

  28. #26
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    it has speeds close to Pentium M. effect of alignment is up to 2.3x speedup that is absolute record. thanks!

  29. #27
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Bulat, I have found minor issues in the docs:
    1. https://github.com/Bulat-Ziganshin/FARSH#the-main-loop
    The link to the source code is outdated
    2.
    OLD NOTE: FARSH isn't yet ready for practical use since SMHasher shows a lot of problems in the current implementation. But FARSH main loop implements universal hashing scheme that's mathematically proven to guarantee ideal hashing (as far as key material is random) and it employs the formula successfully used in cryptographic UMAC/VMAC algorithms.
    This is not just old, but outdated. I think it's best to remove it, otherwise please be clear that it doesn't apply anymore.
    The following paragraph (maybe two) is outdated too and it's not indicated at all.
    3. Is the bitstream stable already? I don't see it mentioned and it's important for the person I'm trying to convert to FARSH.

    Also, it would be good to compare the algorithm with its competitors.

  30. #28
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Well, it was started as experiment - i want to check whether it will be possible to use UHash construction to provide non-crypto hash that is both very fast and has a high quality.

    FARSH 0.1 reached the high-speed goal, but not passed some SMHasher tests. So i added an extra bit-mixing code from xxHash and produced FARSH 0.2. It passes all SMHasher tests and only ~10% slower than 0.1 version. So the code for 0.2 is finished, but i should finish updating the docs in order to release it.

    Overall, this project wasn't finished, so i hardly can talk about bitstream stability. Even if i will update the docs and release 0.2, i still doesn't have answer to the question - is it really as high-quality as for example xxHash? It's possible that by adding extra mixing in 0.2 i just have hidden dust under a carpet.

    I had a plan to check the same extra-mixing approach on other lower-quality hashes like CRC - if their SMHasher rank also can be improved this way, it would be a strong signal (against quality of FARSH 0.2 and usability of SMHasher overall). But i never implemented that plan, too.

    Anyway, little may be done to improve this construction quality. Regarding speed, it may be improved by 10-20%, but no more. Overall, you can employ the current code as hash of unknown quality, but definitely better than CRC32 and faster than CRC32 in most configs. xxHash64 is a faster way to produce 64-bit hash result on 64-bit pre-AVX2 CPU, in all remaining situations FARSH should be faster. Bitstream most probably will be changed in future versions.

  31. #29
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Analyzis of strong and weak points of FARSH and xxHash led me to the hash construction that should be best of both worlds - xxHash quality at FARSH speed.

    That is the FARSH weak point? Each 4-byte word inside 4 KB block is processed independent of other words and results are just summed up. So the data from different words are mixed up in the rather simple and predictable way. This potentially simplifies bit flips that may remain undetected by the hash. xxHash looks stronger here - each input word is mixed up into 128-bit intermediate value whoise bits are continued to mix up on the every computation step. I feel more confident with such scheme since it better differentiates influence of different input words on the final hash value.

    Also, FARSH intermediate value is only 64-bit, and i feel that for 32-bit final value 128-bit intermediate value would be preferable.

    xxHash weak point is that you need to finish processing of current 128 bits prior to going to the next chunk. It's strongly optimized for x86 cpus, but doesn't leave room for further SIMD optimizations (aside of calculation miltiple hashes simultaneously). FARSH scheme is ideal here - you can run everuthing parallel and overall speed is limited only by serial (and rather slow) process of combining 8-byte intermediate values.



    So, the ideal hash should split input data into fixed size blocks (say 4 KB) and process each block independently, returning 128 bits of intermediate hash. On the next level, those 128-bit values from each block should be combined with secondary hashing to produce the final result. Secondary hashing process may be stronger (up to cryptohashes), since it needs to process much less data.

    The overall hash speed will be limited to the speed of secondary hashing (let's conservatively say 1 GB/s) multiplied by the primary hash reduction coeffecient (4KB/128bit=256), i.e. 256 GB/s even for this conservative estimation. And by adding one more level of secondary hashing, we may change the limit to 256*256 GB/s = 64 TB/s. Of course, it's just the upper limit for infinite computation resources, but f.e. GPUs will reach this limit in the next 5 years.

    Overall, the idea of multi-level hashing is well known in cryptography as tree hahsing, and you can find in http://eprint.iacr.org/2009/210.pdf how to build cryptohash this way. While we don't have such high goals, obeying these rules may be useful to improve our hash quality.



    While 128-bit blocks are ideal fit for SSE registers, they are sub-optimal for avx2/avx-512 and especially GPUs (nvidia/amd has 1024/2048-bit words while Intel GPUs support multiple widths but 512 bits is optimum). I think that 512-bit blocks would be the best choice, summing up all platforms.

    Inside 512-bit chunk, we should have four 128-bit streams, processed independent on each other. This means that older platforms like i386 will process only one stream simultaneously, and then switch to the next stream. SSE2-capable cpus will process all 4 streams simultaneously - it will provide enough ILP to hide dealys. AVX2 cpus will process 2 streams per AVX register, 8 streams from 2 blocks simultaneously in order to hide delays. And AVX-512 devices will process 4 blocks simultaneously. So any device will get enough ILP to hide operation delays and fill all its computation resources.

    nvidia/amd GPUs will need to process 2/4 blocks in the single register but their LD engines should easily handle such non-coalescing access (or extra warp shuffle operation). Overall, they can reach multi-terabyte/s speed, much faster than their on-board DRAM speeds.



    We haven't discussed yet the internal hashing loop, but it can be any function squeezing two 128-bit values into one: hash128=update(hash128,data128 ). One possibility is the xxHash code. Another one is slightly modified FARSH code:

    Code:
    uint32 data[4]
    uint32 hash[4] === uint64 hash64[2] === uint128 hash128
    
    hash64[0] = (data[0]+hash[0]) * (data[1]+hash[1])
    hash64[1] = (data[2]+hash[2]) * (data[3]+hash[3])
    ROTATE hash128,1   // rotate the entire 128-bit register by 1 byte
    This algorithm mixes the input data into all 128 hash bits, but it's still simpler than xxHash inner cycle. It's just 4 SIMD operations per 16 input bytes:

    PADD xmm0, data
    PSHUFD xmm1,xmm0,xxx
    PMULUD xmm0,xmm1
    PALIGNR xmm0,xmm0,1

    so we may expect the same speed as with FARSH on CPUs with SSSE3. Since AVX operations process each 128-bit lane of AVX register independently, the same sequence will work with AVX2 too, doubling the speed compared to SSE2. AVX-512 will probably require more commands to rotate each 128-bit lane independently. CPUs with SSE2 but no SSSE3 (Pentium4,PenitumM,Core1,K8..K10) require 3 operations - PSLLDQ+PSRLDQ+POR. Pre-SSE2 cpus will need 4 SHRD operations.


    And finally, like FARSH we can use multiple initial hash128 values to produce multiple independent 32-bit hashes (and combine them into 64+ bit hashes), and XOR initvals with user-supplied SEED value in order to provide control to the user.
    Last edited by Bulat Ziganshin; 28th June 2016 at 13:53.

  32. The Following 3 Users Say Thank You to Bulat Ziganshin For This Useful Post:

    Cyan (26th June 2016),Mike (26th June 2016),SolidComp (28th June 2016)

  33. #30
    Member
    Join Date
    Nov 2015
    Location
    ?l?nsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    I haven't done QA for FARSH, but it's core looks more trustworthy than the permutation+mul+rol of xxHash. Which I definitely don't consider to be a good quality.

Page 1 of 3 123 LastLast

Similar Threads

  1. Hashing
    By Bulat Ziganshin in forum Data Compression
    Replies: 1
    Last Post: 19th May 2015, 08:00
  2. hashing LZ
    By willvarfar in forum Data Compression
    Replies: 13
    Last Post: 24th August 2010, 20:29
  3. What type of hashing is this.
    By Earl Colby Pottinger in forum Data Compression
    Replies: 11
    Last Post: 22nd June 2010, 05:23
  4. Knuth's Multiplicative Hashing
    By encode in forum Data Compression
    Replies: 5
    Last Post: 28th May 2008, 22:18
  5. A nice article about hashing
    By encode in forum Forum Archive
    Replies: 19
    Last Post: 26th September 2007, 21:31

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •