When looking at the assembly code produced by gcc, I was shocked that something like this wasn't translated to MMX or SSE (the compiler switches were turned on!):
a[0] += b[0];
a[1] += b[1];
a[2] += b[2];
a[3] += b[3];
The surrounding code is written in the same hand-vectorized manner. Won't gcc compile it to MMX or something similar natively? (gcc 4.2)
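One way to get the packed instruction without relying on the auto-vectorizer is to write the four additions explicitly with SSE intrinsics. This is a minimal sketch assuming `a` and `b` are `float` arrays of at least four elements (the post doesn't say what the element type is); the function name `add4_sse` is hypothetical.

```c
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: adds b[0..3] into a[0..3] with one packed SSE add
   instead of four scalar adds. Assumes both pointers reference at least
   4 floats. Unaligned loads/stores are used, so 16-byte alignment is not
   required (aligned _mm_load_ps/_mm_store_ps would be an option if it is). */
static void add4_sse(float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a); /* load a[0..3] */
    __m128 vb = _mm_loadu_ps(b); /* load b[0..3] */
    va = _mm_add_ps(va, vb);     /* four float additions in parallel */
    _mm_storeu_ps(a, va);        /* write the sums back to a[0..3] */
}
```

On 32-bit x86 this needs `-msse` (or a `-march` that includes SSE); on x86-64 SSE is available by default. If the arrays hold 32-bit integers instead, the analogous SSE2 intrinsics are `_mm_loadu_si128`, `_mm_add_epi32` and `_mm_storeu_si128` from `<emmintrin.h>`.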

This even improves compression! Maybe I can put everything together this weekend.