OMG, I just spent an entire day optimizing (or trying to optimize) the MD5 algorithm in C#, mainly because the implementation shipped with mono right now is slooooooow when all optimizations are enabled (yeah, go figure. ;) ). Now I am frustrated, because optimizing is such a hassle, especially when a JIT is involved... Well, the code is now quite a bit faster (about 40%) than the default implementation, MD5CryptoServiceProvider. But here is one of the many things that are a bit weird. The MD5 algorithm has a step that looks like

C#:
  a += ((b & c) | (~b & d)) + _decodeBuf0 + (int)0xd76aa478;

which can be rewritten as the equivalent

C#:
  a += (d ^ (b & (c ^ d))) + _decodeBuf0 + (int)0xd76aa478;
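
For anyone who doesn't trust the rewrite: here is a small sanity check that the two forms really compute the same thing. The class and method names (`Md5RoundF`, `Original`, `Rewritten`, `AreEquivalent`) are made up for this sketch and are not taken from my MD5 code. I am also assuming `uint` operands here to keep the example simple.

```csharp
using System;

// Hypothetical helper for illustration only.
static class Md5RoundF
{
    // Selection function as written in RFC 1321:
    // take the bit from c where b is 1, from d where b is 0.
    public static uint Original(uint b, uint c, uint d)
    {
        return (b & c) | (~b & d);
    }

    // Rewritten form: three bitwise operations instead of four.
    public static uint Rewritten(uint b, uint c, uint d)
    {
        return d ^ (b & (c ^ d));
    }

    // Both forms work on each bit independently, so testing the
    // 8 combinations of all-zero and all-one words covers every case.
    public static bool AreEquivalent()
    {
        uint[] words = { 0u, 0xFFFFFFFFu };
        foreach (uint b in words)
            foreach (uint c in words)
                foreach (uint d in words)
                    if (Original(b, c, d) != Rewritten(b, c, d))
                        return false;
        return true;
    }
}
```

The reasoning behind it: where a bit of b is 1, the rewritten form yields d ^ (c ^ d), which is c; where it is 0, it yields d ^ 0, which is d. That is exactly what the original expression selects.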

The second form actually does speed up the Transform step. But in Microsoft's .NET, applying the SAME optimization 10 lines further down decreases performance! Why, oh, why? ;) Well, at least mono is well behaved. On the other hand, mono is quite surprising, too: it performs best when there is no unsafe code involved! Even the most heavily optimized version, using pointers and all kinds of tricks in unsafe mode, is about 5% slower than the safe version!
But here is the problem: now I have like 10 different variants of the optimization, each of which is optimal for a specific framework, platform and/or hardware. (Yeah, I tested on several different machines.) Btw., arrays are extremely slow in Microsoft's .NET, but quite fast in Mono... So, what's the solution to my problem? What we need is a library with a calibration routine that chooses the right code for the environment it is running on. ;)
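
Just to make the idea concrete, here is a rough sketch of what such a calibration routine might look like. This is not the framework I have in mind in its final form: `Calibrator` and `PickFastest` are names invented for this example, and a real version would need more careful measurement (several runs, realistic input sizes and so on).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper, not part of Mono or .NET: times each candidate
// implementation on the current runtime and reports the fastest one.
static class Calibrator
{
    // Returns the name of the fastest candidate, or null if none were given.
    public static string PickFastest(IDictionary<string, Action> candidates, int iterations)
    {
        string bestName = null;
        long bestTicks = long.MaxValue;

        foreach (KeyValuePair<string, Action> candidate in candidates)
        {
            // Warm-up call so that JIT compilation is not part of the measurement.
            candidate.Value();

            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                candidate.Value();
            watch.Stop();

            if (watch.ElapsedTicks < bestTicks)
            {
                bestTicks = watch.ElapsedTicks;
                bestName = candidate.Key;
            }
        }
        return bestName;
    }
}
```

The idea would be to register each variant of the Transform step (safe, unsafe, array-based, ...) under a name, run `PickFastest` once at startup, and then use only the winner on that machine.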

I'll probably post the optimized code, the framework that chooses the right optimization and the test programs in a couple of days.

Conclusion:
Once again it became clear that there is no such thing as THE optimization.

For the purists (and my source for the implementation - minus the Appendix):
The MD5 RFC 1321