On the hunt for safe optimizations for md5

by Tobias Hertkorn on June 10th, 2006

Well, right now I am testing some other optimizations that use strictly safe code. One of the things I came across:

On my fast machine it helps to do as much arithmetic as possible in one go. For example:

C#:
  1. b += (((d ^ a) & c) ^ a) + (uint)K[7] + buff[7];
  2. b = (b <<22) | (b>> 10);
  3. b += c;

This b += c is not smart to use in this situation and actually produces a performance hit. Better would be:

C#:
  1. b += (((d ^ a) & c) ^ a) + (uint)K[7] + buff[7];
  2. b = ((b <<22) | (b>> 10)) + c;

Altering the whole segment like this in the md5 algorithm adds up to a 10-15% performance increase!

Let's look at the IL:

IL:
  1. ldloc.1
  2. ldloc.3
  3. ldloc.0
  4. xor
  5. ldloc.2
  6. and
  7. ldloc.0
  8. xor
  9. ldsfld unsigned int32[] MD5CryptoServiceProviderMonoOrig::K
  10. ldc.i4.7
  11. ldelem.u4
  12. add
  13. ldarg.0
  14. ldfld unsigned int32[] MD5CryptoServiceProviderMonoOrig::buff
  15. ldc.i4.7
  16. ldelem.u4
  17. add
  18. add
  19. stloc.1
  20. ldloc.1
  21. ldc.i4.s 22
  22. shl
  23. ldloc.1
  24. ldc.i4.s 10
  25. shr.un
  26. or
  27. stloc.1  // stores b
  28. ldloc.1  // loads b
  29. ldloc.2
  30. add
  31. stloc.1

the second version's:

IL:
  1. ldloc.1
  2. ldloc.3
  3. ldloc.0
  4. xor
  5. ldloc.2
  6. and
  7. ldloc.0
  8. xor
  9. ldsfld unsigned int32[] MD5AddOptimization3::K
  10. ldc.i4.7
  11. ldelem.u4
  12. add
  13. ldarg.0
  14. ldfld unsigned int32[] MD5AddOptimization3::buff
  15. ldc.i4.7
  16. ldelem.u4
  17. add
  18. add
  19. stloc.1
  20. ldloc.1
  21. ldc.i4.s 22
  22. shl
  23. ldloc.1
  24. ldc.i4.s 10
  25. shr.un
  26. or
  27.                // store, load eliminated
  28. ldloc.2
  29. add
  30. stloc.1

I thought this was weird, because I assumed that the compiler might be able to find these kinds of unnecessary operations, but I guess it is harder to locate them than it seems. Especially since a wrong optimization might end up with broken code.

A weird thing is: I just did some testing on my old, old i586 - and this optimization I described here actually decreases performances under some cirumstances! Obviously the lesson learned is: Never trust "optimizations" without testing them!

Well, I'll soon write about more safe optimizations I found. But this is for sure again: Optimizing is all about trial and error. There is almost no way to tell if anything might perform better or worse.

June 10th, 2006 8:07 pm | Comments (2)

Md5 – no unsafe in crypto

by Tobias Hertkorn on June 9th, 2006

I got in contact with Sebastien Pouliot in order to contribute back to mono. And I got a really nice email back. He suggested a couple of other optimizations, explained some of the performance gains I was experiencing and was looking forward to seeing more of my optimizations. But unfortunatelly he also wrote this:

For many reasons I don't want to add unsafe code into crypto. Beside being "unsafe" many projects use copies of Mono crypto source code directly into their apps and I don't want to require them to use unsafe code (and limit the condition were the code can be executed, e.g. FullTrust)

That makes it impossible to use the one optimization that gives the biggest performance boost on little endian machines. Since that optimization will not make it into mono, I just wanted to blog about it here - just in case somebody needs the extra edge.

Optimizing the md5 algo on little endian machines

This decoding step is a major part within the md5 algo. It translates a 64 byte array into a 16 uint array. The general endian-safe step looks like this:

C#:
  1. for (i = 0; i <16; i++)
  2. {
  3.   buff[i] = (uint)(inputBuffer[inputOffset + 4 * i])
  4.     | (((uint)(inputBuffer[inputOffset + 4 * i + 1])) <<8 )
  5.     | (((uint)(inputBuffer[inputOffset + 4 * i + 2])) <<16)
  6.     | (((uint)(inputBuffer[inputOffset + 4 * i + 3])) <<24);
  7. }

If this step is performed on a little endian machine, the bytes are already safed in memory in the right order and there is no need for all the bitlogic. Therefore optimizing it using unsafe code and direct copying 4 consecutive bytes as one uint gives a huge speed increase:

C#:
  1. unsafe
  2. {
  3.   fixed (byte* bFixed = inputBuffer)
  4.   {
  5.     byte* inputPointer = bFixed + inputOffset;
  6.  
  7.     for (i = 0; i <16; i++)
  8.     {
  9.       buff[i] = *(uint*)(inputPointer);
  10.       inputPointer += 4;
  11.     }
  12.   }
  13. }

As we are already in unsafe context using an additional pointer for buff gives an additional speed increase:

C#:
  1. unsafe
  2. {
  3.   fixed (byte* bFixed = inputBuffer)
  4.   {
  5.     fixed (uint* aFixed = buff)
  6.     {
  7.       byte* inputPointer = bFixed + inputOffset;
  8.       uint* bufferPointer = aFixed;
  9.  
  10.       for (i = 0; i <16; i++)
  11.       {
  12.         *bufferPointer = *(uint*)(inputPointer);
  13.         inputPointer += 4;
  14.         bufferPointer++;
  15.       }
  16.     }
  17.   }
  18. }

Be sure to still make this Md5 implementation endian safe by introducing a if:

C#:
  1. if (BitConverter.IsLittleEndian)
  2. {
  3.   // unsafe decode
  4. }
  5. else
  6. {
  7.   // bitshifting decode
  8. }

I really can understand why this is important. There should not be any unsafe code in a crypto environment, and I agree on that. Unfortunatelly I have not come up with an alternative solution for that particular decode step. Using

C#:
  1. BitConverter.ToUInt32(inputBuffer, inputOffset + 4 * i);

really is no good. Unfortunatelly this method does some extensive testing before converting the bytes. That slows it down so much, that it performs worse than the bitops.

June 9th, 2006 2:14 pm | Comments (0)

Optimized MD5CryptoServiceProvider for Mono

by Tobias Hertkorn on June 8th, 2006

Inspired by a comment here, I implemented a couple of my optimization strategies as MD5CryptoServiceProvider. Tests are running as I speak and should be done tomorrow morning, but these figures sure do give some hope:

MD5CopyOptimize (5): 0,446876320604479
MD5CopyOptimize2 (8): 0,447343112157013
MD5Bitlogic (4): 0,448018751258771
MD5Bitlogic2 (7): 0,448770640646336
MD5Inline (3): 0,457765987475873
MD5Inline2 (6): 0,458054073151199
MD5LittleEndian (2): 0,477792175150227
MD5NoArray (1): 0,526980798829037
MD5CryptoServiceProviderMonoOrig (0): 1,45861854618167

That's on my WinXP machine, running Microsoft's .NET 2.0. I do not have any data for Mono yet, but it seems to be about the same - that means, about 3 (!) times faster than the original mono Md5.

To understand which optimizations are done, please download the optimized MD5CryptoServiceProvider: Md5MonoOptimized
UPDATE: 4 and 5 (MD5Bitlogic and MD5CopyOptimize) did have a typo in them and therefore produced wrong results. This problem is fixed as of 15:00 on the 9th.

I'll post the mono test results tomorrow morning.

June 8th, 2006 11:14 pm | Comments (1)

Java and Sun in the news today

by Tobias Hertkorn on May 16th, 2006

I am still a subscriber of the Debian Java mailing list. And on Tuesday a little, innocent email came over the mailing list:
Tom Marble - Towards Java Libre.

After that there was a bit of chit-chat, until reply email from Arnaud Vandyck. After that the thread is really taking off, inspiring very interesting posts about Open Source ethics, Sun's need to adapt, etc. I encourage everyone to follow this thread!

On the same note:
Taken from two articles at heise (in German):
Wird Java doch Open Source?
Sun stellt Java Enterprise Edition 5 fertig

Sun thinking about releasing Java as Open Source
In a recent keynote Sun-CEO, Jonathan Schwartz, did publicly discuss the possibility that Sun might release Java as open source. It would be quite an agressive move to ensure predominance in the Linux world, since releasing Java as open source would create the possibility that it will be bundled with distributions. Right now for example Debian does not ship Sun's Java through its main distribution mechanisms since it violates their distribution policies.

Sun releases Java Enterprise Edition 5
Java EE 5 should make it easier and faster to develope applications. It heavily uses annotations, which make it easier to handle EJB configuration, since the required parameters can be set within the code.

May 16th, 2006 8:01 pm | Comments (0)

The Bazaar and the Cathedral

by Tobias Hertkorn on May 7th, 2006

Well, I just watched Revolution OS. It is a documentation about Linux done in 2001. To quote form their website:

REVOLUTION OS tells the inside story of the hackers who rebelled against the proprietary software model and Microsoft to create GNU/Linux and the Open Source movement.

I really liked the presentation, being a Linux user, hacker and fan since about 1996. Yep, I just got up, walked over to my bookshelf and there it still was. "LinuX Anwenderhandbuch und Leitfaden für die Systemverwaltung" loosly translatable to "Linux Userhandbook and Guide for System Administration". So it is a beginners book to Linux. And it was clearly written by hackers for hackers. Ah, the good-old-times. ;) Well, anyway, to get back to my topic:
The Bazaar and the Cathedral, an essay by Eric S. Raymond is mentioned in that documentation and I just thought, well that might just sound like the beginning of the agile and/or extreme programming movement, whose philosophy I started using at the webdevelopment team of experten-netzwerk about 2 years ago. And it got me quite excited, because I think it is always fun to have an unfiltered look at the unwatered inspiration that leads up to "movements".

Well, it basically is a compilation of very simple principles, but that's the beauty. I needed it written down. I did use some of the points talked about by instinct.

  • Given enough eyeballs, all bugs are shallow.
  • Constant feedback is good. Instant gratification.
  • Bugs don't drive users away.
  • Be chatty with your mailing lists. Don't just send release notes to the mailing list. Let the mailing list come up with solutions or help you choose between possible solutions. Aquire beta testers. Recognize good ideas from other people.
  • Release early, release often, listen to your customers.

As I said very basic ideas, but even going through this simple list, I do stumble over a couple of points that need or needed improvement in my current or former projects. So that got me really interested and fortunatelly I continued reading. And I would encourrage you to do so as well. It is quite a remarkable read. For those who are not willing to do that, even though I just told you to do so (I do promise you some egoboo! Ha, read on!), here is a short summary

Linus and therefore Linux uses a very different communication pattern than traditional software projects use. The major problem of large projects is scalability, known as the n-squared complexity cost. Linux has a starshapped communication pattern, with basically a lot of programmer talking to one guy. This means that there are a lot of people basically working for one guy (Linus). But how can people get motivated to put a lot of effort, work and spare time into a project?

What motivates hackers?
Basically projects don't start in bazaar mode. The code must have a plausable promise. It must promise that given a certain amount of work, it will become brilliant. Based on this promise the interest in the project must be nurtured the produce a "critical mass" of people that are working for the project, by code, testing or giving feedback. The trick is: People are playing for reputation. You don't take up the title hacker yourself. That's a title other people assign to you.
Raymond goes on and uses some definitions from anthropology: The internet culture is comparable, if not exactly the same as a gift culture: People gain status by giving things away and everybody plays for reputation. Usually that translated into gifts or food. But when it comes to the internet culture it translates to: Give away free time, energy and skill in order to look good in front of other hackers. People play this game for reputation. Knowing this, it is easy to optimize the management strategies of open source project. Make sure you motivate your users, testers and programmers by promising them knowlegde, skills and standing in front of other hackers. Raymond uses the term Egoboo used mainly in the SciFi community. The goal in a free software project is to create and nurture a gift culture. This is the ideal way to operate without force in an anarchy kind situation.

Just a tip: Listen in on the talk by Raymond. Link is on his website.

Well, is the paper a bit optimistic? Maybe. ;) Did it start the Extreme Programming or Agile Programming movement? Maybe not. I did start reading the article to get some new ideas for my managing. Did I get those? Hell yeah! Read between the lines and learn a lot about motivation. Industry projects do lean too much on pressure as far as I am concerned and should look at the OSS to find out about reputation, reward and ... fun. ;)

May 7th, 2006 6:36 pm | Comments (0)
Tobi + C# = T# - Blogged blogoscoop