Washu and I originally started a separate SlimMath project back in March, when the new XNAMath library was release that made heavy use of SIMD for blazing fast performance. We wanted to bring these performance gains to SlimDX, but it was obvious that the normal method of wrapping wouldn't suffice for these small and quick bits of code; the managed/native barrier was too great to cross. I came up with the idea to batch math calls and send them all across the marshaling barrier at once, therefore only paying the cost once in each direction. We started work on that, but even with this case, the costs paid were so large that you had to do a monstrous amount of work for it to be worthwhile.
We abandoned this idea, and SlimMath was left to die. Just a few weeks ago though, Washu came up with the idea of modifying the .NET assembly after it had been compiled with NGEN and injecting our own implementations of certain functions that used hand-written ASM. The idea immediately appealed to me, and we set off on such a project, dubbing it SlimGen. While Washu is no doubt preparing a series of blog posts on the subject (he's quite excited about this, something that should amaze and scare you at the same time), I just wanted to talk a little bit about how this could be used.
The way we see it, our SlimGen client program runs on the end user's computer, at install time. After the .NET target assembly (for example, SlimDX.dll) has been NGEN-ed and installed into the GAC, we run our program that takes compiled object files and inserts the raw code into the native assembly image. This has a few major benefits for us. First, you can avoid all interop costs by injecting the code directly. Second, you can make use of any instructions you wish in the raw code, which means that math methods can now take advantage of SIMD extensions. The third benefit, which is mostly derived from running the program at install time, is that it can customize the assembly based upon the extensions supported by the target processor. This means that if a chip is detected as having SSE4 support, we can inject our SSE4 version of a function directly, without any runtime checks.
This scales well too, since for functions where we don't care to provide a particular SSE4 version, we can simply downgrade to the next available version, or if we don't provide any optimized ASM for a method, simple go on using the one provided in the assembly. This new tool, once finished and properly tested, should allow .NET library and application developers to squeeze every possible performance benefit out of using native code, while still retaining many of the niceties and usability benefits of managed development.