Originally published on my ScapeCode blog.
So previously we delved into one of the nastier performance corners of the .Net framework. Today I'm going to introduce you to a tool, currently in development, that lets you take those slow math functions of yours and replace them with high-performance, SSE-optimized methods.
So what does SlimGen do? Well, you pass it a .Net assembly and it replaces the native method bodies generated by NGEN with ones written in assembly (for now). The modified native image then replaces the original one stored in the native image store. SlimGen can operate on signed and unsigned assemblies alike, since the native image is not signed. More on this later, though.
Managed PE files contain a great deal of metadata stored in tables. You can enumerate these tables and parse them yourself, for instance if you were writing your own CLR. Thankfully though, the .Net framework comes with several COM interfaces that give you access to these tables without having to manually parse them out of the PE file. This is especially useful since the table rows are not a fixed format: indexes in the tables can be either 2 bytes or 4 bytes in size, depending on the size of the dataset indexed. In the case of SlimGen we use the IMetaDataImport2 interface for accessing the metadata.
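To make the variable-width index rule concrete, here is a minimal sketch of how the width is decided, following the rule in the ECMA-335 metadata spec as I understand it. The function names are mine and are not part of SlimGen or of any .Net API; this is exactly the bookkeeping that IMetaDataImport2 saves you from.

```python
def simple_index_size(row_count: int) -> int:
    """Width in bytes of a plain index into a single metadata table.

    An index fits in 2 bytes while the target table has fewer than
    2**16 rows; otherwise it is stored in 4 bytes.
    """
    return 2 if row_count < 2 ** 16 else 4


def coded_index_size(row_counts: list[int], tag_bits: int) -> int:
    """Width in bytes of a coded index that may point into several tables.

    The low `tag_bits` bits select which table is meant, so only
    16 - tag_bits bits remain for the row number in the 2-byte form.
    The largest candidate table decides the width.
    """
    return 2 if max(row_counts) < 2 ** (16 - tag_bits) else 4
```

For example, a coded index choosing between three tables needs 2 tag bits, so it grows to 4 bytes as soon as any of those tables reaches 16384 rows, well before a plain index would.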
Of course, the managed metadata does not contain all of the information we need. NGEN manipulates the managed assembly and introduces pre-jitted versions of the functions contained within the assembly. However, their managed counterparts remain in the assembly and are what the metadata tables reference. So how does one go from a managed method and its IL to the associated unmanaged code? Well, the CLR header of a PE file does contain a pointer to a table for a native header. However, the exact format of that table is undocumented, which makes it hard to parse it and find the information we need. Therefore we have to use an alternative method...
When you load up an assembly, the CLR uses the metadata and other information found in the PE file to generate a set of runtime tables that track where things are in memory and what state they are in. For instance, it can tell whether it has jitted a method or not. When you load up an assembly that's been NGENed, the CLR checks the native images for an associated copy and, assuming your assembly validates, loads the NGENed assembly and parses the appropriate information out of that. Therefore we need some way of gaining access to these runtime-generated tables. Enter the debugger.
The .Net framework exposes debugging interfaces that are quite trivial to implement, but more importantly, they give you access to all of the runtime information available to the CLR. In the case of SlimGen, what we do is load your assembly (without running it) into a host process and then simply have the host process execute a debugger breakpoint. The SlimGen analyzer first initializes itself as a debugger and then launches the host process with itself attached. When the breakpoint is hit, it breaks into the analyzer, which can then begin the work of processing the loaded assemblies. Since SlimGen knows which assembly it fed to the host, it is able to filter out all of the other assemblies that have been loaded and focus in on the one we care about. First we check whether a native version of the assembly has been loaded. If not, there is no point in continuing, so we simply report an error and clean up. Assuming there is a native version of the assembly loaded, we use the aforementioned metadata interfaces to walk the assembly and find all of the methods that have been marked for replacement. Each method is examined to ensure that it has a native counterpart, and if it doesn't, a warning is issued and the method is skipped.
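The filtering logic described above can be sketched in a few lines. This is only an illustration of the decision flow, not SlimGen's actual code: the `Module` and `Method` records and the `select_methods` function are hypothetical stand-ins for what the analyzer would read through the debugging and metadata interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    name: str
    marked_for_replacement: bool
    # Hypothetical: (address, length) pairs; empty means no native code.
    native_chunks: list = field(default_factory=list)


@dataclass
class Module:
    path: str
    has_native_image: bool
    methods: list = field(default_factory=list)


def select_methods(modules, target_path):
    """Return (methods to replace, warnings) for the assembly we fed to the host."""
    # Ignore every loaded module except the one we asked the host to load.
    target = next((m for m in modules if m.path == target_path), None)
    if target is None or not target.has_native_image:
        # No native image loaded: nothing to patch, so this is an error.
        raise RuntimeError(f"no native image loaded for {target_path}")

    selected, warnings = [], []
    for method in target.methods:
        if not method.marked_for_replacement:
            continue
        if not method.native_chunks:
            # Marked but never pre-jitted: warn and skip it.
            warnings.append(f"{method.name}: no native counterpart, skipped")
            continue
        selected.append(method)
    return selected, warnings
```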
Now comes the annoying part. In .Net 1.x the framework had each method exist within a single code chunk, which made extracting that code quite easy. However, in .Net 2.x and forward the framework allows a method to have multiple code chunks, each with a different base address and length. This is theoretically to allow an optimizer to work its magic, but it does make extracting methods harder. SlimGen will generate an assembly file per chunk, and all of the associated binaries for each chunk, generated from the assembly files, must be present for the method to be replaced. No dangling chunks please. The SlimGen analyzer extracts the base address of each chunk, along with the module base address. Using that information we can then calculate the relative virtual address of each chunk of a method's native counterpart within the NGENed file.
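The address arithmetic here is just a subtraction per chunk. A small sketch, where the `Chunk` type and `chunk_rvas` function are names I made up for illustration and the addresses in the usage note are invented:

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    address: int  # absolute address of the chunk in the host process
    length: int   # size of the chunk in bytes


def chunk_rvas(module_base: int, chunks: list[Chunk]) -> list[tuple[int, int]]:
    """Convert each chunk's absolute address into an RVA within the native image.

    The RVA is the chunk address minus the address at which the
    native image was loaded (the module base).
    """
    rvas = []
    for chunk in chunks:
        if chunk.address < module_base:
            raise ValueError("chunk lies below the module base")
        rvas.append((chunk.address - module_base, chunk.length))
    return rvas
```

With a module loaded at 0x30000000, a chunk at 0x30001A40 has an RVA of 0x1A40. Because the RVA is independent of where the image happened to be loaded, it can be applied to the file on disk.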
Using that information the SlimGen client simply walks a copy of the native image performing the replacement of each method, and then when done (and assuming no errors), copies it back over the original NGEN image. Tada, you now have your highly optimized SSE code running in a managed application with no managed -> unmanaged transitions in sight.
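As a rough sketch of that last step: an RVA has to be mapped to a file offset through the PE section table before bytes in the copy can be overwritten. The code below assumes the section table has already been parsed into `Section` records. It also assumes a replacement must not be larger than the chunk it replaces, and that any leftover space is padded with `int3` (0xCC). Both of those are my assumptions for illustration, not documented SlimGen behavior.

```python
from typing import NamedTuple


class Section(NamedTuple):
    virtual_address: int  # RVA where the section starts
    virtual_size: int     # size of the section in memory
    raw_offset: int       # PointerToRawData: where it starts in the file


def rva_to_file_offset(rva: int, sections: list[Section]) -> int:
    """Map an RVA to an offset in the on-disk image via the section table."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return rva - s.virtual_address + s.raw_offset
    raise ValueError(f"RVA {rva:#x} is not inside any section")


def patch_chunk(image: bytearray, sections: list[Section],
                rva: int, length: int, new_code: bytes) -> None:
    """Overwrite one native chunk in a copy of the image, in place."""
    if len(new_code) > length:
        # Assumption: the replacement has to fit in the original chunk.
        raise ValueError("replacement is larger than the original chunk")
    offset = rva_to_file_offset(rva, sections)
    if offset + length > len(image):
        raise ValueError("chunk extends past the end of the image")
    # Assumption: pad unused bytes with int3 so the chunk size is unchanged.
    image[offset:offset + length] = new_code + b"\xcc" * (length - len(new_code))
```

Working on a `bytearray` copy means a failure partway through leaves the original NGEN image untouched; only a fully patched copy would be written back.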