A journey from here to God knows where, with a few passengers along for the ride.

## New Blog

So I finished setting up my new website / blog and added a bunch of my old posts from here. Check it out and give me some feedback on the styling and what you think of it in general.

I've got a bunch of topics to write about queued up, so check back every once in a while if you're interested.

## Exodus of the Faithful

I went and got myself a blog set up on an externally hosted site: http://www.popoloski.com. Why, might you ask? Read on.

I've found that my usage of GameDev.net has slowly but steadily decreased over the past few months. Certainly the new forum software has a small and indirect part to play in this, but the real reasons run much deeper than superficial UI annoyances. To put it succinctly, the quality of content I used to expect from GDNet has fallen sharply since the precipitous leap to the new software; I place the blame for this almost entirely on the decline of activity by other iconic members who have since moved on to greener pastures.

GDNet will always hold a special place in my heart as the forum where I cut my teeth on programming. I would not be the developer that I am today without it. It's important to make a distinction here though; the forum became the resource that it was almost entirely through its amazing user-base, one that was both deeply knowledgeable and willing to answer questions, as well as willing and able to ask good and insightful questions. Even with an overabundance of brilliant gurus, without those of the latter group asking good questions, the information doesn't get out there for others to absorb. It's my experience that the best learning comes from answers to questions that you might never have even known or thought to ask yourself. Indeed, one need only look at one of the myriad of examples of this principle in action.

You need both halves of this whole in order to wring the most wisdom you can from your collective user base. While the gurus have been slowly moving away, I think the really troubling loss is that of the "intelligent questioner". While I doubt that we have any fewer of them today than we did years ago, the new site design seems to invite an overwhelming amount of noise and inane questions that trample and drown the good questions before they even get off the ground. That's not to say that the loss of the gurus is any less devastating; I've made my way into the games industry now, and haven't asked direct questions in years, but even so, many of the users I looked to for compelling technical content, both in the forums and in the personal journals, have gone silent and missing.

Examples? ToohrVyk, Ysaneya, dgreen, and Drew Benton, some of the top rated users on the old site, all have only a handful of posts since the change over. What's even more troubling is the disappearance of moderators. Promit, Ravuya, and Oluseyi are hardly around anymore, mittens and jollyjeffers have dropped off the face of the Earth, and I know that jpetrie, moderator of For Beginners, hasn't even logged in in several months, and is currently close to achieving moderator status on the GameDev StackExchange site.

So yes, the site is losing users and the general quality level of posts has fallen. That, coupled with the decline of the journals and the bizarre and unwelcome direction the staff are taking with the site, is enough to turn me off for good. It's their site and they're welcome to do with it what they will, but it's become clear to me, and from the evidence to several others as well, that it's not a direction that is good for the site long term. The insistence on political correctness and the "play nice" attitude being forced down our throats is particularly puzzling, especially for a community that once prided itself on having a no-nonsense, blunt, and straightforward response to any and all questions.

As I mentioned, my forum usage has already been gradually decreasing itself over the past months, changing from "several times a day" to "once or twice a week" to "meh, whenever I'm bored." I unsubscribed from the mailing list after they started using it to spam advertisements, and my GDNet+ subscription runs out some time in August. That leaves only the journals as a resource I use regularly here on GameDev.net, and now I've got my external site to take care of that as well. So what's going to change? Not much. I'll still be around from time to time to look at any SlimDX questions, and I'll probably cross-link any blog entries here, but in my mind this marks the end of GDNet as my "home" on the internet; I'm leaving and headed for a new promised land.

## Demystifying SSE Move Instructions

### Introduction

I've been doing a lot of work with SSE-related instructions lately, and finally got fed up with the myriad of move instructions available to load and store data to and from the XMM registers. The differences between some are so subtle and poorly documented that it can be hard to tell that there is even any difference at all, which makes choosing the right one for the job almost impossible. So I sat down and pored over the Intel instruction references and optimization manuals, as well as several supplemental sources on the internet, in order to build up some notes on the differences. I figured I might as well document them all here for everyone to use.

The name of the game with picking any instruction is performance, and you always want to choose the one that will get the job done in the least time using the least amount of space. Thus the recommendations here are geared towards these two goals. Each instruction has several bits of information associated with it that we must take into account:

• The type of the data it works with, be it integers, single precision floating point, or double precision floating point.
• The size of the data it moves. This can range from 32 bits to 128 bits.
• Whether it deals with unaligned memory or can be used with aligned memory only.
• If the move only affects a portion of a register, what happens to the remaining bits in that register after the instruction finishes.
• Any other special side-effects that the instruction may have.

### 128-bit Moves

Let's start off with the 128-bit moves. These move an entire XMM register's worth of data at a time, making them conceptually simpler. There are seven instructions in this category:

• movapd
• movaps
• movdqa
• movupd
• movups
• movdqu
• lddqu

All of these instructions move 128-bits worth of data. Breaking it down further, the first three instructions work with aligned data, whereas the next three are the unaligned versions of the first (we'll talk about the last one in a minute, since it's a bit special). The aligned versions offer better performance, but if you haven't ensured that your data is allocated on a 16-byte boundary, you'll have to use one of the unaligned instructions in order to load. When doing register-to-register (reg-reg) moves, it's best to use the aligned versions.

Each of the three instructions in each category (aligned and unaligned) operates on a different data type. Those with a 'd' suffix work on doubles, those with an 's' suffix work on singles, and movdqa / movdqu work on double quadwords (integers). This is usually a source of confusion for people, myself included, since regardless of the data type, 128 bits are still being moved, and a move shouldn't care about the raw memory it's moving. The differences here are subtle and easily overlooked, and have to do with the way the superscalar execution engine is structured internally in the microarchitecture. There are several "stacks" internally that can execute various instructions on one of several execution units. In order to better split up instructions to increase parallelism, each move instruction annotates the XMM register with an invisible flag to indicate the type of the data it holds. If you use it for something other than its intended type it will still operate as expected; however, many architectures will experience an extra cycle or two of latency due to the bypass delay of forwarding the value to the proper port.

So for the most part, you should try to use the move instruction that corresponds with the operations you are going to use on those registers. However, there is an additional complication. Loads and stores to and from memory execute on a separate port from the integer and floating point units; thus instructions that load from memory into a register or store from a register into memory will experience the same delay regardless of the data type you attach to the move. Thus in this case, movaps, movapd, and movdqa will have the same delay no matter what data you use. Since movaps (and movups) is encoded in binary form with one less byte than the other two, it makes sense to use it for all reg-mem moves, regardless of the data type.
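As a rough illustration of the aligned/unaligned split, here's a minimal C++ sketch using the compiler intrinsics from `<xmmintrin.h>`. The helper names `is_aligned16` and `add4` are made up for this example. `_mm_load_ps` and `_mm_loadu_ps` usually compile to movaps and movups, but the compiler is free to choose a different encoding, so treat the instruction names in the comments as the typical case rather than a guarantee.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cstdint>

// Returns true when p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Adds two groups of four floats (hypothetical helper for this sketch).
// Takes the aligned path only when every pointer is 16-byte aligned.
inline void add4(const float* a, const float* b, float* out) {
    if (is_aligned16(a) && is_aligned16(b) && is_aligned16(out)) {
        __m128 va = _mm_load_ps(a);              // aligned load, typically movaps
        __m128 vb = _mm_load_ps(b);
        _mm_store_ps(out, _mm_add_ps(va, vb));   // aligned store
    } else {
        __m128 va = _mm_loadu_ps(a);             // unaligned load, typically movups
        __m128 vb = _mm_loadu_ps(b);
        _mm_storeu_ps(out, _mm_add_ps(va, vb));  // unaligned store
    }
}
```

In real code you would normally guarantee alignment at allocation time (for example with `alignas(16)`) rather than branch per call; the runtime check here is only to show both paths side by side.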

### Non-Temporal Moves

In addition to these instructions, there are four extra 128-bit moves that require mentioning:

• movntdqa
• movntdq
• movntpd
• movntps

These are the non-temporal loads and stores, so named since they hint to the processor that they are one-off in the current block of code and should not require bringing the associated memory into the cache. Thus, you should only use these when you're sure that you won't be doing more than one read or write into the given cache line. The first instruction, movntdqa, is the only non-temporal load, so it's what you have to use even when loading floating point data. The other three are data-specific stores from an XMM register into memory, one each for integers, doubles, and singles. All of these instructions only operate on aligned addresses; there are no unaligned non-temporal moves.
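Here's a small C++ sketch of a non-temporal store in practice: filling a buffer that won't be read again soon. The function `fill_streaming` is a made-up name for this example, and I'm assuming the usual mapping of `_mm_stream_ps` to movntps. The asserts encode the alignment requirement described above.

```cpp
#include <xmmintrin.h>  // _mm_stream_ps, _mm_sfence
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fills `count` floats at `dst` with `value` using non-temporal stores.
// Hypothetical helper: dst must be 16-byte aligned and count a multiple of 4.
inline void fill_streaming(float* dst, std::size_t count, float value) {
    assert((reinterpret_cast<std::uintptr_t>(dst) & 15u) == 0);
    assert(count % 4 == 0);
    const __m128 v = _mm_set1_ps(value);
    for (std::size_t i = 0; i < count; i += 4) {
        _mm_stream_ps(dst + i, v);  // non-temporal store, typically movntps
    }
    _mm_sfence();  // order the streamed stores ahead of any later stores
}
```

The trailing fence matters mostly when another thread will consume the buffer; for a single thread it is cheap insurance.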

### Smaller Moves

Next we come to the moves that operate on 32 and 64-bits of data, which is less than the full size of the XMM registers. Thus this introduces a new wrinkle; namely, what happens to the remaining bits in the register during the move.

• movd / movq
• movss / movsd
• movlps / movlpd
• movhps / movhpd

In the first two pairs, the first instruction moves 32 bits of data and the second moves 64 bits. In the last two pairs, both instructions move 64 bits: the 'ps' forms treat the data as two singles and the 'pd' forms treat it as one double. The first set, comprising the first four instructions, generally fills the extra bits in the XMM register with zero. The second set does not; it leaves them as they are. I'll discuss in a moment why this is not necessarily a good thing. movd moves 32 bits between an XMM register and either memory or a general-purpose register. It cannot, however, move between two XMM registers, which is an oddity among the first four instructions. movq will always zero extend during any move, whether from memory or between registers. movd and movq are meant for integer data.

movss and movsd are meant for floating point data, and only perform zero extension when loading from memory into a register. When used to move between two XMM registers, they do NOT fill the remaining space with zeroes, which is confusing. movlps and movlpd both move 64 bits between memory and the low qword of the register, as two singles and one double respectively. They do not, however, perform a zero extension in any case. movhps and movhpd are slightly different from the others in that they move their data to and from the high qword of the XMM register instead of the low qword like the others. They don't do zero extension either.

Since the second set of instructions don't do zero extension, you might think that they would be slightly faster than ones that have to do the extra filling of zeroes. However, these instructions can introduce a false dependence on previous instructions, since the processor doesn't know whether you intended to use the extra data you didn't end up erasing. During out-of-order execution, this can cause stalls in the pipeline while the move instruction waits for any previous instructions that have to write to that register. If you didn't actually need this dependence, you've unnecessarily introduced a slowdown into your application.
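To make the zero-extension rules a bit more concrete, here's a C++ sketch using intrinsics that usually compile to the instructions in question: `_mm_load_ss` (movss from memory), `_mm_move_ss` (movss between registers) and `_mm_loadl_pi` (movlps, which loads two floats into the low half). The names `Vec4`, `to_array`, `load_scalar`, `move_scalar` and `load_low_pair` are invented for the example.

```cpp
#include <xmmintrin.h>
#include <array>

using Vec4 = std::array<float, 4>;

// Copies the four lanes out so they can be inspected (lane 0 first).
inline Vec4 to_array(__m128 v) {
    Vec4 out;
    _mm_storeu_ps(out.data(), v);
    return out;
}

// movss from memory: the low lane is loaded, the upper three are zeroed.
inline Vec4 load_scalar(const float* p) {
    return to_array(_mm_load_ss(p));
}

// movss reg-reg: the low lane comes from src, the upper three keep dst's
// old contents, so the result depends on whatever last wrote dst.
inline Vec4 move_scalar(__m128 dst, __m128 src) {
    return to_array(_mm_move_ss(dst, src));
}

// movlps: two floats replace the low half, the high half of dst is kept.
inline Vec4 load_low_pair(__m128 dst, const float* p) {
    return to_array(_mm_loadl_pi(dst, reinterpret_cast<const __m64*>(p)));
}
```

The two functions that take a `dst` argument are exactly the ones that carry the dependence on the register's previous value; the memory form of movss has no such input.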

### Specialty Instructions

There are several other instructions that have special side-effects during the move. Their uses are generally easier to pick out, since there is only one instruction for a given operation.

movddup - Moves 64 bits, and then duplicates it into the upper half of the register.

movdq2q - Moves an XMM register into an old legacy MMX register, which requires a transition of the x87 FP stack.
movq2dq - Same as above, except in the opposite direction.

movhlps / movlhps - Moves two 32-bit floats from high-to-low or low-to-high within two XMM registers. The other qwords are unaffected.

movsldup - Moves the two 32-bit floats in the low dword of each qword of the source into the same positions in the destination XMM register, and then duplicates each one into the upper dword of its half. Kind of confusing to describe, but the diagram in the documentation makes it easy to visualize if you want to use it.

movmskps / movmskpd - Moves the sign bits from the given floats or doubles into a standard integer register.

maskmovdqu - Selectively moves bytes from a register into a memory location using a specified byte mask. This is a non-temporal instruction and can be quite slow, so avoid using it when another instruction will suffice.
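
A couple of these are easier to appreciate in use. The C++ sketch below leans on `_mm_movemask_ps` (usually movmskps) to collect sign bits and `_mm_movehl_ps` (usually movhlps) as the first step of a horizontal sum. The function names `negative_mask` and `horizontal_sum` are my own, and the horizontal sum is just one common way of using movhlps, not the only one.

```cpp
#include <xmmintrin.h>

// Returns a 4-bit mask; bit i is set when lane i has its sign bit set.
inline int negative_mask(__m128 v) {
    return _mm_movemask_ps(v);
}

// Adds the four lanes of v together.
inline float horizontal_sum(__m128 v) {
    __m128 hi   = _mm_movehl_ps(v, v);   // lanes: [v2, v3, v2, v3]
    __m128 sums = _mm_add_ps(v, hi);     // lanes 0 and 1: v0+v2, v1+v3
    __m128 odd  = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(1, 1, 1, 1));  // lane 1 in every lane
    return _mm_cvtss_f32(_mm_add_ss(sums, odd));  // (v0+v2) + (v1+v3)
}
```

Keep in mind that `_mm_set_ps` takes its arguments from the highest lane down, so `_mm_set_ps(-1, 2, -3, 4)` puts 4 in lane 0.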

### Conclusion

There are a lot of SSE move instructions, as you can see from the above. It annoys me when I don't understand something, and whenever I needed a move I would get bogged down trying to decide which was best. Hopefully these notes will help others make a more informed decision, and shed light on some of the more subtle differences that are hard to find in the documentation.

### References

Besides various forum entries and random webpages found through judicious Googling, I took a lot of information from:
1. Intel Optimization Manual
2. Intel Instruction Reference
3. Agner Fog's Optimization Manual

## New Job

So it's official now. I'm heading out to Seattle this May to work on Guild Wars 2 at ArenaNet. Should be fun, and a good chance to get my name on a professional game.

That is all.

## Language Builder IDE

A few months back I detailed my work on an HLSL plugin for Visual Studio that would add syntax highlighting and IntelliSense support. I had to take a break from that for a while, but I started up again last week, and I've taken things in a new, more generalized direction.

Rather than work on the HLSL parser itself, I've focused my efforts on a language builder IDE that provides tools to easily build complete front-ends for languages, including support for all of the features people have come to expect from a language's tools. I'm calling this project SlimLang for now because I'm unimaginative and the "Slim" moniker has a good reputation associated with it.

Work on the incremental parser has been difficult, due both to its complexity and to the lack of information about such parsers on the net, so one of the tools I really want to add to the IDE is a built-in debugger that allows stepping through the parser as it runs and seeing the output as a visual graph as it's being constructed. It should really help my implementation of the parser, which I can then provide as a separate component to use in conjunction with the parse tables generated by the IDE.

For the visual graph part, I found the libraries OpenGraph and Graph#, and for the text editor for specifying the language grammar I found AvalonEdit, both of which are WPF libraries, so I decided to take my project in that direction. I haven't been fond of WPF, but recent changes in WPF4 have definitely improved things. Compare the text rendering quality between the old and new versions below:

You can also see from those images my work on a tab control style that mimics the property page tab found in Visual Studio. I didn't know much about styling when I started, so it was basically a crash course for me. The end results are pretty good though:

I integrated the text editor to allow for specifying the grammar and wrote a quick manual parser to read it. Also got the error list hooked up nicely. All in all, it was a good few days of work.

Also some work using a DataGrid for editing terminal options:

Hopefully this should make further work on the HLSL plugin much easier, as well as provide a platform for anyone with a personal scripting language to create easy plugins for language support, which can really aid usage and adoption of a new language.

I was thinking about making these two things open source (the HLSL plugin and the SlimLang IDE), but I'm also leaning towards providing them for sale to try to get some actual income. Any thoughts on whether these two tools would be useful enough to anyone to be worth spending the money?

## The Downward Spiral

Reading the community journals these days is a disheartening task of shoveling through the mud and the muck to find rare gems. As the layers of filth get deeper and deeper, the payoff becomes less and less worth it; hence, The Downward Spiral.

Today I ran across the start of a so-called "tutorial series" on software engineering: The Begining [sic]. I tried to post my thoughts there as a comment, but in the infinite wisdom of the new forum software, users are allowed to moderate comments to their own blogs, so I had to make a separate entry. Besides being a prime example of the previously mentioned "mud and muck", it's also the poster child for Josh Petrie's seminal blog post: Don't Write Tutorials. I'll highlight some relevant sections for you in case you don't want to read it:

[quote name="Josh Petrie"]The larger problem with tutorials is that there are just so many of them, and most of them are written by people who at best have only a passing understanding of the subject. Frequently the author lacks even that basic foundation, and just thinks he or she understands the material (see "Unskilled and Unaware of It")..[/quote]
Stonemetal's tutorial is not software engineering. Encouraging people to do error checking, which is a basic programming practice, is a good idea of course, but he's picked one of the lousiest possible examples to explain it with. But disregarding the "moral" of the post, which is ridiculously narrow and yet manages to be so overly broad as to be useless, let's look at the far worse transgression:

[quote name="Josh Petrie"]In short, tutorials should only be written by people who are very experienced -- indeed, experts -- in the problem domain. Even then there are dangers; just because somebody knows the subject well does not mean they have the requisite communications skill to pass that information on in a clear, understandable format.[/quote]
Now, it's obvious that stonemetal isn't an expert here. But more than that, the layout, structure, formatting, mechanics, and grammar are absolutely hideous. He's misspelled several words, including the very name of the entry. Furthermore his explanations are scatterbrained, anecdotal, and filled with vague speculation and nonsense like this:

[quote name="stonemetal"]So now some definitions of technical terms: right: this method has worked for me in this situation in the past, wrong: doing this has blown up in my face before or typically it makes things harder than necessary.[/quote]
Since when are those technical terms, and why didn't anyone else get the memo?

[quote name="stonemetal"]so how would we do this with exceptions why the function std::cout.exceptions gives you access to what will throw an exception and allow you to set what will throw an exception.[/quote]
I can't even begin to comprehend what this... "sentence"... is trying to explain, but it's hardly even English.

Please journal writers, for the love of all that is good and decent, do not write more of this. Perpetuating more of this filth on the unsuspecting internet is not only pointless from your end but damaging to any beginners who read it and think you might know what you're talking about. You should spend your time studying real software engineering and programming practices, and work on your communication skills as well while you're at it.

## Job Interviews

So I recently went for job interviews at both ArenaNet and Microsoft. I'm really excited about both positions, and I think the interviews went well, so we'll see what happens if/when the offers come through.

Going out to Washington was cool. I got to meet up with Josh Petrie for dinner a few nights, and I liked the area from what I was able to see of it. ArenaNet is working on some cool stuff, and it'd be great to finally be able to say that I worked in the games industry.

On the other hand, Microsoft has a pretty impressive campus, and the list of perks and benefits is pretty big. It's hard not to get caught up when you're interviewing for a team of several thousand people, all of whom work at a campus that sees 30,000 people working there every day. I interviewed for the Office Exchange team, which isn't exactly the area I'm most interested in, although it was nice to see that most of Exchange is now written in C#.

If I get an offer from both places, I'm still not sure which way I'm going to go. This is only for a 12-week summer position, as I still have a year of school to finish afterward, so there's a lot to take into consideration when making the final decision.

## HLSL Language Service

I was playing around with prettyprint and extending it to colorize HLSL source for my tutorial web pages, and I got to thinking that Visual Studio should really have a fully-featured extension for HLSL. Unfortunately, there is no such thing. There are two extensions that I know of which attempt to do this: NShader and IntelliShade. IntelliShade was a good start a while back, but it's old and no longer being worked on, and isn't even open source so nobody else can pick up the mantle on the project. NShader is an ok colorizer, but that's all it does, and there are spots where it messes up. I want more.

I started looking into writing my own language service for Visual Studio 2010. The new extensibility SDK has a ton of stuff on adding features like that, so I decided to dive in and start "SlimShade", an HLSL extension for VS 2010.

### Parsing

I knew that if I wanted to add all the features I wanted, I was going to need a full-fledged parser. I knew almost nothing about parsers, so I started looking around and doing some research. I stumbled across Irony, which is a fully managed generic parser that takes a grammar (also written in C#) and parses a given file. The simplicity of the project appealed to me over the larger and more obtuse parser generators like Yacc and ANTLR, so I decided to start messing around with it.

Irony is pretty cool, and the source code is fairly easy to follow, with a lot of sample grammars to look at. I used one as a base and started constructing the HLSL grammar. Unfortunately, HLSL doesn't have an official grammar published anywhere, so I had to "guess and check" a lot, looking at the (sometimes blatantly wrong) documentation and trying to figure out all the rules. In the end, the best method I stumbled on was writing a quick utility to run through all of the shader and effect files in my DXSDK and NVIDIA SDK folders and try to parse them. When I got an error, I knew I had something else to fix in the grammar.

HLSL has blown up in complexity in the last few versions, and now has everything from namespaces to classes and interfaces. There are also a lot of little undocumented quirks in the language that I only discovered by running into them in official example source. For example, did you know that the following code is valid in HLSL?

    float2 a, b;
    float4(a, b) = mul(input, matrix);

Normally you can't use a constructor as an l-value like that (and the compiler throws errors in almost every case I tried), but in this case you can, presumably because it's acting tuple-like to update a and b with the appropriate values.

Eventually I managed to piece together a fairly complete grammar. An interesting thing to note here is that Irony is an LALR(1) parser, which means that it only has one token of look-ahead to determine the purpose of a given token. This makes it a fast and compact parser, but also makes it difficult to construct a grammar that has no ambiguities. Luckily, Irony comes with a Grammar Explorer app which will analyze your grammar and show you all the conflicts, and then give you a trace of the parser state to help you walk through and determine how to resolve things.

### Optimization

Irony is great as a stand-alone parser, but I need to be able to run it very quickly, in response to user input inside of Visual Studio. I'm not sure if Irony is even still being maintained anymore, but regardless I started to work on fixing a few issues I had run into and on improving the general performance.

Profiling showed the biggest hot-spots in the scanner portion of the parser, so I started focusing my efforts there. Most of the things I changed weren't huge issues by themselves, but were being called so many times during the process that they were slowing things down. For example, a certain section of code was calling HashSet.Contains a few million times during the parsing process to determine applicable terminals. I changed this to cache all possible sets beforehand, and just use them straight from there. More memory, but faster during parse time, which is what I needed.
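The general shape of that caching change can be sketched as follows. This is not Irony's code (which is C#) and not my actual patch; it's a C++ illustration of the idea, with the class name `TerminalCache` and the integer state and terminal ids all assumed for the example. The set lookups happen once up front, and the per-token path becomes a plain array index.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the scanner's lookup: for each parser state,
// which terminals may legally appear next.
class TerminalCache {
public:
    // expected[s] holds the terminal ids valid in state s. The filtering
    // (one set lookup per terminal) is done once here, not once per token.
    TerminalCache(const std::vector<std::unordered_set<int>>& expected,
                  int terminalCount) {
        applicable_.resize(expected.size());
        for (std::size_t s = 0; s < expected.size(); ++s) {
            for (int t = 0; t < terminalCount; ++t) {
                if (expected[s].count(t) != 0) {
                    applicable_[s].push_back(t);
                }
            }
        }
    }

    // Hot path: a plain index into the precomputed lists, no hashing.
    const std::vector<int>& terminals_for(std::size_t state) const {
        assert(state < applicable_.size());
        return applicable_[state];
    }

private:
    std::vector<std::vector<int>> applicable_;
};
```

The cost is one list per parser state held in memory for the lifetime of the parser, which is the memory-for-speed trade described above.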

When I started, it took around 300 ms to fully parse a 15000 token file. Taking out a bunch of Irony features I wasn't using, I was able to knock that down to 250 ms. Switching to .NET 4 got me another 50 ms for free, which was a nice surprise. Since I do so many dictionary and hashset lookups, I went around and cached all the hash codes for my objects, which resulted in another couple dozen ms shaved off. Caching possible sets beforehand, as I mentioned earlier, reduced it by another 40 ms.

At around 140 ms to parse, I started to scrape the bottom of the barrel on things I could do. I inlined many frequently used properties and converted automatic properties into fields instead, which saved me around 10 - 20 ms. At this point garbage collections were showing up in the profiler as the bottlenecks, so I went around and reduced a bunch of allocations and changed a few objects into value types, which knocked another 30 ms off the time. Eventually I got the parse time for the same file down to 75 ms, which I decided was good enough, since most shaders won't be nearly that long (I think), and I have some plans to help split up and reduce the load later down the road.

### Colorizing

At this point I started looking at example code and putting together my colorizer. The code is pretty straightforward, but getting it all to work the way you want is far from it. Since your extension is running inside Visual Studio, it's extremely hard to debug when something goes wrong, and a lot of the extensibility interfaces aren't documented that well or are completely unused anywhere else on the web, making figuring a lot of it out a painstaking process of trial and error. Eventually though, I was rewarded for my efforts:

I'm still thinking about and tweaking the colors and tagging, but the basics are all there.

### Intellisense

Colorizing is nice, but I really want Intellisense too, so I started work on that. Visual Studio actually recognizes four different types of Intellisense:

• QuickInfo - For tooltips when you hover over a variable, function, or type.

• Statement Completion - When you start typing, shows a list of possible symbols or keywords for the given scope.

• Parameter Info - Shows function signatures and overloads when you open a function's parenthesis.

• Method Info - Shows a list of members when you press the period key to access a member.

I've only worked on the first one so far, but getting QuickInfo to work was relatively painless, at least from the extension point-of-view:

Of course, getting the symbol information to display is another matter entirely. I still need to go through and collect user symbols and push them onto the correct scope, and display them when necessary. For now, I've created an XML file that describes intrinsic symbols and gives them the nice info you can see in the image above. A snippet of that file:

    "1.0" encoding="utf-8"?>
    "Submits an error message to the information queue and terminates the current draw or dispatch call being executed." profile="4"/>
    "Returns the absolute value of the specified value." profile="1" subset="vs_1_1 and ps_1_4">
        "x" type="scalar,vector,matrix;float,int">The specified value.
        paramref:x
    "Returns the arccosine of the specified value." profile="1" subset="vs_1_1 only">
        "x" type="scalar,vector,matrix;float">The specified value. Each component should be a floating-point value within the range of -1 to 1.
        paramref:x
    "Determines if all components of the specified value are non-zero." profile="1" subset="vs_1_1 and ps_1_4">
        "x" type="scalar,vector,matrix;float,int,bool">The specified value.
        bool
    "Blocks execution of all threads in a group until all memory accesses have been completed." profile="5" subset="Compute shader only"/>
    this call." profile="5" subset="Compute shader only"/>
    "Determines if any components of the specified value are non-zero." profile="1" subset="vs_1_1 and ps_1_4">
        "x" type="scalar,vector,matrix;float,int,bool">The specified value.
        bool
    "Reinterprets a cast value into a double." profile="5">
        "lowbits" type="uint#1">The input low bits of the double.
        "highbits" type="uint#1">The input high bits of the double.
        double#1

Next I'll be working on statement completion. The parser already has a list of expected terms at any given point in the parsing process, so I'll just have to find a way to plug into that list and filter out anything unnecessary (such as delimiters).

### Other Cool Stuff

There are a bunch of other things that go into building a language service, some of which I've played around with. The first is bracket matching, which ended up fairly easy to do since the parser is tracking this information already. Another feature was real-time error detection, which displays as squigglies in the editor view and in the Error List pane. Instead of relying on my parser to provide syntax errors, I opted to use the D3DCompiler to simply compile the source on a separate thread and parse the error messages returned. This lets me get warnings and semantic errors along with syntax and grammar errors, with less work for me, so I count that as a win.
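For a feel of what parsing those messages involves, here's a rough C++ sketch (the extension itself is C#, so this is illustrative only). I'm assuming the messages look like `file(line,col): error X1234: message`, with an optional `-endcol` after the column; that shape is an assumption for this example, so check it against the output you actually get. The `Diagnostic` struct and `parse_diagnostic` function are invented names.

```cpp
#include <regex>
#include <string>

// One compiler message, split into the pieces an error list would show.
struct Diagnostic {
    std::string file;
    int line = 0;
    int column = 0;
    bool isError = false;  // false means warning
    std::string code;      // e.g. "X3000"
    std::string message;
};

// Parses one line of compiler output. Assumed shape (not taken from any spec):
//   file(line,col): error X1234: message
// Returns false and leaves `out` untouched when the line does not match.
inline bool parse_diagnostic(const std::string& text, Diagnostic& out) {
    static const std::regex pattern(
        R"re((.*)\((\d+),(\d+)(?:-\d+)?\): (error|warning) (X\d+): (.*))re");
    std::smatch m;
    if (!std::regex_match(text, m, pattern)) {
        return false;
    }
    out.file = m[1].str();
    out.line = std::stoi(m[2].str());
    out.column = std::stoi(m[3].str());
    out.isError = (m[4].str() == "error");
    out.code = m[5].str();
    out.message = m[6].str();
    return true;
}
```

Each parsed line and column pair can then be mapped to a span in the editor buffer for the squiggly, with the message text going to the Error List.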

Other ideas I had but haven't even started looking into yet: formatting, auto-indenting, supporting the navigation bar, symbol browser, showing real-time disassembly of the code, and even perhaps integrating a visual shader designer, although that would be a long way down the road at this point.

### The End

I still have a long way to go before I have something I can release for people to try out, but I think I've gotten a good start on this project. I'm hoping this will be useful for a lot of people, not just me. Anyone out there think they would use something like this? Someone already inquired about GLSL and Cg support, which should be possible in the future by simply plugging in a new grammar and tweaking a few things, so we'll see how that plays out.

## SlimDX Direct3D 11 Tutorials

I was a bit bored this weekend, so I started work on a few tutorials. I'm looking for feedback before I link them into the main site:

Basic Window
Device Creation

I've got another pretty much finished. Any feedback at all is appreciated.

## PIX Managed x64

PIX is a great tool for profiling and debugging Direct3D applications. A lot of developers swear by it. There is a slight problem though: the 64-bit version of PIX doesn't handle managed applications properly. When you go to launch a 64-bit managed app, it complains that it cannot start the process. We've reported the bug before, but there hasn't been much enthusiasm in seeing it fixed (understandable since MS doesn't have a 64-bit managed offering of DirectX at the moment). This can be quite annoying for users of SlimDX though (and possibly the Windows API Code Pack as well). So I set out to build a workaround.

My idea was thus: to build a native x64 application that would in turn launch the x64 managed target and ensure that PIX hooked up with it properly. It turned out to be simpler than I thought. Microsoft provides the CLR Hosting API, which lets you hook into and run managed code inside your process. To start with, I tried simply initializing the CLR, cleaning it up again, and exiting, expecting PIX to fail. Surprisingly, it did not. It makes me wonder what exactly the default CLR loader is doing differently.

Starting the CLR is easy:

    ICLRRuntimeHost *host = 0;
    CorBindToRuntimeEx(L"v2.0.50727", L"svr", STARTUP_CONCURRENT_GC, CLSID_CLRRuntimeHost,
        IID_ICLRRuntimeHost, reinterpret_cast<void**>(&host));
    host->Start();

Unfortunately, executing a managed program isn't so easy. The hosting API only provides one main function to launch a managed executable in-process, and the signature requirements for the target method are rather odd. Instead of accepting one of the signatures of Main() as the target, you have to provide an entry point that takes a single string parameter and returns an integer value. This of course matches none of the possible C# entry points, and I didn't want users to have to modify their programs to make them compatible with my fix.

The solution I used was to insert an additional C# DLL in between my native loader and the target application. This assembly provides the required entry point function, taking the path to the target app as the single parameter. It then runs the executable within the same appdomain, ensuring that we keep this all in a single process so as not to confuse PIX:

    public static int Start(string path)
    {
        return AppDomain.CurrentDomain.ExecuteAssembly(path);
    }

Now we can call into Start from our native loader and forward a command line argument representing the path of the target application.

    DWORD ret;
    host->ExecuteInDefaultAppDomain(path.c_str(), L"PIXPlug.Shim.AppRunner", L"Start",
        lexical_cast<std::wstring>(argv[1]).c_str(), &ret);

Opening PIX, I provide my native launcher as the target, and pass the path to my managed application as the command line argument. Click launch and... success! The managed application launches and starts rendering. PIX hooks in and starts giving me debugging data. I've successfully tricked PIX into thinking this managed x64 application is actually running natively.

This post contains the answers to my C# quiz. If you haven't taken it yet, go do so now.

First, I need to say that the point of these types of quizzes isn't to explore commonly used code or exhibit best practices. It's just interesting, at least to me, to see the murky corners of a commonly used language or API. With that said, here are the answers to the C# quiz I posted a few days ago.

## Question 1

I am particularly fond of this question, as it has several layers of trickiness to it. Many questions in quizzes like these can be deciphered based upon the very nature of the fact that they're in the quiz to begin with. Naturally, people first thought "well, normally I would say that wouldn't compile, but since this is an esoteric C# quiz, it must compile." The answers were split down the middle, but I didn't find anyone who got the answer right for the right reasons.

The snippet as posted doesn't compile. The error, however, does not deal with identifier names, the @ symbol, or the Unicode escape sequence. The error stems from a small line in the C# standard:
Quote:
 Delimited comments (the /* */ style of comments) are not permitted on source lines containing pre-processing directives.

I'm not sure exactly why this is, as it's not true in C++, but once that innocuous looking comment is taken out the snippet will compile and run.

Next, we look at the preprocessor directive there. C# keywords are valid as preprocessor symbols, so the class is compiled into the program. The @ symbol allows keywords to be used as identifiers, but it doesn't contribute to the identifier's actual name. Several people commented on this as being a rather contrived example, and while I agree that I got a little carried away sticking it on everything, I have run into this symbol in the wild when interoping with a C++/CLI project that used "object" as a parameter name, so knowing this isn't completely useless.

Next, there is the function containing the Unicode escape sequence in its name. Unicode escape sequences are indeed allowed in C# identifiers, but they are not permitted in keywords, so the resulting token must be an identifier for a function called int. So we now have two functions called "int". The real heart of this question boils down to function overloading rules. Many people missed this part completely. Both methods will contribute to the set of candidate function members for overload resolution, as the integer literal '5' can be implicitly converted to both short and byte. The question is, which one?

Considering two conversion targets T1 and T2, T1 is a better conversion target than T2 if an implicit conversion from T1 to T2 exists, and no implicit conversion from T2 to T1 exists. Since one does exist from byte to short but not vice-versa, the method taking a byte is selected as the overload, giving an output of '4'.
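The rule can be modeled in a few lines. The sketch below is written in C++ purely as an illustration of the reasoning; the conversion table is a hand-picked subset of C#'s implicit numeric conversions, and `has_implicit` and `is_better_target` are hypothetical helper names, not anything from the compiler.

```cpp
#include <set>
#include <string>
#include <utility>

// A tiny, hand-written subset of C#'s implicit numeric conversions.
// Only the pairs needed for this question are listed.
static const std::set<std::pair<std::string, std::string>> kImplicit = {
    std::make_pair(std::string("byte"), std::string("short")),
    std::make_pair(std::string("byte"), std::string("int")),
    std::make_pair(std::string("short"), std::string("int")),
};

bool has_implicit(const std::string& from, const std::string& to) {
    return kImplicit.count(std::make_pair(from, to)) > 0;
}

// T1 is the better conversion target if T1 converts implicitly to T2
// and T2 does not convert implicitly back to T1.
bool is_better_target(const std::string& t1, const std::string& t2) {
    return has_implicit(t1, t2) && !has_implicit(t2, t1);
}
```

With this table, `is_better_target("byte", "short")` holds while the reverse does not, which is why the byte overload wins.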

## Question 2

The answers to this question were rather intuitive once you realized that there was both a type and a variable named A, but it exposes an interesting part of the language. From the standard:
Quote:
 The grammar for a cast-expression leads to certain syntactic ambiguities. For example, the expression (x)-y could either be interpreted as a cast-expression (a cast of -y to type x) or as an additive-expression combined with a parenthesized-expression (which computes the value x - y).

For that reason, a rule was introduced: for the expression to be considered a cast, the token immediately following the closing parenthesis must come from a short list that includes an opening parenthesis, an identifier, and a literal, among a few others. Since in the case of (x)-y that token is a '-', it resolves to an expression instead of a cast.

The output for this question is therefore -1 and then -5.

## Question 3

A lot of people got tripped up on the assignment to this in this question. In a value type constructor, this is treated identically to an out parameter of the function, and is therefore legally assignable. In fact, you are required to do so in one form or another in order for a value type constructor to be legal. Usually though, code just assigns the individual members rather than the whole instance at once.

Moving on, we come to the behavior of static initialization. For all value types, the static field is first default initialized to all zeroes. The initializer is run at some implementation-defined time prior to the first use of that field. Therefore, we enter the Foo() constructor and assign 2 to the I field, and then proceed to assign the static field F to this instance. In either case, this will wipe out the assignment we just made, so the output should never be '2'. Once we access F, we are guaranteed that the initializer has run, so we will take on whatever value F currently has for I.

When the initializer for F runs, it calls into the Foo() constructor and assigns 5 to the I field. But then it goes and takes the value of F, which has been default initialized to all zeroes, and overwrites it. When the initializer completes, F has been initialized to all zeroes. Therefore, the output of the program is neither '2' nor '5', but '0'.

## Question 4

This question is similar to Question 2 in that the output becomes easy to guess when you see the code in a simplified form like this, and also because it addresses another case of ambiguity in the C# specification that warranted explicit rules to resolve.

To resolve this, the parser looks at the token immediately following the closing angle bracket '>'. If it's one of "( ) ] } : ; , . ? == !=" then the sequence is evaluated as a type argument list (i.e. a generic function call), otherwise it is not, even if there is no other possible parse of the sequence of tokens. In the first case of our example, there is a '(' token following the closing angle bracket, so it is parsed as a generic function call. In the second, there is a '-' operator, which is not in the above list and therefore results in an expression instead. The resulting output is therefore -7, False True.
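As a rough sketch of that lookahead check, written in C++ for illustration only: the token list is the one quoted above, and `commits_to_type_arguments` is a hypothetical name, not part of any real C# parser.

```cpp
#include <algorithm>
#include <array>
#include <string>

// Tokens that, when they follow the closing '>', make the parser commit
// to the type-argument-list reading (list taken from the text above).
bool commits_to_type_arguments(const std::string& next_token) {
    static const std::array<const char*, 11> followers = {
        "(", ")", "]", "}", ":", ";", ",", ".", "?", "==", "!="};
    return std::any_of(followers.begin(), followers.end(),
                       [&](const char* t) { return next_token == t; });
}
```

For the two calls in the quiz, the token after '>' is "(" in the first (generic call) and "-" in the second (two comparison expressions).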

## Question 5

This is another question that relies on the rules of static initialization. Besides the rules mentioned previously for question 3, this question also relies on knowing that the initialization order for static fields is the order in which they are defined textually in the source file. Some people thought that the fields would be initialized in order of use, but this is not the case. Once again we rely on having the fields zero-initialized, and then a is initialized first to b (which is 0) + 1, giving it a value of one. Final output for this question is therefore a = 1, b = 2.

That's it for the quiz. I hope you learned something new from it.

## C# Quiz

In the spirit of Washu's C++ quizzes, I've decided to try a few based on C#. While not as heinous as C++, C# still has plenty of little quirks to make things interesting. Resist the temptation to break out your compiler to cheat, and don't rely on the answer of someone else as there is no guarantee that they're correct. For reference, I'm working off the C# 4.0 Specification.

Question 1: Does the following program compile? If not, why? If so, or if compilation errors are corrected, what is its output?

    #define class
    namespace Test
    {
    #if class    /* should compile? */
        class @class
        {
            public int @int(short @int)
            {
                return (int)@int + 2;
            }
            public int \u0069nt(byte o)
            {
                return 4;
            }
        }
    #endif
        class Program
        {
            static void Main()
            {
                @class @int = new @class();
                System.Console.WriteLine(@int.@int(5));
            }
        }
    }

Question 2: What is the output of the following program?

    class A
    {
        int i;
        public A(int i)
        {
            this.i = i;
        }
        public static explicit operator A(int i)
        {
            return new A(i);
        }
        public override string ToString()
        {
            return i.ToString();
        }
    }

    class Program
    {
        static void Main()
        {
            int A = 4;
            System.Console.WriteLine((A)-5);
            System.Console.WriteLine((A)(-5));
        }
    }

Question 3: Does this program compile? If so, what is its output?

    struct Foo
    {
        static readonly Foo F = new Foo(5);
        public int I;
        public Foo(int i)
        {
            I = i;
            this = F;
        }
    }

    class Program
    {
        static void Main()
        {
            Foo f = new Foo(2);
            System.Console.WriteLine(f.I);
        }
    }

Question 4: What is the output of the following program?

    class Program
    {
        class A { }
        class B { }

        static void Main()
        {
            int A = 10;
            int B = 20;
            int G = 15;
            F(G < A, B > (-7));
            F(G < A, B > -7);
        }

        static void F(int i)
        {
            System.Console.WriteLine(i);
        }

        static void F(bool b1, bool b2)
        {
            System.Console.WriteLine("{0} {1}", b1, b2);
        }

        static int G<T, U>(int v)
        {
            return v;
        }
    }

Question 5: Is the following defined behavior? If so, what is its output?

    class Test
    {
        static int a = b + 1;
        static int b = a + 1;

        static void Main()
        {
            System.Console.WriteLine("a = {0}, b = {1}", a, b);
        }
    }

Have at it!

## Matrix Projection Poll

Currently in SlimDX our Matrix class exposes a large set of methods to create projection matrices. This closely mirrors the methods provided by D3DX, which was the original choice behind the design. While moving everything to C# and away from D3DX, I have a chance now to do some refactoring, so I thought I'd try it out and see what came of it.

This is the set of methods that we have currently:

    public static void OrthoLH(float width, float height, float znear, float zfar, out Matrix result) {}
    public static Matrix OrthoLH(float width, float height, float znear, float zfar) {}
    public static void OrthoRH(float width, float height, float znear, float zfar, out Matrix result) {}
    public static Matrix OrthoRH(float width, float height, float znear, float zfar) {}
    public static void OrthoOffCenterLH(float left, float right, float bottom, float top, float znear, float zfar, out Matrix result) {}
    public static Matrix OrthoOffCenterLH(float left, float right, float bottom, float top, float znear, float zfar) {}
    public static void OrthoOffCenterRH(float left, float right, float bottom, float top, float znear, float zfar, out Matrix result) {}
    public static Matrix OrthoOffCenterRH(float left, float right, float bottom, float top, float znear, float zfar) {}
    public static void PerspectiveLH(float width, float height, float znear, float zfar, out Matrix result) {}
    public static Matrix PerspectiveLH(float width, float height, float znear, float zfar) {}
    public static void PerspectiveRH(float width, float height, float znear, float zfar, out Matrix result) {}
    public static Matrix PerspectiveRH(float width, float height, float znear, float zfar) {}
    public static void PerspectiveFovLH(float fov, float aspect, float znear, float zfar, out Matrix result) {}
    public static Matrix PerspectiveFovLH(float fov, float aspect, float znear, float zfar) {}
    public static void PerspectiveFovRH(float fov, float aspect, float znear, float zfar, out Matrix result) {}
    public static Matrix PerspectiveFovRH(float fov, float aspect, float znear, float zfar) {}
    public static void PerspectiveOffCenterLH(float left, float right, float bottom, float top, float znear, float zfar, out Matrix result) {}
    public static Matrix PerspectiveOffCenterLH(float left, float right, float bottom, float top, float znear, float zfar) {}
    public static void PerspectiveOffCenterRH(float left, float right, float bottom, float top, float znear, float zfar, out Matrix result) {}
    public static Matrix PerspectiveOffCenterRH(float left, float right, float bottom, float top, float znear, float zfar) {}

As you can see, there are quite a few. We double each method to provide a by-reference version (writing into an out parameter) for those who want very high speeds and a normal version for people who don't care as much.

My new proposed version looks like this:

    public static Matrix PerspectiveFov(Handedness handedness, float fov, float aspect, float znear, float zfar)
    {
        float yScale = (float)(1.0 / Math.Tan(fov / 2.0f));
        float xScale = yScale / aspect;
        float width = 2 * znear / xScale;
        float height = 2 * znear / yScale;

        return Projection(ProjectionType.Perspective, handedness, -width / 2.0f, width / 2.0f, -height / 2.0f, height / 2.0f, znear, zfar);
    }

    public static Matrix Perspective(Handedness handedness, float width, float height, float znear, float zfar)
    {
        return Projection(ProjectionType.Perspective, handedness, -width / 2.0f, width / 2.0f, -height / 2.0f, height / 2.0f, znear, zfar);
    }

    public static Matrix Orthographic(Handedness handedness, float width, float height, float znear, float zfar)
    {
        return Projection(ProjectionType.Orthographic, handedness, -width / 2.0f, width / 2.0f, -height / 2.0f, height / 2.0f, znear, zfar);
    }

    public static Matrix Projection(ProjectionType type, Handedness handedness, float left, float right, float bottom, float top, float znear, float zfar)
    {
        // Entries shared by both projection types (left-handed form).
        Matrix result = new Matrix();
        result.M11 = 2.0f / (right - left);
        result.M22 = 2.0f / (top - bottom);
        result.M33 = 1.0f / (zfar - znear);
        result.M43 = znear / (znear - zfar);

        if (type == ProjectionType.Orthographic)
        {
            result.M41 = (left + right) / (left - right);
            result.M42 = (top + bottom) / (bottom - top);
            result.M44 = 1.0f;      // w stays 1 for orthographic projections
        }
        else
        {
            result.M11 *= znear;
            result.M22 *= znear;
            result.M33 *= zfar;     // left-handed: zfar / (zfar - znear)
            result.M43 *= zfar;

            result.M31 = (left + right) / (left - right);
            result.M32 = (top + bottom) / (bottom - top);
            result.M34 = 1.0f;
        }

        // Right-handed projections negate the third row.
        if (handedness == Handedness.Right)
        {
            result.M31 *= -1.0f;
            result.M32 *= -1.0f;
            result.M33 *= -1.0f;
            result.M34 *= -1.0f;
        }

        return result;
    }

Note that I haven't listed the ref overloads here. With ref overloads, the new method would have 8 total functions, whereas the old method has 20. Now, ignoring the actual creation of the matrix in there, since I haven't quite tested it yet, which design do you think is better? Here is my thought process so far:

First, the pros. The new method has drastically fewer functions to maintain, and everything boils down to the one actual method for implementation. Additionally, it better follows the design guidelines of .NET by not encoding functional information into the name and instead allowing users to specify it via enum. I find it to be a slightly cleaner method overall.

Now the cons. The new method is slightly less optimal due to the extra branches occurring. This is less bad than it seems, however, since creating projection matrices is never done more than several times over the course of a frame, and usually much less than that, so it's unlikely to make any sort of impact on performance. If this were determined to be a huge issue, I have a slightly more hackish version of Projection that avoids the branches by making tricky use of the enum values [grin] Also, the new version breaks all existing code using SlimDX, since we'd be relying on these slightly new names. As said before, we have a new "2.0" version coming that will encapsulate all of these breaking changes, but it could still be considered an issue.
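The branch-free variant isn't shown in this post, so the following is only a guess at the general idea, sketched in C++: if the enum's numeric values are chosen to be +1 and -1, the handedness test can be replaced by a multiplication. The `Handedness` values, the `ThirdRow` struct, and `apply_handedness` are all hypothetical names for this illustration.

```cpp
// Hypothetical enum layout: the numeric values double as sign factors.
enum class Handedness { Left = 1, Right = -1 };

// The four matrix entries that differ between left- and right-handed
// projections (the third row: M31, M32, M33, M34).
struct ThirdRow {
    float m31, m32, m33, m34;
};

// Flips the third row for right-handed projections by multiplying with
// the enum's value instead of testing it in an if statement.
ThirdRow apply_handedness(Handedness handedness, ThirdRow row) {
    const float sign = static_cast<float>(static_cast<int>(handedness));
    row.m31 *= sign;
    row.m32 *= sign;
    row.m33 *= sign;
    row.m34 *= sign;
    return row;
}
```

The trade-off is readability: the enum values now carry meaning the type system doesn't express, which is presumably why it counts as the "hackish" option.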

Ultimately the pros and cons nearly balance each other, and I think it comes down to a style issue, so I'm trying to take a poll here. Which do you prefer? I've talked to several people already, and the responses have been all over the board. Some people like the new methods and say the style is cleaner, some people want the old ones back, and a few people have said that the new method is better but they still prefer the old one due to stylistic reasons. How about you? What's your opinion?

## I'm all growed up

There's no doubt about it, SlimDX has finally caught up to DirectX in terms of features. Sure, there are still little things here and there, but for the most part we cover every major part of the DirectX SDK, as well as several secondary libraries deemed beneficial for multimedia and game development. Once our next release hits (probably around February, if our covert intel is correct), things should be pretty stable in D3D11 land as well. Of course there is still plenty of bug fixing, documenting, and sample writing to do, but you don't plan future library development based upon maintenance tasks like that.

So we started postulating as to future plans for our little brainchild which is now all growed up. We've gathered up a list of changes we'd like to make, big ones that mean breaking changes for just about all our users, which we had held off on before in favor of library stability. These changes will be part of what we are internally referring to as our "2.0" release. The SlimDX version number is already higher than that, but this will be the first major shift in the library since we came out of beta several years ago.

What might this "2.0" release entail? Well for starters, Josh has been chomping at the bit to switch us over to using interfaces instead of concrete classes as much as possible. This facilitates unit testing not only for ourselves but for our clients as well. While this requires a huge set of changes across the library, users will face only mild breaking changes where we end up returning an interface where previously we had a concrete class, which will require an additional cast to fix.

Ever since things slowed down on SlimDX, the team has been branching out to other side projects. Promit started SlimTune, a free profiler for managed applications. Josh started an overhaul of the sample framework, and has only recently started work on SlimBuffer, which will contain a refactored version of the DataStream functionality currently in SlimDX. "2.0" will depend on this new library for its native memory management needs.

Washu and I started work on SlimGen, which is a tool that injects ASM into a managed assembly at runtime, allowing us to use hand-optimized SIMD instructions directly from managed code with no interop overhead. This will play hand-in-hand with SlimMath, which is an all-managed implementation of the math functionality currently living within SlimDX. SlimDX "2.0" will rely on this library, and external projects can make use of the math functionality without needing the rest of SlimDX, or indeed even needing to be on Windows at all.

Finally, mesh and model handling is a common complaint I see across many different APIs, so I started work on SlimMesh, which will handle the loading of 3D models from several different formats in an API agnostic manner. This will include D3D9, D3D10, and D3D11 renderers via SlimDX to make it usable out of the box, but I suspect users of libraries like OpenTK might find it useful as well.

So yes, we're all still hard at work. The SlimDX group is rapidly becoming a multimedia middle-ware group [grin] SlimDX might be all growed up, but the rest of us are just getting started.

PS. Promit was a genius picking out the "Slim" name for SlimDX. It's a great branding tool and easy to slap on to almost any project.

Most anyone who's written a Windows application in .NET has seen the funny [STAThread] attribute adorning their Main function. The more curious of these folks have done a little digging and found that it has to do with "Advanced COM Magic" and left it at that. Others, not satisfied with magical explanations, have dug still deeper and found that it has to do with COM, and the two different, mutually exclusive ways it handles threading.

For the uninformed, COM objects can be accessed in two different ways: single threaded apartment and multithreaded apartment. Many components are designed to work in only one mode or the other, such as several Winforms GUI components. Because of this, the default program generated by Visual Studio attaches the [STAThread] attribute to tell the runtime to initialize the main thread as using single threaded apartment mode.

So far, so good. I'm writing a bit of managed glue code in C++/CLI that uses a COM component that requires single threaded apartment mode to run correctly. I check to make sure that my test application has the appropriate thread set, and then run my first test, with good results. Awash with the euphoria of victory, I walk away to go eat a brownie.

When I return, I make a few adjustments to my DLL, and run the test app again. Lo and behold, I fail with a cryptic HRESULT return code from CoCreateInstance, which is in charge of creating the necessary COM component. Frowning, I rollback my changes and try again. Still no good. This makes no sense, right?

After a lot of digging, I do a little test. I insert the following code right before CoCreateInstance:
ApartmentState state = Thread::CurrentThread->GetApartmentState();

I put a breakpoint there and run the debugger. Sure enough, the apartment state for the thread is set to MTA (multithreaded apartment). That's odd. I'm sure it's set to STA. I check back to my Main method to be sure. Yep, it's still got the attribute on there. I insert my debugging code as the first line of Main. You can see the hilariousness for yourself here:

So apparently, the CLR is ignoring my STAThread attribute and initializing the main thread as multithreaded. Not very considerate if you ask me. On an off chance, I try running my application without the debugger. Success! We now have an STA thread. I try a few other configurations, each with mixed results.

I did a little Googling, and while there is some confirmation of this peculiar behavior, there was little on how to actually fix it. I fought with it for a bit before giving in and adding the following lines before the creation of the COM component:

    if (Thread::CurrentThread->GetApartmentState() != ApartmentState::STA)
    {
        CoUninitialize();
        CoInitialize(NULL);
    }

Compile and run without issues! It's a little heavy handed I'll admit, but I'm not really sure what else to do. Has anyone else ever experienced this behavior? It's really quite bizarre. Leave a comment below to let me know what you think!

## GDNet+

Whew, I was without GDNet+ coverage for a few hours there. A few hours of agony that is. Thankfully, it was a problem easily fixed by throwing money at it. All is right with the world.

EDIT: Ah! My avatar was gone. I think I might have to sue for emotional trauma.

## Math Performance

Ever since things settled down in SlimDX, we've been turning our attention outward towards new projects that we've been naming by humorously tacking on the "Slim" moniker. Josh has been working on both an SVN client and a home-made CLR, which right now is targeting the PSP. Promit has his SlimTune profiler project that you no doubt have seen, and I have my mini SlimLine line counter project.

Washu and I originally started a separate SlimMath project back in March, when the new XNAMath library was released that made heavy use of SIMD for blazing fast performance. We wanted to bring these performance gains to SlimDX, but it was obvious that the normal method of wrapping wouldn't suffice for these small and quick bits of code; the managed/native barrier was too great to cross. I came up with the idea to batch math calls and send them all across the marshaling barrier at once, therefore only paying the cost once in each direction. We started work on that, but even with this case, the costs paid were so large that you had to do a monstrous amount of work for it to be worthwhile.

We abandoned this idea, and SlimMath was left to die. Just a few weeks ago though, Washu came up with the idea of modifying the .NET assembly after it had been compiled with NGEN and injecting our own implementations of certain functions that used hand-written ASM. The idea immediately appealed to me, and we set off on such a project, dubbing it SlimGen. While Washu is no doubt preparing a series of blog posts on the subject (he's quite excited about this, something that should amaze and scare you at the same time), I just wanted to talk a little bit about how this could be used.

The way we see it, our SlimGen client program runs on the end user's computer, at install time. After the .NET target assembly (for example, SlimDX.dll) has been NGEN-ed and installed into the GAC, we run our program that takes compiled object files and inserts the raw code into the native assembly image. This has a few major benefits for us. First, you can avoid all interop costs by injecting the code directly. Second, you can make use of any instructions you wish in the raw code, which means that math methods can now take advantage of SIMD extensions. The third benefit, which is mostly derived from running the program at install time, is that it can customize the assembly based upon the extensions supported by the target processor. This means that if a chip is detected as having SSE4 support, we can inject our SSE4 version of a function directly, without any runtime checks.

This scales well too, since for functions where we don't care to provide a particular SSE4 version, we can simply downgrade to the next available version, or if we don't provide any optimized ASM for a method, simply go on using the one provided in the assembly. This new tool, once finished and properly tested, should allow .NET library and application developers to squeeze every possible performance benefit out of using native code, while still retaining many of the niceties and usability benefits of managed development.
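The fallback logic described above could be sketched roughly like this in C++. It is not SlimGen's code: the `SimdLevel` tiers, the `select_variant` function, and the idea of keying variants by name in a map are all assumptions made to illustrate the "best available, else downgrade, else keep the managed version" selection.

```cpp
#include <map>
#include <string>

// Instruction-set tiers, ordered from least to most capable.
enum class SimdLevel { None = 0, SSE2 = 1, SSE3 = 2, SSE41 = 3 };

// Returns the name of the best variant the CPU can run, or the fallback
// (the code already in the assembly) when no optimized variant applies.
std::string select_variant(const std::map<SimdLevel, std::string>& variants,
                           SimdLevel cpu_level,
                           const std::string& managed_fallback) {
    // upper_bound finds the first variant that needs MORE than the CPU has;
    // the entry just before it is the best one we can use.
    auto it = variants.upper_bound(cpu_level);
    if (it == variants.begin())
        return managed_fallback;
    --it;
    return it->second;
}
```

Because the choice is made once at install time, none of this selection cost would show up at runtime.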

## SlimLine

Promit announced the SlimTune Profiler today on his blog and it's made me antsy to formally announce my own offering to the fledgling line of software from the SlimDX Group. It's a line counter / software metrics tool that I've dubbed SlimLine, in keeping with the "Slim" moniker. It's not as flashy as a profiler, but still useful nonetheless.

I'm sure many of you have run across the startling lack of basic line counting and source statistics in Visual Studio. There is also a surprising lack of free tools out there to fill the gap. Project Line Counter seems to be thrown around every once in a while, but a quick glance at it shows it to be almost dead at this point, without even support for Visual Studio 2008. Seeing this, I quickly threw together a command line program that would parse a Visual Studio solution and do some counting. It was quick and dirty, but it worked, and after posting it here in my journal and on IRC, I found it was spreading amongst people who liked the simplicity and usefulness it provided.

For the official SlimLine, things are going to be taken to the next level. The entire source code counting code will be pushed into its own library that can be consumed by any other .NET program. Utilizing this library will be three official front ends: a command line program, a graphical Windows app, and a Visual Studio add-in. Initially, it will only be doing some simple line and file counting by project and filter, but I hope to eventually expand into other software metrics, such as cyclomatic complexity.

I like the idea of taking our "Slim" brand and applying it to other areas of development, and the line counter stuff I already had seemed like a natural fit. Like I said earlier, it's not as cool as Promit's profiler project, but it's still useful and some of you may want to keep an eye out for it in the next few months.

## Allocations, Revisited

Some of you may remember my entry on the custom allocator we wrote to try to take advantage of stack allocation for small temporary arrays. Yes, it was Premature Optimization, and as no good deed goes unpunished, we botched the job and caused all sorts of problems. We ended up scrapping the whole thing and just using std::vector for our temporary array needs.

A few weeks ago, we got an issue filed on our tracker that claimed that a particular shader method in D3D10, which was called many times per frame, was spending 80% of its time diddling with std::vector. I was skeptical to be sure, but a few quick tests proved that at least a good portion of the method was in fact being eaten up. I figured it high time I resurrected our old stack allocation code, but this time I had the benefit of hindsight to help guide me along.

Our old attempt had basically involved using a custom allocator for std::vector, which seemed good on the surface until we realized that memory allocated from the stack inside an allocator wasn't going to be of much use outside of it. To that end, I began thinking of ways that we could reliably allocate memory on the stack of the calling method, but still wrap it all up nicely so that the user didn't have to worry about cleanup or any other nonsense. I hit upon the idea of using a macro that would discreetly allocate a chunk from the stack and then forward the call on to the actual stack_array constructor. Since the macro is a simple text replacement, the actual stack allocation call happens in the calling function, which is exactly where it needs to happen.

You can see the entire contents of the stack_array class here:

    #define stackalloc(type, length) stack_array<type>::from_stack_ptr(reinterpret_cast<type*>(_malloca(sizeof(type) * length)), length)

    template<typename T>
    struct stack_array_ref
    {
        explicit stack_array_ref(T *right, size_t length, bool on_stack)
            : ptr(right),
              len(length),
              on_stack(on_stack)
        {
        }

        T *ptr;
        size_t len;
        bool on_stack;
    };

    template<typename T>
    class stack_array
    {
    private:
        T* ptr;
        size_t len;
        bool on_stack;

        explicit stack_array(T* memory, size_t length) throw()
            : len(length),
              ptr(memory),
              on_stack(true)
        {
        }

    public:
        explicit stack_array(size_t length = 0) throw()
            : len(length),
              ptr(new T[length]),
              on_stack(false)
        {
        }

        stack_array(stack_array& right) throw()
            : ptr(right.ptr),
              len(right.len),
              on_stack(right.on_stack)
        {
            right.ptr = NULL;
            right.len = 0;
            right.on_stack = false;
        }

        stack_array(stack_array_ref<T> right) throw()
        {
            ptr = right.ptr;
            len = right.len;
            on_stack = right.on_stack;
            right.ptr = NULL;
        }

        ~stack_array()
        {
            if (on_stack)
                _freea(ptr);
            else
                delete[] ptr;
        }

        static stack_array from_stack_ptr(T* memory, size_t length)
        {
            return stack_array(memory, length);
        }

        operator stack_array_ref<T>() throw()
        {
            stack_array_ref<T> ans(ptr, len, on_stack);
            ptr = NULL;
            len = 0;
            on_stack = false;
            return ans;
        }

        stack_array& operator = (stack_array& right) throw()
        {
            if (right.ptr != ptr)
            {
                if (on_stack)
                    _freea(ptr);
                else
                    delete[] ptr;
            }

            ptr = right.ptr;
            len = right.len;
            on_stack = right.on_stack;
            right.ptr = NULL;
            right.len = 0;
            right.on_stack = false;
            return *this;
        }

        stack_array& operator = (stack_array_ref<T> right) throw()
        {
            if (right.ptr != ptr)
            {
                if (on_stack)
                    _freea(ptr);
                else
                    delete[] ptr;
            }

            ptr = right.ptr;
            len = right.len;
            on_stack = right.on_stack;
            return *this;
        }

        const T* get() const
        {
            return ptr;
        }

        T* get() throw()
        {
            return ptr;
        }

        size_t size() const throw()
        {
            return len;
        }

        T& operator [] (size_t index)
        {
            return ptr[index];
        }

        const T& operator [] (size_t index) const
        {
            return ptr[index];
        }
    };

It's a very lightweight template class that really only exists to hold temporary values while we marshal between .NET and DirectX. Notice the stackalloc macro, which is where the magic happens. If the user fails to use this macro to set up the array, it will go ahead and use a standard new/delete, which means we don't get unspeakable errors from a simple typo. Here's an example of using it:
```cpp
stack_array<D3DPRESENT_PARAMETERS> d3dpp = stackalloc( D3DPRESENT_PARAMETERS, presentParameters->Length );
```
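For readers without `_malloca` at hand, the same "stack when possible, heap otherwise" idea can be roughly approximated in portable C++ with an inline buffer. This is a different technique from the macro above, not the SlimDX implementation: the name `scratch_array` and the `InlineCapacity` parameter are made up for illustration, and the sketch assumes `T` is default-constructible and `InlineCapacity` is greater than zero.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical sketch: small arrays live inside the object itself (and so on
// the stack when the object is a local); larger ones fall back to the heap.
template <typename T, std::size_t InlineCapacity>
class scratch_array
{
public:
    explicit scratch_array(std::size_t length)
        : len_(length),
          heap_(length > InlineCapacity ? new T[length]() : nullptr)
    {
    }

    // True when the elements are stored in the inline buffer.
    bool on_stack() const { return !heap_; }

    T* get() { return heap_ ? heap_.get() : inline_; }
    const T* get() const { return heap_ ? heap_.get() : inline_; }

    std::size_t size() const { return len_; }

    T& operator[](std::size_t index) { return get()[index]; }
    const T& operator[](std::size_t index) const { return get()[index]; }

private:
    std::size_t len_;
    std::unique_ptr<T[]> heap_;     // owns the heap block, if any
    T inline_[InlineCapacity] = {}; // value-initialized inline storage
};
```

Something like `scratch_array<int, 8> temp(4);` would stay on the stack, while `scratch_array<int, 8> big(100);` would quietly allocate. The trade-off against `_malloca` is that the inline capacity is fixed at compile time and is paid for even when the heap path is taken.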

I'm pretty happy with the way it turned out. Benchmarks place stack_array at around 3x faster than std::vector, and even slightly faster than raw memory allocation, so we've definitely done some good work there. I'm not sure why std::vector is so slow in this case; I've turned off every security and debugging feature I can think of, so maybe there's some quirk when it comes to using it in C++/CLI.

## Minimal Initialization

I was fooling around with SlimDX the other day and decided to try to make the nicest looking minimal initialization program that I could. We've been adding little utility classes to help make things smoother, so the end result is, in my opinion, quite pretty to look at.

Direct3D9:
```csharp
using System;
using System.Drawing;
using SlimDX;
using SlimDX.Direct3D9;
using SlimDX.Windows;

namespace Sample
{
    static class Program
    {
        [STAThread]
        static void Main()
        {
            var form = new RenderForm("SlimDX Comparison");
            var device = new Device(new Direct3D(), 0, DeviceType.Hardware, form.Handle, CreateFlags.HardwareVertexProcessing, new PresentParameters()
            {
                BackBufferWidth = form.ClientSize.Width,
                BackBufferHeight = form.ClientSize.Height
            });

            MessagePump.Run(form, () =>
            {
                device.Clear(ClearFlags.Target | ClearFlags.ZBuffer, Color.Black, 1.0f, 0);
                device.BeginScene();
                device.EndScene();
                device.Present();
            });

            foreach (var item in ObjectTable.Objects)
                item.Dispose();
        }
    }
}
```

The C++ equivalent is, as you might imagine, quite disgusting by comparison:
```cpp
#include <windows.h>
#include <d3d9.h>

#pragma comment(lib, "d3d9.lib")

LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    switch (message)
    {
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;

    case WM_PAINT:
        ValidateRect(hWnd, 0);
        return 0;
    }

    return DefWindowProc(hWnd, message, wParam, lParam);
}

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd)
{
    WNDCLASSEX wcex;
    wcex.cbSize = sizeof(WNDCLASSEX);
    wcex.style = CS_HREDRAW | CS_VREDRAW | CS_OWNDC;
    wcex.lpfnWndProc = WndProc;
    wcex.cbClsExtra = 0;
    wcex.cbWndExtra = 0;
    wcex.hInstance = hInstance;
    wcex.hIcon = LoadIcon(hInstance, IDI_APPLICATION);
    wcex.hCursor = LoadCursor(hInstance, IDC_ARROW);
    wcex.hbrBackground = reinterpret_cast<HBRUSH>(GetStockObject(WHITE_BRUSH));
    wcex.lpszMenuName = NULL;
    wcex.lpszClassName = L"TestWindowClass";
    wcex.hIconSm = wcex.hIcon;

    RegisterClassEx(&wcex);
    HWND hWnd = CreateWindow(L"TestWindowClass", L"SlimDX Comparison", WS_OVERLAPPEDWINDOW, CW_USEDEFAULT, CW_USEDEFAULT, 800, 600, 0, 0, hInstance, 0);

    RECT rect = {0, 0, 800, 600};
    AdjustWindowRect(&rect, GetWindowLong(hWnd, GWL_STYLE), FALSE);
    SetWindowPos(hWnd, 0, 0, 0, rect.right - rect.left, rect.bottom - rect.top, SWP_NOZORDER | SWP_NOMOVE);

    ShowWindow(hWnd, SW_SHOW);
    UpdateWindow(hWnd);

    LPDIRECT3D9 d3d = Direct3DCreate9(D3D_SDK_VERSION);
    if (!d3d)
    {
        MessageBox(NULL, L"Direct3DCreate9 - Failed", 0, 0);
        return 0;
    }

    D3DPRESENT_PARAMETERS pp;
    pp.BackBufferCount = 1;
    pp.BackBufferFormat = D3DFMT_X8R8G8B8;
    pp.BackBufferWidth = 800;
    pp.BackBufferHeight = 600;
    pp.MultiSampleType = D3DMULTISAMPLE_NONE;
    pp.MultiSampleQuality = 0;
    pp.Windowed = TRUE;
    pp.SwapEffect = D3DSWAPEFFECT_DISCARD;
    pp.hDeviceWindow = hWnd;
    pp.EnableAutoDepthStencil = TRUE;
    pp.AutoDepthStencilFormat = D3DFMT_D24X8;
    pp.Flags = 0;
    pp.FullScreen_RefreshRateInHz = 0;
    pp.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE;

    LPDIRECT3DDEVICE9 device;
    HRESULT hr = d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hWnd,
        D3DCREATE_HARDWARE_VERTEXPROCESSING, &pp, &device);
    if (FAILED(hr))
    {
        d3d->Release();
        MessageBox(NULL, L"CreateDevice - Failed", 0, 0);
        return 0;
    }

    MSG msg;
    while (1)
    {
        if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
        {
            if (msg.message == WM_QUIT)
                break;

            TranslateMessage(&msg);
            DispatchMessage (&msg);
        }
        else
        {
            device->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER, D3DCOLOR_ARGB(255, 0, 0, 0), 1.0f, 0);
            device->BeginScene();
            device->EndScene();
            device->Present(0, 0, 0, 0);
        }
    }

    device->Release();
    d3d->Release();

    return msg.wParam;
}
```

This is one using our latest addition of Direct3D11. The D3D10 version looks quite similar to this.
```csharp
using System;
using SlimDX;
using SlimDX.Direct3D11;
using SlimDX.DXGI;
using SlimDX.Windows;

namespace Demo
{
    static class Program
    {
        [STAThread]
        static void Main()
        {
            var form = new RenderForm();
            var desc = new SwapChainDescription()
            {
                BufferCount = 1,
                ModeDescription = new ModeDescription(form.ClientSize.Width, form.ClientSize.Height, new Rational(60, 1), Format.B8G8R8A8_UNorm),
                IsWindowed = true,
                OutputHandle = form.Handle,
                SampleDescription = new SampleDescription(1, 0),
                SwapEffect = SwapEffect.Discard,
                Usage = Usage.RenderTargetOutput
            };

            SwapChain swapChain;
            SlimDX.Direct3D11.Device device;
            SlimDX.Direct3D11.Device.CreateWithSwapChain(null, DriverType.Reference, DeviceCreationFlags.Debug, desc, out device, out swapChain);

            var context = new DeviceContext(device);
            var renderView = new RenderTargetView(device, swapChain.GetBuffer<Texture2D>(0));
            context.OutputMerger.SetTargets(renderView);
            context.Rasterizer.SetViewports(new Viewport(0, 0, form.ClientSize.Width, form.ClientSize.Height));

            MessagePump.Run(form, () =>
            {
                context.ClearRenderTargetView(renderView, new Color4(0.0f, 0.5f, 1.0f));
                swapChain.Present(0, PresentFlags.None);
            });

            foreach (var item in ObjectTable.Objects)
                item.Dispose();
        }
    }
}
```

I've been tossing around the idea of making these into project templates for Visual Studio. Thoughts?

## MDX can still make me laugh

MDX has been dead for quite some time now, and these days most people recognize the futility of trying to use it for any new project. As a developer of SlimDX, I find myself still glancing at the internals of MDX from time to time, especially when contemplating how to wrap a particularly cumbersome portion of the DirectX API. That's what I was doing today when I ran across a real gem that I couldn't resist sharing with GDNet journal land.

DirectInput exposes an effect interface for dealing with force feedback effects. Each effect has a set of additional type-specific parameters, which turn out to be rather difficult to encapsulate correctly for consumption by managed code. Just as a quick primer, in native DirectInput IDirectInputEffect exposes a GetParameters method to get the effect parameters. The returned parameters structure contains a pointer to the extra type-specific parameters, along with a byte size. It is assumed that the C++ programmer will know the correct type for this data and then cast the pointer appropriately. In the managed world, this isn't so easy.

Ideally, the managed wrapper would construct the appropriate type and then return that to the user, as playing with pointers isn't quite kosher in the managed world. Unfortunately, the returned parameters contain no identifier to specify which type of data is being represented. I scratched my head at this one for a bit, and then took a look at how MDX handled the situation. I was quite surprised to find this little snippet of code hiding in the bowels of MDX:

```csharp
internal static unsafe void ToManaged(ref Effect ret,
    DIEFFECT modopt(IsConstModifier)* src)
{
    if ((ret == 0) || (src == null))
    {
        throw new ArgumentNullException();
    }
    ret.m_Flags = *((EffectFlags*) (src + 4));
    ret.m_Duration = *((int*) (src + 8));
    ret.m_SamplePeriod = *((int*) (src + 12));
    ret.m_Gain = *((int*) (src + 0x10));
    ret.m_TriggerButton = *((int*) (src + 20));
    ret.m_TriggerRepeatInterval = *((int*) (src + 0x18));
    ret.m_StartDelay = *((int*) (src + 0x34));
    if (*(((int*) (src + 0x20))) != 0)
    {
        int[] numArray2 = new int[*((int*) (src + 0x1c))];
        numArray2.Initialize();
        ret.m_Axes = numArray2;
        volatile ref int pinned numRef2 = (volatile ref int) &(ret.m_Axes[0]);
        memcpy(numRef2, *(((int*) (src + 0x20))), (*(((int*) (src + 0x1c)))
            << 2));
        numRef2 = 0;
    }
    if (*(((int*) (src + 0x24))) != 0)
    {
        int[] numArray = new int[*((int*) (src + 0x1c))];
        numArray.Initialize();
        ret.m_Direction = numArray;
        volatile ref int pinned numRef = (volatile ref int)
            &(ret.m_Direction[0]);
        memcpy(numRef, *(((int*) (src + 0x24))), (*(((int*) (src + 0x1c)))
            << 2));
        numRef = 0;
    }
    uint num2 = *((uint*) (src + 40));
    if ((num2 != 0) && (((num2[4] != 0) || (num2[8] != 0)) ||
        ((num2[12] != 0) || (num2[0x10] != 0))))
    {
        ret.m_UsesEnvelope = true;
        Envelope.ToManaged(ref ret.EnvelopeStruct, *((DIENVELOPE**)
            (src + 40)));
    }
    if (0x10 == *(((int*) (src + 0x2c))))
    {
        DICUSTOMFORCE* dicustomforcePtr = *((DICUSTOMFORCE**) (src + 0x30));
        int num6 = *(((int*) (dicustomforcePtr + 8))) << 2;
        if (((IsBadReadPtr(*((void modopt(IsConstModifier)**) (dicustomforcePtr + 12)), (uint) num6) == 0)
               && (IsBadWritePtr(*((void**) (dicustomforcePtr + 12)),
                   (uint) num6) == 0)) &&
              (*(((int*) (dicustomforcePtr + 8))) > 0))
        {
            CustomForce.ToManaged(ref ret.CustomStruct, dicustomforcePtr);
            ret.m_EffType = EffectType.CustomForce;
            return;
        }
    }
    uint num = *((uint*) (src + 0x2c));
    if (0x10 == num)
    {
        Periodic.ToManaged(ref ret.Periodic, *((DIPERIODIC**) (src + 0x30)));
        ret.m_EffType = EffectType.Periodic;
    }
    else if (4 == num)
    {
        ConstantForce.ToManaged(ref ret.Constant, *((DICONSTANTFORCE**) (src + 0x30)));
        ret.m_EffType = EffectType.ConstantForce;
    }
    else if (8 == num)
    {
        RampForce.ToManaged(ref ret.RampStruct, *((DIRAMPFORCE**) (src + 0x30)));
        ret.m_EffType = EffectType.RampForce;
    }
    else if ((num != 0) && ((num % 0x18) == 0))
    {
        int num5 = (int) (num / 0x18);
        Condition[] conditionArray = new Condition[num5];
        conditionArray.Initialize();
        ret.ConditionStruct = conditionArray;
        ret.m_EffType = EffectType.Condition;
        int index = 0;
        if (0 < num5)
        {
            DIEFFECT modopt(IsConstModifier)* dieffectPtr = src + 0x30;
            int num4 = 0;
            do
            {
                Condition.ToManaged(ref ret.ConditionStruct[index], (DICONDITION*) (num4 + *(((int*) dieffectPtr))));
                index++;
                num4 += 0x18;
            }
            while (index < num5);
        }
    }
}
```

Yes, it's ugly, but a lot of that is due to the compiler having stripped out constant and structure accesses. That's not really the point here though. What's really crazy is how this snippet of code is determining the correct type for the extra type-specific parameters. Let's look more closely at the last portion of this method, cleaned up to remove compiler funkiness.

```csharp
if (effect.Size == 16)
{
    DICUSTOMFORCE* ptr = (DICUSTOMFORCE*)effect.TypeSpecificParams;
    if (!IsBadReadPtr(ptr->ForceData) && !IsBadWritePtr(ptr->ForceData))
        return CustomForce.ToManaged(ptr);
}

if (effect.Size == 16)
    return PeriodicForce.ToManaged(effect.TypeSpecificParams);
else if (effect.Size == 4)
    return ConstantForce.ToManaged(effect.TypeSpecificParams);
else if (effect.Size == 8)
    return RampForce.ToManaged(effect.TypeSpecificParams);
else if (effect.Size != 0 && (effect.Size % 24) == 0)
    return ConditionArray.ToManaged(effect.TypeSpecificParams);
```

In case you don't quite understand the hilariousness that is this code, let me give you a little guidance. First, the function checks the number of bytes returned, and tries to match it up against the size of possible type-specific structures. As you can see, there are four possible force types. If the size doesn't match any of those, it checks to see if it's aligned on a Condition array boundary, in which case it interprets the data as an array of Condition structures. That's fairly hackish on its own, but as you can see by that first if block, it gets better.

Unfortunately for the original MDX developers, DIPERIODIC and DICUSTOMFORCE both have a size of 16 bytes, so how would you know which structure was being returned? Simple! Since DIPERIODIC contains four integers, and DICUSTOMFORCE contains three integers and a pointer, simply take a guess to see if the data at the correct offset could possibly be considered a valid pointer. If so, we'll pretend that we have a DICUSTOMFORCE and everything will be hunky dory!
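To make the ambiguity concrete, here is a small sketch of size-only classification. The structs are hypothetical stand-ins for the 32-bit layouts of the DirectInput types (field names are invented, and the pointer is modelled as a 32-bit integer so the sizes match x86); `classify_by_size` is likewise an illustrative name, not anything from MDX or DirectInput.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the 32-bit layouts discussed above.
struct Periodic32    { std::uint32_t a, b, c, d; };               // four integers
struct CustomForce32 { std::uint32_t a, b, c; std::uint32_t ptr; }; // three integers + 32-bit pointer
struct Constant32    { std::int32_t magnitude; };
struct Ramp32        { std::int32_t start, end; };
struct Condition32   { std::int32_t fields[6]; };                 // 24 bytes per condition

static_assert(sizeof(Periodic32) == 16, "expected 16-byte layout");
static_assert(sizeof(CustomForce32) == 16, "expected 16-byte layout");
static_assert(sizeof(Condition32) == 24, "expected 24-byte layout");

enum class ParamKind { Unknown, PeriodicOrCustom, Constant, Ramp, ConditionArray };

// Classifies a type-specific parameter block using nothing but its byte size.
// Size alone cannot separate the two 16-byte structures, which is exactly the
// hole the pointer-probing hack tries to paper over.
inline ParamKind classify_by_size(std::size_t bytes)
{
    if (bytes == sizeof(Periodic32))
        return ParamKind::PeriodicOrCustom;
    if (bytes == sizeof(Constant32))
        return ParamKind::Constant;
    if (bytes == sizeof(Ramp32))
        return ParamKind::Ramp;
    if (bytes != 0 && bytes % sizeof(Condition32) == 0)
        return ParamKind::ConditionArray;
    return ParamKind::Unknown;
}
```

Passing `sizeof(Periodic32)` and `sizeof(CustomForce32)` yields the same answer, so any caller that wants to tell them apart needs information the size simply doesn't carry.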

The only question is, how do you know if a given set of bytes represents a pointer? Simple! Use the handy-dandy IsBadReadPtr and IsBadWritePtr functions to figure it out for you. For those of you in the know, IsBadXxxPtr should really be called CrashProgramRandomly or perhaps CorruptMemoryIfPossible. I haven't done much research on these two little functions, but before I'd even read the aforementioned article, I had already guessed that they weren't quite standard C++, if you catch my drift.

Now, this little... what did I call it... gem? that I've found here is quite out of scope for most normal applications, but I can't help but wonder if anyone had ever attempted to use this bit of functionality and found it horribly broken.

## Combinatorics

Such a cool word for a math concept. Anyways, I've been writing a Sudoku game lately, and I've been delving a little into set theory, with a little dash of combinatorics thrown in for good measure. I needed a method that would return all possible combinations of a given sequence, and the Enumerable class has finally failed me.

So here's my implementation of a combine function. Note that while this is similar to permutations, it's slightly different in that I only take a certain subset in each pass (controlled by the count parameter). I'm hoping someone, perhaps one of the functional guys, can show me a more elegant way of doing this, but for now it works. I think the use of the HashSet in the Filter function provides a performance boost, but I don't know for certain. HashSet.Contains is an O(1) operation, but the cost of creating a new HashSet every iteration may cancel out that performance gain.

```csharp
/// <summary>
/// Filters the specified sequence based upon the provided indices.
/// </summary>
/// <typeparam name="T">The type of the elements in the sequence.</typeparam>
/// <param name="sequence">The sequence.</param>
/// <param name="indices">The indices.</param>
/// <returns>The filtered sequence.</returns>
public static IEnumerable<T> Filter<T>(this IEnumerable<T> sequence, IEnumerable<int> indices)
{
    HashSet<int> hs = indices as HashSet<int>;
    if (hs == null)
        hs = new HashSet<int>(indices);

    return sequence.Where((t, i) => hs.Contains(i));
}

/// <summary>
/// Generates all combinations (not permutations) of a given sequence.
/// </summary>
/// <typeparam name="T">The type of the elements in the sequence.</typeparam>
/// <param name="sequence">The sequence.</param>
/// <param name="count">The number of items to pick for each combination.</param>
/// <returns>The sequence of combinations.</returns>
public static IEnumerable<IEnumerable<T>> Combine<T>(this IEnumerable<T> sequence, int count)
{
    List<IEnumerable<T>> results = new List<IEnumerable<T>>();
    int[] indices = Enumerable.Range(0, count).ToArray();
    int max = sequence.Count() - 1;

    while (true)
    {
        results.Add(sequence.Filter(indices));

        // Find the rightmost index that can still be advanced.
        int n = count;
        for (int i = indices.Length - 1; i >= 0; i--)
        {
            if (indices[i] < max - (count - i - 1))
            {
                n = i;
                break;
            }
        }

        if (n == count)
            break;

        // Advance it, then reset every index after it.
        indices[n]++;
        for (int i = n + 1; i < indices.Length; i++)
            indices[i] = indices[i - 1] + 1;
    }

    return results;
}
```

Well, maybe it's not all that interesting, but it does have that new LINQ-y, extension method smell, and it actually worked the first time I wrote it, so I'm happy.
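For comparison, the same index-advancing idea can be sketched in C++. This is only an illustration under a few assumptions: the `combine` name and `std::vector` signature are made up here, the input is an indexable vector rather than a lazy sequence, and each combination is copied out eagerly instead of being filtered on demand.

```cpp
#include <cstddef>
#include <vector>

// Returns every k-element combination of `items`, ordered by the positions
// chosen. Hypothetical helper, not part of any library.
template <typename T>
std::vector<std::vector<T>> combine(const std::vector<T>& items, std::size_t k)
{
    std::vector<std::vector<T>> results;
    const std::size_t n = items.size();
    if (k > n)
        return results; // cannot pick more items than exist

    // indices[i] is the position in `items` of the i-th chosen element.
    std::vector<std::size_t> indices(k);
    for (std::size_t i = 0; i < k; ++i)
        indices[i] = i;

    while (true)
    {
        std::vector<T> combo;
        combo.reserve(k);
        for (std::size_t idx : indices)
            combo.push_back(items[idx]);
        results.push_back(combo);

        // Find the rightmost index that has not reached its maximum, n - k + i.
        std::size_t pos = k;
        while (pos > 0 && indices[pos - 1] == n - k + (pos - 1))
            --pos;
        if (pos == 0)
            break; // every index is at its maximum, so we are done

        // Advance that index and reset everything after it.
        ++indices[pos - 1];
        for (std::size_t i = pos; i < k; ++i)
            indices[i] = indices[i - 1] + 1;
    }
    return results;
}
```

Calling `combine(std::vector<int>{1, 2, 3, 4}, 2)` walks the index pairs from (0, 1) through (2, 3), so it produces six combinations starting with {1, 2} and ending with {3, 4}. Indexing directly into the vector sidesteps the HashSet question entirely, at the cost of requiring random access.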

## Friend Assemblies

In lieu of the tutorial that I've been stalling on for a while now (Evil Steve's magnificent D3D9 tutorial has me depressed), I'll be discussing friend assemblies, and how something that works quite well with C# got absolutely and totally screwed up by C++/CLI.

OK, so, friend assemblies. What are they? Well, suppose you have a library that has a few public types that consumers of your library use, and a few internal helpers that assist the others in performing the library's tasks. This is great, and how encapsulation is meant to work. Hide things from the end user that he doesn't need to see, both so that things are simpler and so that he won't inadvertently break something. What happens, though, if you want to write another library that extends the functionality of the first library, but in a completely separate DLL? You run into problems if you need to use the internal utilities provided by library A. This is where friend assemblies come into play.

The CLI allows library designers to mark assemblies with an attribute called InternalsVisibleTo, which allows them to specify a list of assemblies that can see and use the internals of the marked assembly. This is straightforward, and is also tightly controlled, since the only person who can specify other friend assemblies is the original library developer. It's probably not ideal Object Oriented Programming, but that doesn't really bother me. Now, in C#, this functionality works great and as expected. Library A marks itself with this attribute, and lets the compiler know that library B can touch its private parts. All is right with the world.

Enter C++/CLI, stage left. When it came time to implement friend assemblies for C++/CLI, the language design team had a collective brain malfunction, because things get very crazy, very fast. Marking library A with InternalsVisibleTo still follows the same procedure as outlined for C#. The problem comes with the consumer assembly, which is library B in our case. It isn't enough for the C++/CLI compiler to see that A has declared B a friend. It wants more proof that the two assemblies really want to be friends. OK, I can live with that; maybe there's some strange goings-on under the hood in the C++/CLI compiler that require this. However, somebody on the design team thought it would be hilarious to require the reference to assembly A to be made using a #using statement, with a magical as_friend modifier stuck on the end. Apparently, the syntax team didn't get the memo, as this magical keyword isn't identified as such by Visual Studio.

OK I say to myself. It's weird, but I can deal with this. Just stick "#using "AssemblyA" as_friend" at the top of a source file and everything should work. Right? Wrong. When the compiler hits the #using statement, it says to itself (and me as well, how nice of it) "hmm, you've already added AssemblyA as a reference using the project settings, and it isn't marked as_friend there, so I'm just going to ignore the fact that you said as_friend here. I'm so helpful!" Of course, there's no way to mark an assembly as_friend from the Add Reference dialog, so you have to remove the assembly from the list and stick with the #using statement.

But wait, there's more. The #using statement, confusingly, only applies to the source file in which you've declared it. That's right. Even though your assembly B has told the compiler that you're using a reference to assembly A, it will still only let you use types from assembly A if you've made the special #using statement in the current source file. And since we've removed assembly A from the project-wide list of references, you now need to make sure that every required source file can see the #using statement.

But wait, there's more! The #using statement can only refer to ONE specific DLL, with a hard coded path. That's right. You need to hard code the path to the DLL. That means that if you want to refer to an assembly contained in the same solution, you need to refer to its actual physical location on disk. Also, you end up needing to use preprocessor magic to make sure you load the right assembly based upon the current platform and configuration of the project. In SlimDX, this preprocessor garbage ends up looking like this:

```cpp
#ifdef X64
#ifdef PUBLIC
#using "../build/x64/Public/SlimDX.dll" as_friend
#else
#ifdef NDEBUG
#using "../build/x64/Release/SlimDX.dll" as_friend
#else
#using "../build/x64/Debug/SlimDX.dll" as_friend
#endif
#endif
#else
#ifdef PUBLIC
#using "../build/x86/Public/SlimDX.dll" as_friend
#else
#ifdef NDEBUG
#using "../build/x86/Release/SlimDX.dll" as_friend
#else
#using "../build/x86/Debug/SlimDX.dll" as_friend
#endif
#endif
#endif
```

So now you need to add this to a common header and make sure you include it from every source file, making compile times that much worse and leading to weird and obscure compile errors if you forget to add it to one. I cannot fathom what they thought they were doing when they designed this system, but I do know that it's sick and twisted. Wherever you are, C++/CLI designer, know that I shake my fist at your evil shenanigans.