I've got a bunch of topics to write about queued up, so check back every once in a while if you're interested.
I've found that my usage of GameDev.net has slowly but steadily decreased over the past few months. Certainly the new forum software has a small and indirect part to play in this, but the real reasons run much deeper than superficial UI annoyances. To put it succinctly, the quality of content I used to expect from GDNet has fallen sharply since the precipitous leap to the new software; I place the blame for this almost entirely on the decline of activity by other iconic members who have since moved on to greener pastures.
GDNet will always hold a special place in my heart as the forum where I cut my teeth on programming. I would not be the developer that I am today without it. It's important to make a distinction here though; the forum became the resource that it was almost entirely through its amazing user-base, one that was both deeply knowledgeable and willing to answer questions, as well as willing and able to ask good and insightful questions. Even with an overabundance of brilliant gurus, without those of the latter group asking good questions, the information doesn't get out there for others to absorb. It's my experience that the best learning comes from answers to questions that you might never have even known or thought to ask yourself. Indeed, one need only look at one of the myriad of examples of this principle in action.
You need both halves of this whole in order to wring the most wisdom you can from your collective user base. While the gurus have been slowly moving away, I think the really troubling loss is that of the "intelligent questioner". While I doubt that we have any fewer of them today than we did years ago, the new site design seems to invite an overwhelming amount of noise and inane questions that trample and drown the good questions before they even get off the ground. That's not to say that the loss of the gurus is any less devastating; I've made my way into the games industry now, and haven't asked direct questions in years, but even so, many of the users I looked to for compelling technical content, both in the forums and in the personal journals, have gone silent and missing.
Examples? ToohrVyk, Ysaneya, dgreen, and Drew Benton, some of the top rated users on the old site, all have only a handful of posts since the change over. What's even more troubling is the disappearance of moderators. Promit, Ravuya, and Oluseyi are hardly around anymore, mittens and jollyjeffers have dropped off the face of the Earth, and I know that jpetrie, moderator of For Beginners, hasn't even logged in in several months, and is currently close to achieving moderator status on the GameDev StackExchange site.
So yes, the site is losing users and the general quality level of posts has fallen. That, coupled with the decline of the journals and the bizarre and unwelcome direction the staff are taking with the site, is enough to turn me off for good. It's their site and they're welcome to do with it what they will, but it's become clear to me, and judging by the evidence to several others as well, that it's not a direction that is good for the site long term. The insistence on political correctness and the "play nice" attitude being forced down our throats is particularly puzzling, especially for a community that once prided itself on having a no-nonsense, blunt, and straightforward response to any and all questions.
As I mentioned, my forum usage has already been gradually decreasing over the past months, changing from "several times a day" to "once or twice a week" to "meh, whenever I'm bored." I unsubscribed from the mailing list after they started using it to spam advertisements, and my GDNet+ subscription runs out some time in August. That leaves only the journals as a resource I use regularly here on GameDev.net, and now I've got my external site to take care of that as well. So what's going to change? Not much. I'll still be around from time to time to look at any SlimDX questions, and I'll probably cross-link any blog entries here, but in my mind this marks the end of GDNet as my "home" on the internet; I'm leaving and headed for a new promised land.
I've been doing a lot of work with SSE-related instructions lately, and finally got fed up with the myriad of move instructions available to load and store data to and from the XMM registers. The differences between some are so subtle and poorly documented that it can be hard to tell that there is even any difference at all, which makes choosing the right one for the job almost impossible. So I sat down and pored over the Intel instruction references and optimization manuals, as well as several supplemental sources on the internet, in order to build up some notes on the differences. I figured I might as well document them all here for everyone to use.
The name of the game with picking any instruction is performance, and you always want to choose the one that will get the job done in the least time using the least amount of space. Thus the recommendations here are geared towards these two goals. Each instruction has several bits of information associated with it that we must take into account:
- The type of the data it works with, be it integers, single precision floating point, or double precision floating point.
- The size of the data it moves. This can range from 32-bits to 128-bits.
- Whether it deals with unaligned memory or can be used with aligned memory only.
- If the move only affects a portion of a register, what happens to the remaining bits in that register after the instruction finishes.
- Any other special side-effects that the instruction may have.
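One of these considerations, alignment, is entirely under your control at allocation time. As a concrete starting point, here's a minimal C sketch of satisfying the 16-byte requirement with the SSE intrinsics headers; the helper names are mine, and I'm assuming a compiler that provides `_mm_malloc`/`_mm_free` (GCC, Clang, and MSVC all do):

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_malloc, _mm_free */

/* Nonzero when p sits on a 16-byte boundary, the alignment that
   the aligned moves (movaps/movapd/movdqa) require. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* _mm_malloc guarantees the requested alignment, unlike plain malloc,
   so the returned pointer is safe to use with the aligned moves.
   Free the result with _mm_free, not free. */
float *alloc_floats16(size_t n)
{
    return (float *)_mm_malloc(n * sizeof(float), 16);
}
```

With an allocator like this in place, you can commit to the faster aligned instructions throughout your code instead of deciding at each load site.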
Let's start off with the 128-bit moves. These move an entire XMM register's worth of data at a time, making them conceptually simpler. There are seven instructions in this category:
movapd / movaps / movdqa, movupd / movups / movdqu, lddqu

All of these instructions move 128 bits worth of data. Breaking it down further, the first three instructions work with aligned data, whereas the next three are the unaligned versions of the first (we'll talk about the last one, lddqu, in a minute, since it's a bit special). The aligned versions offer better performance, but if you haven't ensured that your data is allocated on a 16-byte boundary, you'll have to use one of the unaligned instructions in order to load. When doing register-to-register (reg-reg) moves, it's best to use the aligned versions.
Each of the three instructions in each category (aligned and unaligned) operate on a different data type. Those with a 'd' suffix work on doubles; those with an 's' work on singles, and movdqa works on double quadwords (integers). This is usually a source of confusion for people, myself included, since regardless of the data type, 128-bits are still being moved, and a move shouldn't care about the raw memory it's moving. The differences here are subtle and easily overlooked, and have to do with the way the superscalar execution engine is structured internally in the microarchitecture. There are several "stacks" internally that can execute various instructions on one of several execution units. In order to better split up instructions to increase parallelism, each move instruction annotates the XMM register with an invisible flag to indicate the type of the data it holds. If you use it for something other than its intended type it will still operate as expected; however, many architectures will experience an extra cycle or two of latency due to the bypass delay of forwarding the value to the proper port.
So for the most part, you should try to use the move instruction that corresponds with the operations you are going to use on those registers. However, there is an additional complication. Loads and stores to and from memory execute on a separate port from the integer and floating point units; thus instructions that load from memory into a register or store from a register into memory will experience the same delay regardless of the data type you attach to the move. Thus in this case, movaps, movapd, and movdqa will have the same delay no matter what data you use. Since movaps (and movups) is encoded in binary form with one less byte than the other two, it makes sense to use it for all reg-mem moves, regardless of the data type.
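In intrinsics terms, the aligned/unaligned pairing looks roughly like the sketch below; `add4` is a made-up helper name, and the "which instruction gets emitted" comments describe what a typical compiler does, not a guarantee:

```c
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_loadu_ps, _mm_add_ps */

/* Adds two quads of floats. If the caller promises 16-byte alignment,
   the compiler can emit movaps; otherwise it must fall back to movups. */
void add4(const float *a, const float *b, float *out, int aligned)
{
    __m128 va, vb;
    if (aligned) {
        va = _mm_load_ps(a);   /* movaps: faults on a misaligned address */
        vb = _mm_load_ps(b);
    } else {
        va = _mm_loadu_ps(a);  /* movups: works anywhere, a bit slower */
        vb = _mm_loadu_ps(b);
    }
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

Note that the aligned form isn't just slower on misaligned data, it faults outright, which is why you have to know your allocation strategy before choosing it.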
Finally, there is the lddqu instruction which we have neglected to consider. This is a specialty instruction that handles unaligned loads for any data type, specifically designed to avoid cache-line splits. It operates by finding the closest aligned address before the one we want to load, and then loading the entire 32-byte block and indexing to get the 128-bits we addressed. This can be faster than normal unaligned loads, but doing the load in this way makes stores back to the same address much slower, so if store-to-load forwarding is expected, use one of the standard unaligned loads.
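SSE3 exposes lddqu through the `_mm_lddqu_si128` intrinsic; here's a hedged sketch. The function name is my invention, and the GCC/Clang `target` attribute is there only so the example builds without a global `-msse3` flag:

```c
#include <immintrin.h>  /* SSE3: _mm_lddqu_si128 */
#include <stdint.h>

/* Loads 16 bytes from a possibly misaligned address with lddqu, then
   returns the low 32-bit lane just so there's something observable. */
__attribute__((target("sse3")))
int32_t low_lane_lddqu(const void *p)
{
    __m128i v = _mm_lddqu_si128((const __m128i *)p);
    return _mm_cvtsi128_si32(v);
}
```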
In addition to these instructions, there are four extra 128-bit moves that require mentioning:
movntdqa, movntdq, movntpd, movntps

These are the non-temporal loads and stores, so named because they hint to the processor that the data is one-off in the current block of code and should not require bringing the associated memory into the cache. Thus, you should only use these when you're sure that you won't be doing more than one read or write into the given cache line. The first instruction, movntdqa, is the only non-temporal load, so it's what you have to use even when loading floating point data. The other three are data-specific stores from an XMM register into memory, one each for integers, doubles, and singles. All of these instructions only operate on aligned addresses; there are no unaligned non-temporal moves.
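In C, the non-temporal stores show up as the `_mm_stream_*` intrinsics. Below is a sketch of a cache-bypassing fill; the function name is mine, the destination must be 16-byte aligned, and I'm assuming a buffer large enough that bypassing the cache is actually worthwhile:

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_stream_ps (movntps), _mm_sfence */

/* Fills a 16-byte-aligned float buffer with a constant using
   non-temporal stores, so a large one-shot write doesn't evict
   useful cache lines. n must be a multiple of 4. */
void fill_streaming(float *dst, float value, size_t n)
{
    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);  /* movntps: writes around the cache */
    _mm_sfence();  /* order the streamed stores before returning */
}
```

The trailing sfence matters: non-temporal stores are weakly ordered, so without a fence other code (or another core) may not yet see the writes.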
Next we come to the moves that operate on 32 and 64-bits of data, which is less than the full size of the XMM registers. Thus this introduces a new wrinkle; namely, what happens to the remaining bits in the register during the move.
movd / movq, movss / movsd, movlps / movlpd, movhps / movhpd

The first instruction in each pair listed above operates on single-precision (or generally 32-bit) data, and the second works on doubles (64-bit data). The first set, comprising the first four instructions, generally fills the extra bits in the XMM register with zero. The second set does not; it leaves them as they are. I'll discuss in a moment why this is not necessarily a good thing. movd moves 32 bits between memory and a register. It cannot, however, move between two XMM registers, which is an oddity that the rest of the instructions listed here do not share. movq will always zero extend during any move, including between memory and between registers. movd and movq are meant for integer data.
movss and movsd are meant for floating point data, and only perform zero extension when moving between memory and a register. When used to move between two XMM registers, they do NOT fill the remaining space with zeroes, which is confusing. movlps and movlpd perform essentially the same operation as each other, each moving 64 bits into the low qword (two packed singles or one double, respectively). They do not, however, perform a zero extension in any case. movhps and movhpd are slightly different from the others in that they move their data to and from the high qword of the XMM register instead of the low qword like the others. They don't do zero extension either.
Since the second set of instructions don't do zero extension, you might think that they would be slightly faster than ones that have to do the extra filling of zeroes. However, these instructions can introduce a false dependence on previous instructions, since the processor doesn't know whether you intended to use the extra data you didn't end up erasing. During out-of-order execution, this can cause stalls in the pipeline while the move instruction waits for any previous instructions that have to write to that register. If you didn't actually need this dependence, you've unnecessarily introduced a slowdown into your application.
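The zero-extend versus merge behavior is visible from intrinsics too. In this sketch (helper names are mine), `_mm_set_ss` maps to the zero-extending form of movss, while `_mm_move_ss` maps to the reg-reg merging form that carries the extra dependence on the destination's old contents:

```c
#include <xmmintrin.h>

/* Stores a vector to memory and returns one lane, for inspection. */
float lane(__m128 v, int i)
{
    float out[4];
    _mm_storeu_ps(out, v);
    return out[i];
}

/* movss from a scalar: the upper three lanes become zero. */
__m128 scalar_zero_extend(float x)
{
    return _mm_set_ss(x);
}

/* movss reg-reg: only the low lane changes; the upper lanes keep
   dst's old values, so this depends on dst's previous contents. */
__m128 scalar_merge(__m128 dst, float x)
{
    return _mm_move_ss(dst, _mm_set_ss(x));
}
```

The merging form is exactly the case described above: the processor has to wait for whatever last wrote `dst`, whether or not you cared about those upper lanes.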
There are several other instructions that have special side-effects during the move. Their usage is generally easier to work out, since there is only one instruction for a given operation.
movddup - Moves 64 bits, and then duplicates it into the upper half of the register.
movdq2q - Moves an XMM register into an old legacy MMX register, which requires a transition of the x87 FP stack.
movq2dq - Same as above, except in the opposite direction.
movhlps / movlhps - Moves two 32-bit floats (one qword) from high-to-low or low-to-high between two XMM registers. The other qword of the destination is unaffected.
movsldup - Copies the two even-indexed 32-bit floats (the low dword of each qword) from the source register, duplicating each one into the dword above it in the destination. Kind of confusing to describe, but the diagram in the documentation makes it easy to visualize if you want to use it.
movmskps / movmskpd - Moves the sign bits from the given floats or doubles into a standard integer register.
maskmovdqu - Selectively moves bytes from a register into a memory location using a specified byte mask. This is a non-temporal instruction and can be quite slow, so avoid using it when another instruction will suffice.
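Of these, movmskps is probably the one you'll reach for most often. Here's a small sketch of using it via `_mm_movemask_ps` (the helper name is mine) to branch on a whole vector at once:

```c
#include <xmmintrin.h>

/* movmskps packs the four sign bits of an XMM register into an int,
   which makes it easy to branch on the result of a vector compare. */
int any_negative(const float v[4])
{
    __m128 x = _mm_loadu_ps(v);
    return _mm_movemask_ps(x) != 0;  /* nonzero if any sign bit is set */
}
```

The same pattern works on the output of comparisons like `_mm_cmplt_ps`, which set lanes to all-ones or all-zeroes, so the mask tells you which lanes passed the test.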
There are a lot of SSE move instructions, as you can see from the above. It annoys me when I don't understand something, and whenever I needed a move I would get bogged down trying to decide which was best. Hopefully these notes will help others make a more informed decision, and shed light on some of the more subtle differences that are hard to find in the documentation.
Besides various forum entries and random webpages found through judicious Googling, most of the information here comes from the Intel instruction references and optimization manuals mentioned above.
A few months back I detailed my work on an HLSL plugin for Visual Studio that would add syntax highlighting and IntelliSense support. I had to take a break from that for a while, but I started up again last week, and I've taken things in a new, more generalized direction.
Rather than work on the HLSL parser itself, I've focused my efforts on a language builder IDE, that provides tools to easily build complete front-ends for languages, including support for all of the features people have come to expect from a language's tools. I'm calling this project SlimLang for now because I'm unimaginative and the "Slim" moniker has a good reputation associated with it.
Work on the incremental parser has been difficult, due both to its complexity and to the lack of information about incremental parsers out on the net, so one of the tools I really want to add to the IDE is a built-in debugger that allows stepping through the parser as it runs and seeing the output as a visual graph as it's being constructed. It should really help my implementation of the parser, which I can then provide as a separate component to use in conjunction with the parse tables generated by the IDE.
For the visual graph part, I found the libraries OpenGraph and Graph#, and for the text editor for specifying the language grammar I found AvalonEdit, all of which are WPF libraries, so I decided to take my project in that direction. I haven't been fond of WPF, but recent changes in WPF4 have definitely improved things for the better. Compare the text rendering quality between the old and new versions below:
You can also see from those images my work on a tab control style that mimics the property page tab found in Visual Studio. I didn't know much about styling when I started, so it was basically a crash course for me. The end results are pretty good though:
I integrated the text editor to allow for specifying the grammar and wrote a quick manual parser to read it. Also got the error list hooked up nicely. All in all, it was a good few days of work.
Also some work using a DataGrid for editing terminal options:
Hopefully this should make further work on the HLSL plugin much easier, as well as provide a platform for anyone with a personal scripting language to create easy plugins for language support, which can really aid usage and adoption of a new language.
I was thinking about making these two things open source (the HLSL plugin and the SlimLang IDE), but I'm also leaning towards providing them for sale to try to get some actual income. Any thoughts on whether these two tools would be useful enough to anyone to be worth spending the money?