About this blog

Building Hieroglyph 3, a Direct3D 11 based engine that will be available open sourced shortly...

Entries in this blog

Jason Z

After what seems like an eternity (but in reality was about 1 year) we now have quite a bit of information about the Hololens. Perhaps even more interesting is that we have a pretty good picture of what development for the device will look like. If you haven't seen it already, you can start looking on the Development Overview page for a good introduction.

Right off the bat, it was great to see that Microsoft is supporting both a Unity based development model and a more traditional development model where you can work in C++ or any of the other usual suspects for UWP applications (JavaScript, C#, or VB). I like playing around in Unity, but after being away from C++ for a while I start to get a little anxious... Anyways, there is lots of documentation there and some interesting tidbits about how applications will be interacting with the Hololens itself.

I'm also very happy to see some of the Academy content, which is essentially step by step tutorials to get you started. Having this type of content prior to the device being released is a nice change of pace from any of the big tech companies, and I hope this is a trend that continues as more and more VR and AR headsets are released.

While I am fortunate enough to be in the first wave of headsets coming out, it is also an excellent idea to have an emulator be a first class citizen. $3000 is a whole bunch of money, and the devices will be limited in supply initially, so providing a way for people to work on their programs without a device is a great thing that again I hope catches on in the industry. From the looks of it, the emulator should be pretty capable so I'm looking forward to giving it a spin myself when it is released.

Our next checkpoint is going to be at the BUILD conference at the end of March, where I would expect the bits to finally be released and of course more details to be made available. Microsoft has been killing it with their Hololens announcements, so hopefully this one goes out with a bang to the developers!

So how about you guys - is anyone out there planning to build some AR programs with the Hololens? Do you have any opinions on the documentation after reading through it? This is just the beginning, so be sure to provide as much feedback as you can to the Hololens team as it will only help improve the final product in the end.

I am planning to integrate Hololens with Hieroglyph 3, so it should be fun to build up the framework around the Holographic APIs. I'll be sure to continue posting here about my progress, so stay tuned!

Jason Z

It's always a good practice to think about what you have done over the past year, and of course to think about what is coming in the next year. I have found this to be a good way to gain perspective about my various projects and activities, so I will indulge in some thinking out loud here.


This year I was extremely busy at work. Normally this would be to the detriment of my passion outside of work (3D software development) but in fact I have been incorporating my 3D work into the daily job. There is a program at work that allows us to pitch ideas for new products and services, and I have had some success in pitching concepts related to 3D programming. So while I have been relatively quiet on GameDev.net, I have in fact been working harder than ever on my craft of choice.

With that said, I have had the opportunity to use Hieroglyph 3 in some of these projects. It is interesting to go from a single developer working on the rendering framework to a larger organization having to use it as well - there are lots of things that I take for granted that have to be explained to other developers, and this provides some insight into what can be improved. There are a number of areas that I will indeed be iterating over, including the following:

  1. NuGet Support: After over a year working with NuGet, I think it is time to remove its use from Hieroglyph. It reduces the source needed for building the library, but it also causes a delay in updating to new compiler versions due to the need for the NuGet package owner to update. I would like to find a better solution here, possibly with something like a source code version of NuGet.
  2. Static Library Management: Over the years I have added some dependencies, some of which are integrated (Lua, DirectXTK) and others that are optional (MFC, WPF, OculusSDK, Kinect SDK, etc...). I'm not currently happy with the way these libraries are managed, and there are some cases where static libraries are handled differently depending on the source. I would like to improve optional library support, preferably also improving the project support system in Visual Studio as well.
  3. Moving to Git: I have used SVN for Hieroglyph from the start, but it is time to move to Git - I think there isn't much more to be said about that one...
  4. Scene Management: My original scene management system was loosely based on the Dave Eberly style scene graph, and it has evolved from there. However, with modern C++ coming of age, there are a number of new topics to explore. For example, the Actor system is inheritance based, but it could just as easily use static interfaces with templates. Object ownership is also an interesting area that I previously baked into a usage pattern, but it could be more explicitly defined as well.
  5. Less Framework for a More Flexible Framework: There are many times when I have wished it were easier to use pieces of Hieroglyph without needing the whole thing. I would also like to move toward a less tightly coupled usage pattern wherever possible, to make each piece more standalone.

Overall I am really happy with the experiences I have gained over the past year, and I really look forward to applying some of those experiences to the points above in 2016...


There are a lot of exciting things that will be happening in 2016 for the world of 3D programming. First and foremost is the arrival of the head mounted displays that we have been building up to over the past couple of years. On the VR side I have been working mostly with the Oculus developer kits, but some colleagues are also using the HTC Vive. Whichever side you choose to work on (or both) you will be happy with the arrival of the final headsets.

The one that I am most interested in though, is the Hololens. Even with the limited field of view, I see the potential use cases for Hololens as significantly more broad than with closed off VR headsets. Of course, nothing is really known about the development support yet, so it remains to be seen whether they get it right or not. In any case, I can't wait to get a developer kit and see what can be done with it!

There are other things that I would like to explore related to AR and VR on the engine side as well. In the past, you always built your engine to produce a 2D output image. This is technically still true for the new HMDs (albeit with two output 2D images) but they make you think about designing more for 3D content. There has been lots of discussion about managing GUIs in 3D, but I think there are still lots of areas to investigate where we can take advantage of keeping our content in 3D deeper into the pipeline.

There are lots more areas that I'll be exploring over the coming year, but I'll just have to write about them when we get there. I think it is going to be a great year, so happy new year and let's take advantage of the time we have to make some great stuff!

Jason Z
Way back when D3D9 was first released, graphics debugging and performance tuning were a complete black art. It was really, really difficult to know what was going on, and when the graphical output wasn't quite what it was supposed to be, you really had to put on your detective hat and begin to deduce what the issue could be. It was even worse for performance topics, where you needed to figure out where to spend significant time optimizing or changing something around to speed things up a bit.

I might make myself sound old, but nowadays it is ridiculously easy to track down both correctness errors and most performance issues. I recently finished up an application at work that is used for marketing, and the performance was pretty good but not great. I suspected that I was somehow GPU bound, since the framerate dropped more or less proportionally with render target resolution, but I wasn't sure exactly what the issue was. The rendering technique goes something like this:

  1. Render the scene into a 'silhouette' buffer, used for identifying objects to be highlighted (only geometry that will be outlined gets rendered here)
  2. Render the scene into the normal render target (whole scene is rendered here).
  3. Render a full screen quad that samples the silhouette buffer and applies the highlights.

There is a normal desktop mode, and it also has an Oculus Rift mode. Of course, the Rift mode decreases performance, since you render the whole sequence twice (once for each eye). I decided to give the Visual Studio 2013 performance and diagnostic tools a chance to diagnose what was going on, and I was totally surprised to see that there was a single draw call that was taking substantially more time than the others. If you aren't familiar with the performance tools, you just have to go to "Debug > Performance and Diagnostics" and then check CPU and GPU, and you're off to the races.

Due to the multi-pass technique that I was using, I figured it would be either a bandwidth issue, or possibly even a pixel shader limitation - but I totally didn't think it would be primarily from a single draw call... So I then fired up the graphics debugger and took a frame capture, and looked into a similar place in the frame to see if I could trace back which draw call it was. Next I clicked on the "Frame Analysis" tab, where I then found the following graph:


I clicked on the obvious item and tracked it back to a simple OBJ based model, that was rendering a super high resolution mesh instead of the version intended for use in the application. So instead of going down the rabbit hole to figure out if my two pass rendering was the issue, I trivially replaced the model and solved the issue.

So the moral of the story is this: make use of the modern debugging tools that are available in VS2013 and the forthcoming VS2015. They will help you understand your CPU and GPU utilization, find issues, and let you focus on the important, most-bang-for-the-buck tasks!
Jason Z
I have a small confession to make: I have very rarely used STL algorithms, even though I know I should. There are lots of really great reasons why I should, and there are top C++ developers out there who tell you to use them whenever they apply, but I just don't feel 100% comfortable with them yet. With that in mind, we can explore a recent foray into the use of an algorithm that made things both easier and harder at the same time :)

The Setup

I recently wrote some code that would access a vector of structures, where each structure is sorted according to one of its data members. The data member itself happened to be a float. There is nothing too fancy or special about the data structure at all. One particular use of the data stored in this vector is to find the two elements that surround a provided input value. In this case, the code is being used to perform an interpolation between those two structures, based on the location of the input value with respect to the two enclosing structures.

That sounds easy enough, but in fact it can lead to a morass of code littered with if statements, a for loop, and a few other things that are kind of 'old-school C++'. Since I was accessing two elements at a time, I was stuck with a for loop (or something equivalent) and because you have to handle the cases where the input is outside of the two ends of the vector, there must be some branching involved too. By the time it was up and running, I went back to try and comment the code and ensure that when I came back 3 months from now that I could understand it. I was having a bit of trouble writing a good set of comments, which I thought was a bad sign...

The Swing

It was then that I decided I would give it a shot to try and find an STL algorithm that could get me pretty close to the same result but hopefully with less code to maintain. In case you aren't familiar, I am talking about the contents of the algorithm header, which supplies something like 40 prebuilt functions for some of the most common operations in computing with the STL data structures.

The less code you have to write, the better off you are in the long run. In addition, algorithms have (supposedly) well defined functionality which means that your code should be more expressive to boot. The trick is to know which algorithm will do what you want, and then to understand the semantics of that algorithm to make sure you are using it correctly. This latter part is where I fell down this time...

In case you haven't already guessed it, the algorithms that can be used for the searching task I mentioned above are the std::lower_bound and std::upper_bound functions. If you haven't ever used them, I would recommend writing out a few sample programs to play around with them - they are incredibly easy to use, and you don't need to fuss about the implementation. They just do their thing and return the result.

So I modified my function to use lower_bound, and then compare the result with the begin and end of the vector. The code was about 1/4 the size, very easy to understand, and easily described with comments. I thought it was a grand testament to the algorithm header - the stories were true, and I should be using algorithms whenever I can!

The Miss

The problem is that when I was more closely scrutinizing the results of the function calls, it turned out that I was always getting an iterator to the element above the input value. That didn't seem to make any sense, since a lower_bound function should be returning the item that is just lower than my input! But no, that isn't how it works...

Let's assume a simple example of a vector of floats that is populated like this:

    std::vector<float> values = {1.0f, 2.0f, 3.0f};

Now if you call lower_bound and pass in a value of 1.5f, I would expect to get an iterator to the first element. It turns out that you get the second element instead. The reason is that std::lower_bound returns an iterator to the first element that is not less than the input, which is also the position where you would insert the new value to keep the vector sorted. That's right - it is not the element just below your input, but rather the insertion location.

I thought to myself "That's strange... I wonder what upper_bound does then...". It returns an iterator to the first element greater than your input value - which is what I would expect it to do. It turns out that the name lower_bound is not entirely in agreement with what most of us would call a lower bound. The only real difference between these two functions shows up when you are requesting a value that is already in the vector (or if there are multiples of that value in the vector): lower_bound points at the first matching element, while upper_bound points just past the last one. It is a very subtle difference, and maddeningly difficult to reason about when you are an hour or two into refactoring a relatively minor function.

The End

So the moral of the story is that algorithms are good, and they can help you. If given the choice, I would take the algorithm version of the function any day. But you take on the responsibility of fully understanding what the algorithm does - and you will pay the consequences if you jump in without it!

If you want to try it out for yourself, here is the simple starter code that I used to play around with it:

    #include "stdafx.h"   // Visual Studio console project header (normally pulls in <tchar.h>)
    #include <algorithm>
    #include <iostream>
    #include <vector>

    int _tmain(int argc, _TCHAR* argv[])
    {
        std::vector<float> values{ 1.0f, 2.0f, 2.0f, 3.0f };

        // First element that is not less than 2.0f
        auto lower_bound_result = std::lower_bound(values.begin(), values.end(), 2.0f);
        // First element that is greater than 2.0f
        auto upper_bound_result = std::upper_bound(values.begin(), values.end(), 2.0f);

        std::wcout << L"Lower Bound: " << lower_bound_result - values.begin() << std::endl; // index 1
        std::wcout << L"Upper Bound: " << upper_bound_result - values.begin() << std::endl; // index 3

        return 0;
    }

Jason Z

Microsoft's Hololens

In general, I have always been interested in computer graphics. There are lots of different problems to solve, and if you like math/geometry then there really aren't many better ways to exercise your brain than working in this area. Once you know how to take geometry and project it onto an image, it can also be quite satisfying to do the opposite - start to investigate computer vision and understand how to take an image, and convert it back into a set of objects. A while back, I integrated Kinect support into Hieroglyph 3 for just such a reason - you can do some really interesting stuff with a color and a depth map of a scene, and even cooler stuff if you take that image sequence over time.

A natural extension of generating images for a monitor is to generate them for something like the Oculus Rift. This is actually not much different than regular computer graphics, except that you get the position and orientation of your camera from the Head Mounted Display (HMD), and then render the scene twice for two eyes. I also explored this by adding Oculus Rift support into Hieroglyph 3. It is really cool to play around with the technology and see how it works. Everyone talks about 'presence' with VR, and it really is a bit spooky how much you can get tricked into feeling like you are there instead of here.

However, in the end VR stuff basically takes you from a 2D monitor window into a virtual world, and wraps that virtual world all around you. The problem is, there is no in between - you can't see anything of the real world when you have the HMD on. This presents all kinds of problems, including the question about how do you interact with a virtual world if you can't see your own hands! There are lots of extremely smart people working on this very problem, many of them at Oculus I'm sure. But there is a solution to this problem already coming down the road, that might just change the very nature of how we approach interfacing to the devices all around us: The Hololens.


If you think about it, the Hololens combines both computer graphics and computer vision. You get to put computer generated objects into the real world using computer vision. Of course, we don't know yet how much access we will get to the underlying technology (although that answer is apparently at least partially coming at Build according to a few interviews) but it is really cool to think you can interact with the basic structure of the world around you - at the same time you are wearing the HMD.

That totally solves the challenge of how to interact with the world around you with the HMD on. I don't know how well it is implemented, or how low the latency is on the headset, but if it works as everyone seems to be reporting, then I can't wait to get my hands on one of the development kits for the Hololens. Consider the example Minecraft demo that Microsoft showed off:


Think of all the cool things you could do with access to the room's basic structure, and selectively overwriting its contents. Have you ever played Portal? The opportunities are limitless...

What do you guys think - what will you do with this technology!?!?!
Jason Z
Last time around, I described the current data layout of my primary scene graph class - the Entity3D. After some thought and further research into what direction I want to take the engine, I decided to make a few changes. Let's start with the printout of the current layout - after my recent changes:

    2> class Entity3D size(288):
    2> +---
    2> 0 | m_pParent
    2> 4 | ?$basic_string@_WU?$char_traits@_W@std@@V?$allocator@_W@2@ m_Name
    2> 28 | Transform3D Transform
    2> 216 | ?$ControllerPack@VEntity3D@Glyph3@@ Controllers
    2> 232 | Renderable Visual
    2> 252 | ParameterContainer Parameters
    2> 268 | CompositeShape Shape
    2> 284 | m_pUserData
    2> +---

As compared to before, there is a huge reduction in size - from 396 bytes down to 288. Some of this was due to discovering some of my own incompetence (there was an extra unused matrix object in one of the objects that composed the Entity3D) and some due to actual design changes. I suppose this is a good advertisement for checking the resulting layout of your objects - to find extra matrices that don't need to be there!

The design changes show a general refactoring of the object's contents into separate classes. All of the scale, rotation, and position (and their matrices) have been refactored into a Transform3D object. The rendering related objects are now part of the Renderable class. The respective member functions have also been moved accordingly. This type of refactoring helps to consolidate the content, and it also makes it easier to include this same functionality in another class by simply including those new classes.

That leads to the other big change that I have made. Previously the Node3D class was a sub-class of Entity3D, which made both classes virtual (and hence used a vtable). This is less than optimal since it messes with the cache, it makes every single Entity3D and Node3D bigger than it really needs to be (by one pointer) and it doesn't really buy you very much in either functionality or savings. So I decided to split the inheritance hierarchy, and just make Entity3D and Node3D their own standalone classes.

The refactored objects I described above made it pretty easy to build a Node3D without inheriting its functionality. The only real hiccup was that the controller system had to be made template based, but that wasn't really an issue. Overall it was a pretty easy transition, and now the Node3D has a better defined purpose - to provide the links between objects in the scene graph. I'm pretty happy with the change so far...

In the end, there are some objects that were simply removed from the scene graph objects. These are primarily the bounding spheres, which will be relocated into the composite shapes object. That work is still under way, so I'm sure I'll write more about it once it is ready!
Jason Z

CppCon 2014

In case you haven't heard about it, all of the sessions from CppCon 2014 are being released on the CppCon YouTube Channel. This is a fantastic way for you to catch up on the latest and greatest things that people are doing with C++ (especially modern C++), and the price is about as good as it gets. There are over 100 sessions, and they are being posted as they are processed, so be sure to check back periodically.

This post is going to be related to one of the keynote talks by Mike Acton titled "Data Oriented Design and C++". The talk itself was pretty good, with the delivery only suffering from slightly jumping from topic to topic (he is also more of a C fan than C++, but nobody is perfect :) ). However, the concept and message that Mr. Acton was delivering came through loud and clear - you should be paying close attention to your data and how it is used. His examples showed his clear understanding and (what I consider to be) unique way of diagnosing what is going on. One of his examples was about the scene graph node class in Ogre3D, which he subsequently cut into and showed how he would change things.

Hieroglyph 3 Scene Graph

That got me thinking about the scene graph in Hieroglyph 3. This system is probably one of the oldest designs in the whole library, and has its origins in the Wild Magic library by Dave Eberly. The basic idea is to have a spatial class to represent your 3D objects (called Entity3D), and a node class which is derived from the spatial class (called Node3D) but which adds the scene graph connectivity. This system has worked for me for ages, and is a simple, clear design that is easy to reason about (at least to me it is...).

Prompted by the talk I mentioned above, I wanted to take a closer look at the data contained within my scene graph classes, and try to see if I am wasting memory and/or performance in an obvious way. I never tried to "optimize" this system, since it never showed up as a significant bottleneck before, but I think it will serve as a nice experiment - and it will let me explore some of the techniques for drilling into this type of information.

I am going to spread this analysis over a few blog posts, mostly so that I can provide the appropriate amount of coverage to each particular topic that I touch on. For this post, we will simply start out by identifying one easy to use tool in Visual Studio - the unsupported memory layout compiler flags. There have been many instances where Microsoft has mentioned two different memory layout flags that will dump layout info about an object to the output window in VS: /d1reportSingleClassLayoutCLASSNAME and /d1reportAllClassLayout. They do what it sounds like - basically just report the size of one or all classes in the translation unit being compiled. You can add this to the 'Command Line' entry in the property file of the CPP file containing the desired class, and then compile only that translation unit by selecting it and pressing CTRL+F7. Be sure to be consistent in how you test as well - using release mode is probably the most sensible, and stick to either 32- or 64-bit compilation.

The output can be a bit verbose, but within it you will find a definition for your class along with a byte offset for each member. An example is shown here:

    1> class Entity3D size(396):
    1> +---
    1> 0 | {vfptr}
    1> 4 | Vector3f m_vTranslation
    1> 16 | Matrix3f m_mRotation
    1> 52 | Vector3f m_vScale
    1> 64 | Matrix4f m_mWorld
    1> 128 | Matrix4f m_mLocal
    1> 192 | m_bPickable
    1> 193 | m_bHidden
    1> 194 | m_bCalcLocal
    1> | <alignment member> (size=1)
    1> 196 | m_pParent
    1> 200 | ?$vector@PAVIController@Glyph3@@V?$allocator@PAVIController@Glyph3@@@std@@ m_Controllers
    1> 212 | ?$basic_string@_WU?$char_traits@_W@std@@V?$allocator@_W@2@ m_Name
    1> 236 | EntityRenderParams m_sParams
    1> 320 | ParameterContainer Parameters
    1> 336 | Sphere3f m_ModelBoundingSphere
    1> 356 | Sphere3f m_WorldBoundingSphere
    1> 376 | CompositeShape CompositeShape
    1> 392 | m_pUserData
    1> +---

There are a few things you can see right away without trying too hard:

  1. The object is 396 bytes big
  2. The class is part of an inheritance hierarchy, since it has a vfptr
  3. There is some wasted space in the middle of the object due to alignment requirements

After seeing this, I started to dig in to what is really needed from these classes, and then started to identify a few different strategies for simplifying and reducing their sizes. More on those results in the next post or two!

One more thing...

On October 1st, I was re-awarded as a Visual C++ MVP! That marks six years running (yay!). Whenever I hit a milestone like this, I always like to reflect on how I got to that point - and GameDev.net is always a strong contributor to where I am today. I truly learned a ton from the people in this community, and I get a great deal of satisfaction from trying to contribute back - so thank you all for being part of something great!
Jason Z

Managing Dependencies

Hieroglyph 3 has two primary dependencies - Lua and DirectXTK. Lua is used to provide simple scripting support, while DirectXTK is used for loading textures. Both of these libraries are included in the Hieroglyph 3 repository in source form. This allows for easy building with different build configurations and options, but it also comes with a number of costs as well.

First of all, you have to manually update your own repository whenever your dependencies make changes - or risk falling behind with the latest and greatest changes. In addition, since there are lots of source files in each of these dependencies, it bulks up the repository which makes cloning slower and increases the size of the repository overall.

Another big down side is that when you rebuild the entire solution, you have to rebuild all of the dependencies as well. This is sometimes a good thing (as mentioned above about the various build options) but in general it just adds time to the build process. Since the dependencies don't really change very often, then doing a full rebuild is needlessly longer than it should be.

Managing Dependencies with NuGet

With the most recent commit of Hieroglyph 3, I have replaced the DirectXTK source distribution with a reference to a NuGet package. If you aren't familiar with NuGet, it is basically a package manager that you can use to automatically download and install pre-built dependencies. This is actually old news for .NET developers, who have had access to NuGet for quite some time. However, for native C++ developers, this is a relatively new facility for managing the type of dependencies discussed above.

The package manager console is built right into Visual Studio, making it easy to count on your users having access to it. Overall, I spent about 10 minutes trying things out, and with a single 'Install-Package directxtk' command, I was in business.

So now, I have a single XML file that references directxtk, and when you build, the needed library and include files are automatically downloaded if they haven't already been. This actually solves most of the issues mentioned above, without bulking up the repository with large lib files. I'm trying this out with the DirectXTK first, and if it works out well then I will update the Lua dependency as well.

In fact, if it works as well as advertised, I may even build a NuGet package out of Hieroglyph 3 for simple installation and use of the library...
Jason Z

Building Hieroglyph 3

Hieroglyph 3 has always had an 'SDK' folder where the engine static library is built to in its various configuration and platform incarnations, and the include (*.h, *.inl) files are copied to an include folder. This gives a user of the engine an easy way to build the engine and grab the result for use in another project that doesn't want to have a source copy of Hieroglyph 3 in the project. You can put the various different versions of the static library output into different folders using some of the built in macros to modify the build path. For example, I use the following setting for my Output Directory project property:


The sample applications included with the engine link against this SDK folder accordingly, and it works well in most situations. There are occasional issues when Visual Studio will open a copied header file from the SDK folder instead of the original source version, which leads to strange phantom bugs where edits that you made earlier disappear, but that is manageable with some diligence.

MSBuilding Hieroglyph 3

However, when trying to clean all configurations, or to build all configurations, doing so from the IDE is no fun - especially if you are building large projects that take some time. So I recently dove into using MSBuild from the command line and wrote a couple of batch files to build them all automatically for me. For example, here is the sdk_build.bat file:

msbuild Hieroglyph3_Desktop_SDK.sln /t:Rebuild /p:Configuration=Debug /m:4 /p:Platform=Win32
msbuild Hieroglyph3_Desktop_SDK.sln /t:Rebuild /p:Configuration=Release /m:4 /p:Platform=Win32
msbuild Hieroglyph3_Desktop_SDK.sln /t:Rebuild /p:Configuration=Debug /m:4 /p:Platform=x64
msbuild Hieroglyph3_Desktop_SDK.sln /t:Rebuild /p:Configuration=Release /m:4 /p:Platform=x64

This lets you fire and forget about the building process, and it allows for automatically generating the output of your project. There is a corresponding version for cleaning as well. This is my first time using msbuild from the command line (it is the same build system that the IDE uses) and I am quite happy with how easy it is to work with. One slightly ulterior motive for experimenting with this is to eventually incorporate a continuous integration server into the project, which would also need some script driven build setups.

Dependency Linking

One other recent change that I made to the project settings is to set the 'Link Library Dependencies' to true for all of my static libraries that consume other static libraries. In the past, I always defaulted to making the end application collect and link to all static libraries used throughout the entire build chain. That started to get old really quick once I started incorporating more and more extension libraries. For example, I have Kinect and Oculus Rift extensions which have their own static libraries. Then the Hieroglyph 3 project has dependencies on Lua and DirectXTK which have their own static libs.

By using the 'Link Library Dependencies' I no longer have to pass the libs down the build chain - each library can link in whatever it is using, and the applications have a significantly simpler burden to get up and running. Simpler is better in my book!

Source Code Management

One final note about source code management. Hieroglyph 3 has used SVN as its SCM for a long time. Times have changed, and open source development has come a long way since I started out on the project. I will be migrating the repository on Codeplex over to Git, which I think will make it much easier to accept contributions as well as to utilize modern open source tooling for history and tracking purposes. I use Git at work, and I really like the decentralized nature of it. It is time to move on...

Miscellaneous Stuff

I have also been playing around a little with the Visual Studio 2014 CTP, and some of the new C++ features that it brings with it. There is some good stuff in there (see here for some details) so check it out and see what you can do with them!

Also, it was recently announced that the CppCon sessions will be professionally video recorded. CppCon is going to be a big fat C++ fest with lots of great talks scheduled (6 tracks worth!), so if you haven't already registered, go do it now! The program and abstracts are available now, so take a look and see if it would be good for you to check it out!
Jason Z

Simple Mesh Loaders

Over the years, I have relied on the trusty old Milkshape3D file format for getting my 3D meshes into my engine. When I first started out in 3D programming, I didn't have a lot of cash to pick up one of the heavy duty modeling tools, so I shelled out the $20 for Milkshape and used that for most of my model loading needs. It came with a simple SDK that I used to understand the format, and then I wrote a file loader for it which worked just fine (despite my lack of experience writing such things...).

Later on, a PLY loader was written by Jack Hoxley (jollyjeffers for those of you who have been around here a while) while we were working on Practical Rendering and Computation with Direct3D 11. Other than these two formats, all other geometry loaded into the engine was procedurally created or just brute force created in code. I had been thinking of integrating AssImp for quite a while, and finally sat down to try it out.

While I have lots of respect for the authors of AssImp, and I think it is a great project that meets a big need, I decided not to incorporate it into Hieroglyph 3. In general, I don't like adding dependencies to the library unless they are absolutely needed. AssImp seemed potentially worth the hassle, so I spent a day or two reading its docs and trying to get a feel for how the API worked and what I would need to do to get it up and running. By the time I was done, I felt relatively confident that I could get something up and running quickly.

So I tried to integrate the building of AssImp into my solution and add it as one of the primary projects in the build chain. I messed with the project files for about 45 minutes, and finally decided that it wasn't meant to be - if I can't add a project into a solution seamlessly in the first few tries, then something isn't working. Either their build system is different, or I'm not understanding something, or whatever - I just didn't want to add a bunch of complexity to the engine just to add more file format capability.

Instead, I decided I would simply write some basic file loaders for the formats that I wanted to work with. To start out with, I implemented the STL loader, which was actually exceedingly easy to do. In fact, here is the complete code for the loader:

```cpp
//--------------------------------------------------------------------------------
// This file is a portion of the Hieroglyph 3 Rendering Engine.  It is distributed
// under the MIT License, available in the root of this distribution and
// at the following URL:
//
// http://www.opensource.org/licenses/mit-license.php
//
// Copyright (c) Jason Zink
//--------------------------------------------------------------------------------
// This is a simple loader for STL binary files.  The usage concept is that the
// face data gets loaded into a vector, and the application can then use the face
// data as it sees fit.  This simplifies the loading of the files, while not
// making decisions for the developer about how to use the data.
//
// Our face representation eliminates the unused AttributeByteCount to allow each
// face to align to 4 byte boundaries.  More information about the STL file format
// can be found on the wikipedia page:
// http://en.wikipedia.org/wiki/STL_%28file_format%29.
//--------------------------------------------------------------------------------
#ifndef MeshSTL_h
#define MeshSTL_h
//--------------------------------------------------------------------------------
#include <fstream>
#include <string>
#include <vector>
#include "Vector3f.h"
//--------------------------------------------------------------------------------
namespace Glyph3 { namespace STL {

template <typename T>
void read( std::ifstream& s, T& item )
{
    s.read( reinterpret_cast<char*>(&item), sizeof(item) );
}
//--------------------------------------------------------------------------------
class MeshSTL
{
public:
    MeshSTL( const std::wstring& filename ) : faces()
    {
        unsigned int faceCount = 0;

        // Open the file for input, in binary mode, and put the marker at the end.
        // This lets us grab the file size by reading the 'get' marker location.
        // If the file doesn't open, simply return without loading.

        std::ifstream stlFile( filename, std::ios::in | std::ios::ate | std::ios::binary );
        if ( !stlFile.is_open() ) { return; }

        unsigned int fileSize = static_cast<unsigned int>( stlFile.tellg() );

        // Skip the header of the STL file, and read in the number of faces.  We
        // then ensure that the file is actually large enough to handle that many
        // faces before we proceed.

        stlFile.seekg( 80 );
        read( stlFile, faceCount );

        if ( fileSize < 84 + faceCount * FILE_FACE_SIZE ) { return; }

        // Now we read the face data in, and add it to our vector of faces.  We
        // provided an ifstream constructor for our face to allow constructing
        // the vector elements in place.  Before starting the loop, we reserve
        // enough space in the vector to ensure we don't need to reallocate while
        // loading (and skip all of the unneeded copying...).

        faces.reserve( faceCount );

        for ( unsigned int i = 0; i < faceCount; ++i ) {
            faces.emplace_back( stlFile );
        }
    }

public:
    struct Face
    {
        Face( std::ifstream& s )
        {
            read( s, normal );                  // Read normal vector
            read( s, v0 );                      // Read vertex 0
            read( s, v1 );                      // Read vertex 1
            read( s, v2 );                      // Read vertex 2
            s.seekg( 2, std::ios_base::cur );   // Skip 2 bytes for unused data
        }

        Vector3f normal;
        Vector3f v0;
        Vector3f v1;
        Vector3f v2;
    };

    static const unsigned int FILE_FACE_SIZE = sizeof(Vector3f)*4 + sizeof(unsigned short);

    std::vector<Face> faces;
};

} }
//--------------------------------------------------------------------------------
#endif // MeshSTL_h
//--------------------------------------------------------------------------------
```
That's the whole thing - in a single header file. The general idea is to load the file contents into memory, and then let the developer decide how to use that data to generate the needed vertices and indices. I don't necessarily know the vertex layout and other details in advance, so having flexibility is pretty important in Hieroglyph 3. Once I wrote this (which I would be happy to get criticism on, by the way!) I decided that I could also write an OBJ loader, along with the corresponding MTL file loader to go with it. I am quite honestly so happy that I went down this path instead of using another third party library - now I just need to add a single header file, and I have access to a new format.
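To illustrate the "let the developer decide" idea, here is a minimal sketch of how an application might turn loaded faces into a flat, non-indexed vertex list. The `Vec3`, `Face`, and `Vertex` types and the `buildVertices` function are stand-ins invented for this example so that it compiles on its own; they are not the actual Hieroglyph 3 types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for Glyph3::Vector3f and MeshSTL::Face (assumptions for this sketch).
struct Vec3 { float x, y, z; };
struct Face { Vec3 normal, v0, v1, v2; };

// A hypothetical vertex layout chosen by the application.
struct Vertex { Vec3 position; Vec3 normal; };

// Expand each face into three vertices that all share the face normal.
std::vector<Vertex> buildVertices( const std::vector<Face>& faces )
{
    std::vector<Vertex> vertices;
    vertices.reserve( faces.size() * 3 );

    for ( const Face& f : faces ) {
        vertices.push_back( { f.v0, f.normal } );
        vertices.push_back( { f.v1, f.normal } );
        vertices.push_back( { f.v2, f.normal } );
    }

    assert( vertices.size() == faces.size() * 3 );
    return vertices;
}
```

An indexed layout, or one with extra attributes, would only change `Vertex` and `buildVertices`; the loader itself stays untouched.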
Jason Z
I recently have been adding support to Hieroglyph 3 for the Oculus Rift. This post is going to discuss the process a little bit, and how the design of the Hieroglyph 3 engine ended up providing a hassle free option for adding Rift interaction to an application. Here's the first screen shot of the properly running output rendering:


I have been an admirer of the Rift for quite some time, and I wanted to find a way to integrate it into some of the sample applications in Hieroglyph. I'll assume most of you are already familiar with the device itself, but when you think about how to integrate it into an engine you are looking at two different aspects: 1) Input from the HMD's sensors, and 2) Output to the HMD's screen. If your engine is modular, it shouldn't be too hard to add a few new options for a camera object and a rendering pass object.

After working on the engine for many years, I had no interest in building and maintaining multiple copies of my sample applications just to support a different camera and rendering model. I work pretty much on my own on the engine, and my free time seems to be vanishingly small nowadays, so it is critical to get a solution that allows for either a runtime decision about standard or HMD rendering, or a compile time decision using a few definitions to conditionally choose the HMD. I'm pretty close to that point, and have a single new application (OculusRiftSample) set up for testing and integration.

Representing the HMD

The first step in getting Rift support was to build a few classes to represent the HMD itself. I am working with the OculusSDK 0.3.2 right now, which provides a C-API for interacting with the device. I basically created one class (RiftManager) that provides very simple RAII style initialization and uninitialization of the API itself, and then one class that would represent the overall HMD (RiftHMD).

RiftHMD is where most of the magic happens with the creation of an HMD object, lifetime management, and data acquisition and conversion to the Hieroglyph objects. The OculusSDK provides its own math types, so a few small conversion and helper functions to get the sensor orientation and some field of view values were necessary. You can check out the class here (header, source).

Getting the Input

Once you have a way to initialize the device and grab some sensor data, the first job is to apply that to an object in your scene that will represent the camera movement of your user. In Hieroglyph 3 this is accomplished with an IController implementation, called RiftController (header, source). This simple class takes a shared_ptr to a RiftHMD instance, and then reads the orientation and writes it to the entity that it is attached to.

All actors in Hieroglyph are composed of a Node3D and an Entity3D. The node is the core of the object, and the entity is attached to it. This allows for easily composing both local (via the entity) and absolute (via the node) motion of an object. For our Rift based camera, we attach the RiftController to the entity of the Camera actor. This lets you move around with the normal WASD controls, but also look around with the Rift too.

Rendering the Output

Rendering is also quite interesting for the Rift. The DK1 device has a 1280x800 display, but you don't actually render to that object. Instead, you render to off-screen textures (at much higher resolutions) and then the SDK uses these textures as input to a final rendering pass that applies the distortion to your rendered images and maps that to the HMD's display. All of this stuff is nicely encapsulated into a specialized SceneRenderTask object called ViewRift (header, source).

This object creates the needed textures, sets up the state objects and viewports needed for rendering, and also supplies the actual rendering sequence needed for each eye. This construct where a rendering pass is encapsulated into an object has been one of the oldest and best design choices that I have ever made. I can't emphasize it enough - make your rendering code component based and you will be much happier in the long run! All rendering in Hieroglyph is done in these SceneRenderTask objects, which are just bound to the camera during initialization.

The Final Integration

So in the end, integration into an application follows these easy steps:

1. Create a RiftManager and RiftHMD instance.
2. Create the application's window according to the RiftHMD's resolution.
3. Create a RiftController and attach it to the Camera's entity.
4. Create a ViewRift object and bind it to the camera for rendering.
5. Put on the headset and look around :)

It is simple enough to meet my requirements of easy addition to existing samples. I still need to automate the process, but it is ready to go. Now I want to experiment with the device and see what types of new samples I could build that take advantage of the stereo vision capabilities. The device really is as cool as everyone says it is, so go out and give it a shot!
Jason Z
I recently picked up a copy of the book "Developing Microsoft Media Foundation Applications" by Anton Polinger. I have been interested in adding some video capture and playback for my rendering framework, and finally got a chance to get started on the book.

What I immediately found interesting was in the foreword on page xvi, where the author describes a coding practice that he uses throughout the examples. The idea is to use a 'do {} while(false)' pattern, and he puts all of his operations into the braces of the do statement. The loop will execute precisely once, and he wraps each of those operations with a macro that issues a 'break' on an error. This effectively jumps to the end of the block when an error occurs, without requiring verbose return code checks or exception handling.
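For anyone who hasn't seen the pattern, here is a minimal sketch of how it might look. The `BREAK_ON_FAIL` macro name, the integer error code convention, and the three stand-in operations are my own assumptions for illustration; they are not taken from the book.

```cpp
#include <cassert>

// Hypothetical macro: store the error code and leave the enclosing
// do/while block as soon as an operation reports failure (non-zero).
#define BREAK_ON_FAIL( expr ) if ( ( result = ( expr ) ) != 0 ) break

// Stand-in operations that return 0 on success and non-zero on failure.
int openDevice()           { return 0; }
int configure( bool ok )   { return ok ? 0 : 42; }
int startCapture()         { return 0; }

int initialize( bool configOk, int& stepsCompleted )
{
    int result = 0;
    stepsCompleted = 0;

    do {
        BREAK_ON_FAIL( openDevice() );
        ++stepsCompleted;
        BREAK_ON_FAIL( configure( configOk ) );
        ++stepsCompleted;
        BREAK_ON_FAIL( startCapture() );
        ++stepsCompleted;
    } while ( false );

    // Every path, success or failure, arrives here exactly once, so shared
    // cleanup or logging can live in one place.
    assert( result == 0 || stepsCompleted < 3 );
    return result;
}
```

One caveat with this sketch: because the macro expands to a bare `if`, it should not be used directly in front of an `else`, and it only works inside a loop or switch body where `break` is legal.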

I haven't ever seen this type of flow control, so I was wondering what your thoughts on this are. I would assume that the compiler would optimize out the loop altogether (eliminating the loop, but still executing the block) due to the always false condition, but the macros for checking failure would still end up jumping properly to the end of the block. It seems like a relatively efficient pattern for executing a number of heavy duty operations, while still retaining pretty good readability.

I have asked around, and this seems to be at least a known practice. If there are any interesting use cases where this makes especially good sense, I would love to hear about them!

Jason Z
At the BUILD 2014 conference, Max McMullen provided an overview of some of the changes coming in Direct3D 12. In case you missed it, take a look at it here. In general, I really enjoy checking out the API changes that are made with each iteration of D3D, so I wanted to take a short break from my WPF activities to consider how the new (preliminary) designs might impact my own projects.

Less Is More
When you take a look at the overall changes that are being discussed, you end up with less overhead but more responsibility - so less is more really does apply in this case. Most or all of the changes are designed to simplify the work of the driver and runtime at the expense of the application having to ensure that resources remain coherent while they are being used by the pipeline. This type of trade off can be a double edged sword, since it can require more work on your side to ensure that your program is correct. However, there have been a number of hints about significant tooling support - so I am initially encouraged that this is being considered by the D3D team.

My initial feeling when I saw all of these changes is that they are in fact quite reasonable. When I consider how I would modify Hieroglyph 3 to accommodate these changes, I don't see a major tear-up. Each change seems fairly well contained on the API side, and Max provided a pretty good rationale for why each change was needed. Here are the major areas of changes that I noted, and some comments on how they fit with Hieroglyph 3.

Pipeline State Objects
The jump from D3D9 to D3D10/11 essentially saw a grouping of the various pipeline states into immutable objects. The concept was to reduce the number of times that you have to touch the API in order to set a pipeline up for a draw call. It sounds like D3D12 will take this all the way, and make most of the non-resource pipeline state into a single big state object - the Pipeline State Object (or PSO, as it is sure to be called...). This seems like an evolutionary step to me, continuing what was already started in the transition to D3D10.

Hieroglyph 3 already emulates this concept with the RenderEffectDX11 class, which encapsulates the pipeline state to be set when drawing a particular object. Each object can have its own state, and replacing this with a PSO will be fairly simple. Most likely the PSO can be created centrally in a cache of PSOs, and just handed out to whichever RenderEffectDX11 instance happens to match the same state. If none match, then we create a new entry in the PSO cache. Since the states are immutable, we don't have to worry about modifications, and the runtime objects' lifetimes can be managed centrally in the cache. If this makes the system faster, I'm all for it!
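The cache idea can be sketched without touching any real API. In the example below, `PipelineDesc`, `PipelineState`, and `PipelineStateCache` are all hypothetical names, and the description is reduced to three integer fields purely for illustration; a real PSO description would carry far more state, and the final D3D12 interfaces may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical, heavily reduced description of the pipeline state.
struct PipelineDesc {
    std::uint32_t vertexShaderId;
    std::uint32_t pixelShaderId;
    std::uint32_t blendMode;

    // Ordering so that the description can be used as a map key.
    bool operator<( const PipelineDesc& rhs ) const {
        return std::tie( vertexShaderId, pixelShaderId, blendMode )
             < std::tie( rhs.vertexShaderId, rhs.pixelShaderId, rhs.blendMode );
    }
};

// Stand-in for an immutable API state object.
struct PipelineState {
    explicit PipelineState( const PipelineDesc& d ) : desc( d ) {}
    const PipelineDesc desc;
};

class PipelineStateCache {
public:
    // Return the existing state for a matching description, or create one.
    std::shared_ptr<const PipelineState> acquire( const PipelineDesc& desc ) {
        auto it = m_states.find( desc );
        if ( it == m_states.end() ) {
            it = m_states.emplace( desc, std::make_shared<PipelineState>( desc ) ).first;
        }
        assert( it->second != nullptr );
        return it->second;
    }

    std::size_t size() const { return m_states.size(); }

private:
    std::map<PipelineDesc, std::shared_ptr<const PipelineState>> m_states;
};
```

Handing out `shared_ptr<const PipelineState>` keeps the objects read-only for the effects that use them, while the cache remains the single owner that decides when entries are created.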

Resource Hazard Management
Rather than having the runtime actively watch for your resources being bound either as an input or an output (but not both simultaneously), Direct3D 12 will use an explicit resource barrier for you to indicate when a resource is transitioning from one to the other. I have actually run into problems with the way that Direct3D 11 handles this hazard management before, so this is a welcome change.

For example, in the MirrorMirror sample I do a multiple pass rendering sequence where you generate an environment map for each reflective object, followed by the final rendering pass where the reflective objects use the environment maps as shader resources. When you go to do the final rendering pass, you either have to set the output merger state or the pipeline state first. If you bind the pipeline state first, then the environment map gets bound to the pixel shader with a shader resource view. However, from the previous pass the environment map is still bound to the output merger with a render target view - so the runtime unbinds the oldest setting and issues a warning. If you set the states in the opposite order, then you get the same situation on the next frame when you try to bind the render target view for output.

This essentially forces you to either ignore the warning (and just take whatever performance hit it gives you) or explicitly clear one of the states before configuring for the next rendering pass. Neither of these ever seemed like a good option - but in D3D12 I will have the ability to explicitly tell the runtime what I am doing. I like this change.

Descriptor Heaps and Tables
The next change to consider is how resources are bound to the pipeline. Direct3D 12 introduces the use of Descriptor Heaps and Tables, which sound like simple user mode PODs to point to resources. This moves the previous runtime calls for binding resources (mostly) out of the runtime and into the application code, which again should be a good thing.

In Hieroglyph 3, I currently use pipeline state monitors to manage the arrays of resource bindings at each corresponding pipeline stage. This is done mostly to prevent redundant state change calls, but this could easily be updated to accommodate the flexible descriptors. I'm more or less already managing a list of the resources that I want bound at draw time, so converting to using descriptors should be fairly easy. It will be interesting to try out different mechanisms here to see what gives better performance - i.e. should I keep one huge descriptor heap for all resources, or should I manage smaller ones for each object, or somewhere in between?
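As a rough illustration of what a state monitor does, here is a simplified sketch that tracks desired versus currently bound resources per slot and only issues a call for slots that actually changed. `BindingMonitor` and `ResourceHandle` are hypothetical names for this example; the real Hieroglyph 3 classes are more involved.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical resource handle; 0 means "nothing bound".
using ResourceHandle = unsigned int;

template <std::size_t SlotCount>
class BindingMonitor {
public:
    // Record the binding that the application wants for the next draw call.
    void setDesired( std::size_t slot, ResourceHandle handle ) {
        assert( slot < SlotCount );
        m_desired[slot] = handle;
    }

    // Apply only the slots that differ from what is currently bound, and
    // return how many slots needed a call.
    template <typename ApplyFn>
    std::size_t flush( ApplyFn apply ) {
        std::size_t changes = 0;
        for ( std::size_t i = 0; i < SlotCount; ++i ) {
            if ( m_desired[i] != m_current[i] ) {
                apply( i, m_desired[i] );
                m_current[i] = m_desired[i];
                ++changes;
            }
        }
        return changes;
    }

private:
    std::array<ResourceHandle, SlotCount> m_desired{};
    std::array<ResourceHandle, SlotCount> m_current{};
};
```

With descriptors, the `apply` callback would presumably write into a descriptor table instead of calling a bind function, but the bookkeeping of "what do I want bound at draw time" stays the same.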

Bye Bye Immediate Context
The final major change that I noted is the removal of the immediate context. This one actually seems the least intrusive to me, due to the nature of the deferred context to immediate context relationship in the existing D3D11 API. Essentially both of these beasts use the same COM interface, but deferred contexts are used to generate command lists while immediate contexts consume them. This seems like a small distinction, but in reality you have to design your system so that it knows which context is the immediate one (or else you can't ever actually draw anything) and which are deferred. So they are the same interface only in theory...

In Hieroglyph 3, I would use deferred contexts to do a single rendering pass and generate a command list object. After all of these command lists were generated, I batched them up and fed them to the immediate context. The proposed changes in D3D12 are actually not all that different - they replace the immediate context with a Command Queue which more closely represents what is really going on under the covers with the GPU and driver. Porting to use such a command queue should be fairly easy (you just feed it command lists, same as immediate context), but updating to take advantage of the new monitoring of the command queue will be an interesting challenge.
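The record-then-submit flow can be modeled in a few lines. This is only a conceptual sketch: `CommandList`, `CommandQueue`, `recordPass`, and `renderFrame` are stand-ins that I made up, commands are just strings, and nothing here reflects the actual D3D11 or D3D12 interfaces.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a recorded command list: just the names of the commands.
using CommandList = std::vector<std::string>;

// Hypothetical render pass: records its commands without executing them,
// much like a deferred context would.
CommandList recordPass( const std::string& passName, int drawCount )
{
    assert( drawCount >= 0 );
    CommandList list;
    list.push_back( passName + ":begin" );
    for ( int i = 0; i < drawCount; ++i ) {
        list.push_back( passName + ":draw" + std::to_string( i ) );
    }
    return list;
}

// Stand-in for the immediate context or command queue: it consumes
// command lists strictly in the order they are submitted.
class CommandQueue {
public:
    void execute( const CommandList& list ) {
        m_executed.insert( m_executed.end(), list.begin(), list.end() );
    }
    const std::vector<std::string>& executed() const { return m_executed; }

private:
    std::vector<std::string> m_executed;
};

void renderFrame( CommandQueue& queue )
{
    // Recording is independent per pass, so each call could run on its own
    // thread; only the submission order below has to be fixed.
    const CommandList shadow = recordPass( "shadow", 2 );
    const CommandList scene  = recordPass( "scene", 1 );

    queue.execute( shadow );
    queue.execute( scene );
}
```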

There was also a Command Bundle concept introduced, which is essentially a mini-command list. These are expected to speed up the time it takes to generate GPU commands to match a particular sequence of API calls by caching those calls into a Command Bundle. This will introduce another challenging element into the system - how big or small should the command bundles be? When should you be using a command list instead of a command bundle? Most likely only profiling will tell, but it should be an interesting challenge to squeeze as much performance as possible out of the GPU, driver, and your application :).

So those are my thoughts about Direct3D 12. Overall I am quite positive about the performance benefits and the expected amount of additional effort it will require. There aren't any major show-stoppers that I can see, but of course it is still early days and the API can still change or introduce new elements before it is released.

I would be interested to hear if anyone else has considered this or found a particular piece of the talk interesting or if you see any issues with it. Now is the time to give feedback to the Direct3D team - so speak up and start the discussion!
Jason Z

Hieroglyph3 and WPF

As I mentioned in my last entry, I am in the process of evaluating a number of different UI frameworks for use with Direct3D 11 integrated into them. This is mostly for editor style applications, and also an excuse for me to learn some other frameworks. The last few commits to the Hieroglyph 3 codebase have encompassed some work that I have done on WPF, and that is what I wanted to discuss in this post.

As a pure C++ developer, I haven't ever really spent lots of time with managed code. C# seems like a pretty cool language, and lots of people love it, so getting the chance to put together a small WPF based C# example is a cool learning experience for me. To get up to speed, I checked out a couple of PluralSight tutorials on WPF, and away I went.

In general, I have to say that XAML is really the star of this party. The hierarchical nature and the raw power of what you can do with XAML is just silly compared to any other tech that I have used in the past for UI design / layout. I suppose HTML + CSS would be the next closest thing, but the tooling that Microsoft provides for working with XAML is really top notch... The best part about this is that XAML is usable by native C++ on Windows Store apps, so anything I'm learning here will probably apply there too. But I guess that is the topic for another post...

For my sample application, I basically just wanted to show that it was possible to run some of my existing D3D11 code to generate some frames and get them up and running interactively on a WPF based UI. So I'll leave further discussion of WPF and XAML for other posts, and focus on how I got this working.

Direct3D Shared Resources
It turns out that there is already an easy way to interop with Direct3D 9 built right into WPF - it is an ImageSource object called D3DImage. This works great for connecting a D3D9 rendered scene to a WPF app, but I was after D3D11 integration. This isn't supported right out of the box, but it is possible with Direct3D9Ex. The key piece is that D3D9Ex is capable of sharing a surface with D3D11 devices, which allows you to follow a workflow like this:

1. Create a texture 2D in D3D11, specifying the shared miscellaneous flag
2. Get a shared handle from #1's texture
3. Create a texture in D3D9Ex from the shared handle
4. Render into your D3D11 texture
5. Use D3D9Ex to push that shared texture to D3DImage

There are additional details and requirements involved, but this gives you the overall gist of what needs to be done. I found a sample project by Jeremiah Morill that already implemented most of this work in a reusable C++/CLI assembly project, so I used this as the starting point.

After you get the interop stuff working, the next task is to get your native C++ code working in a managed application. This is a fairly well documented practice, as you can write native C++ code and wrap it with a managed C++/CLI wrapper to expose it to other managed code. This was also my first foray into this activity, so it took some experimentation - but it is doable!

The Demo
After all the integration work, additional assembly references, solution setup, project property modifications, and some playing around, I managed to get my sample up and running. What you see below is a simple WPF based C# application, with a main render window that is overlapped by a single button (I know, I know, it doesn't get much more exciting...).


However exciting this may look, it is actually quite relevant. The button being overlapped onto the rendering surface shows that there are no airspace issues here - you can put your UI elements on, over, or composited with your rendered scene. This is actually pretty cool, and a nice capability to have if you are building an editor. It is always nice to have the option to obscure some parts of the render target with UI elements in certain circumstances, so this is a good thing.

I am planning out an article on the whole process, so hopefully I can share all the details and gotchas fairly soon. I'm totally new to the article process here on GameDev.net now, so we'll see how that goes :) Other UI frameworks and development activities are yet to come!
Jason Z
Lately I have found myself looking for an easy (or easier) way to get some native D3D11 code to play nicely with a user interface framework. Way back when I first started out writing Hieroglyph, I had the basic idea that I could just make my own user interface in the engine. I think many people start out with this mindset, but it is rarely the best way to go. You have to build so many pieces to make it work well, and there is almost always one more thing that you have to add to take the next step in a project... It is so much better to have an existing framework that you can simply adopt and put your rendering code into, and you will be significantly more productive if you don't have to reinvent the wheel.

Native Frameworks
Unfortunately, there aren't tons of options available for native code. I don't count Win32 as a UI framework, but rather more as a way to make some windows to render into. Technically you can create some basic UI elements, but it doesn't really count. MFC is an option, but it is a relatively old codebase and it can feel pretty clunky until you really have some experience with it. There is also the downside that MFC isn't available with the Express versions of Visual Studio, which limits the audience that can use your code.

wxWidgets is another option that is open source and cross platform. This solves the issue of being available on the Express SKU, but it has a design that is very reminiscent of MFC... There has to be something more modern available. Qt is another open source and cross platform solution, but it is a really, really big framework. In the context of Hieroglyph 3, it would require all of its users to download and manage a whole other library in addition to Hieroglyph 3 itself, which is less than ideal (although it is a viable option).

Managed Frameworks
So if we decide not to use a native framework, but rather a managed one, then some of the issues from above go away too. Basically there are a couple of frameworks to choose from: Winforms and WPF. Both of these frameworks are available on Express SKUs, so there is no issue there. Since they are included "in the box", the users don't have to manually download any additional libraries on their own. So these seem like a viable option as well. The obvious downside here is that we have to create a native-to-managed interface, which requires careful and deliberate planning on how the UI framework has to interact with the native code. This is non-trivial work, especially if you haven't done it before...

Native and Managed Frameworks
There is one additional possibility that combines both of these worlds. On Windows 8, it is possible to build WinRT apps with native C++ and XAML. This allows for a built in framework, available on Express SKUs, and access from a native codebase - no managed code required. This is really attractive to me, since I have never been much of a managed developer. But there is always a catch... it is only available for WinRT applications. This significantly limits the people that have access to it (at least at the moment) but it still remains as an interesting option.

So What To Do?
Well, currently Hieroglyph3 supports MFC, at least at a basic level. One of the users of the library has shown a sample of using wxWidgets with it in a similar way, so I may try to see if he will contribute that back to the community (@Ramon: Please consider it!). In addition to these options, I also want to explore Winforms and WPF. And in the long run, I want to use C++ and XAML as well. So I choose all of the above, except for Qt at the moment.

My plan is to build library support for each of these frameworks as optional add-on libraries, similar to how the current MFC solution is separate from the core projects. As I go through each framework, I'll be discussing my impressions of them, as well as discussing the process. Strangely it seems that there isn't too much centrally located information on this very important topic, and I hope to consolidate some of it here in this journal. I'll probably start out by describing the existing MFC solution, so stay tuned for that one next time.
Jason Z
As the title implies, I have been thinking quite a bit more about the object lifetime management models and how they will be used in Hieroglyph 4. In the past (i.e. Hieroglyph 3), I more or less used heap allocated objects as a default, and generally didn't use stack allocation all that much. The only real exception to this was when working within a very confined scope, such as within a small method or something along those lines. This had the unintentional side effect that I generally used pointers (or smart pointers) for many of the various objects floating around in my rendering engine.

A small background story may be in order here - I am a self taught C++ developer, and also a self taught graphics programmer. In general, I am a voracious reader and so when I was starting out in development, I picked up many of the habits that were demonstrated in various programming books - most of which were graphics programming related. This was fine at the time, and I certainly learned a lot over the years, but I never really took too much time to dive into C++ or some of the nuances of its deterministic memory model.

As I mentioned a couple posts ago, I have really been digging in deep into C++11/14, and trying to gain a deeper insight into why and how certain paradigms are considered good practice, while others are horribly bad practice. The primary problem with all of the pointers that I described above is that I was unwittingly defaulting to using reference semantics everywhere... As far as correctness goes, that isn't really such a big deal - you can of course write correct programs that use only reference semantics (C# and Java use reference semantics almost exclusively). However, C++ gives you a bit more freedom to choose how and when your objects are treated as references or values, so it is worthwhile to really consider how your classes will be used before you write them.

The obvious choice in reference vs. value semantics is determined when you declare the variables for your objects. If you use pointers or (to a lesser extent) references, then you are clearly choosing reference semantics. However, your actual class design itself also plays a big role in defining the semantics of how it gets used. All of the copy control methods (copy constructor, copy assignment operator, destructor, and the move constructor/move assignment operator) essentially prescribe how your class instances are moved around in a program. And as an engine author, you have to choose ahead of time how that should look. I think this single design choice has one of the biggest impacts on how you (and your users) work with your software.
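To make that a bit more concrete, here is a minimal sketch of how the copy control methods end up prescribing the semantics. The names `MaterialSketch` and `TextureOwnerSketch` are made up for illustration and are not types from Hieroglyph: the first is a plain value type whose copies are independent, and the second disables copying and only allows ownership to be moved.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical value type: the compiler-generated copy operations
// produce independent objects.
struct MaterialSketch {
    std::string name;
    float roughness;
};

// Hypothetical resource owner: copying is disabled, moving transfers
// ownership and leaves the source in an empty (invalid) state.
class TextureOwnerSketch {
public:
    explicit TextureOwnerSketch(int id) : id_(id) { assert(id >= 0); }
    TextureOwnerSketch(const TextureOwnerSketch&) = delete;
    TextureOwnerSketch& operator=(const TextureOwnerSketch&) = delete;
    TextureOwnerSketch(TextureOwnerSketch&& other) noexcept : id_(other.id_) {
        other.id_ = -1;
    }
    TextureOwnerSketch& operator=(TextureOwnerSketch&& other) noexcept {
        if (this != &other) { id_ = other.id_; other.id_ = -1; }
        return *this;
    }
    bool Valid() const { return id_ >= 0; }
    int Id() const { return id_; }
private:
    int id_;
};

// Value semantics: modifying the copy leaves the original untouched.
bool CopyIsIndependent() {
    MaterialSketch a{"brick", 0.8f};
    MaterialSketch b = a;
    b.roughness = 0.1f;
    return a.roughness == 0.8f;
}

// Reference semantics: both handles observe the same instance.
bool SharedPointerAliases() {
    auto a = std::make_shared<MaterialSketch>(MaterialSketch{"brick", 0.8f});
    auto b = a;
    b->roughness = 0.1f;
    return a->roughness == 0.1f;
}
```

The point is that a user of `TextureOwnerSketch` simply cannot copy it by accident, while a user of `MaterialSketch` gets cheap, independent copies without thinking about it.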

Once again returning to the pointer default... This default is what most beginning C++ developers choose, and it works fine. However, once you start expanding to multithreaded programming, always using pointers can begin to complicate things. If you have the chance to use value semantics, making a copy of a value for a second thread becomes trivially easy - but if you are using reference semantics it is a bit more tricky. You can make a copy of your references, but all the references still point to the same instance - so multithreaded programming gets even more complex than it already is.

In addition, on modern processors, memory access is THE main bottleneck when trying to fully utilize a processing core. If you are using reference semantics, you are inherently less cache friendly than if you use value semantics (I know, I know, there are some cases where it is better to use reference semantics, but in general this is the exception and not the rule). Creating an array of value objects is simple - you just make it, and the objects are initialized how you indicate during their instantiation. They are destroyed when they go out of scope. On the other hand, creating an array of reference objects is more complex... Are they valid references? Can I dereference them now? When do they get initialized? When can they be destroyed? Did I already destroy them? Lifetime management is just plain easy with value semantics, and not as easy with references.
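A small sketch of the difference, assuming a made-up `Particle` type: the vector of values stores its elements contiguously and every element is valid for as long as the vector lives, while the vector of pointers needs a validity check on every access and scatters the elements across separate allocations.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical element type used only for this illustration.
struct Particle { float x; float y; float z; };

// Values: elements live contiguously inside the vector and are always valid.
float SumX(const std::vector<Particle>& particles) {
    float total = 0.0f;
    for (const Particle& p : particles) total += p.x;
    return total;
}

// References: each element is a separate allocation that may be null.
float SumX(const std::vector<std::unique_ptr<Particle>>& particles) {
    float total = 0.0f;
    for (const auto& p : particles) {
        if (p) total += p->x; // validity must be checked before dereferencing
    }
    return total;
}

void DemoValuesVersusPointers() {
    std::vector<Particle> values(4, Particle{1.0f, 0.0f, 0.0f});
    assert(&values[1] == &values[0] + 1); // neighbours are adjacent in memory

    std::vector<std::unique_ptr<Particle>> pointers;
    pointers.push_back(std::unique_ptr<Particle>(new Particle{2.0f, 0.0f, 0.0f}));
    pointers.push_back(nullptr); // a slot that was never initialized
    assert(SumX(pointers) == 2.0f);
}
```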

So to get started with HG4, I am taking some extra time to consider a general set of guidelines for object management with this new understanding in mind. Sometimes reference semantics are necessary (e.g. COM pointers force certain reference semantics, and Direct3D is full of COM...) so the real key is to figure out when you can use values, and when you should use references. It truly seems like a bit of an artistic process - the good C++ developers are very good at making these types of design choices. I still find it a bit taxing to come to a good solution, but when you get there, it sure does shine through and makes your API much easier to work with. Let's hope that I can get the initial design right, and then grow organically from there.
Jason Z

MVP Renewal++

I found out today that I have indeed been re-awarded as a Visual C++ MVP this year :) That makes five years running, and I'm really happy that I will have the chance to continue on. There are literally tons of different concepts in modern C++ that I want to start writing about again, so we will have to see what shape that can take... hopefully it can be helpful to some of you out there.

To help demonstrate how the design of a rendering framework is evolving with the new features of C++, I will be updating Hieroglyph substantially. Because of how large the changes will be, I have decided to make a clean break and create Hieroglyph 4. This decision was not made lightly, as I have put many, many hours of work into Hieroglyph 3. In fact, I first posted Hieroglyph 3 on Codeplex back in February of 2010, and it had already been in development for months before that. As a result of our book, as well as lots of posts here on GameDev.net, there is actually a fairly substantial user base for Hieroglyph 3 (including my own use at my day job).

Due to that user base, I think it is prudent to take the next step and create the next version of the engine. This will let existing users of Hieroglyph 3 continue on without too much disruption, and then at some point down the road I will eventually have Hieroglyph 4 in a production ready state. This will also let me be more aggressive in my design changes, so I think it will be good for both cases. I will continue to apply the updates to Hieroglyph 3 when they make sense and don't significantly alter the API surface. That should keep HG3 from getting stale and allow me to maintain a before and after example set - that's the plan anyways :)

It always amazes me how much you learn over the course of a couple years. Since I wrote Hieroglyph 3, I have become much more knowledgeable about software engineering, software design, and I have also become much more aware of the importance of documentation and testing within the software realm. Hopefully I can apply all of this to the new engine going forward.

Anyways, I hope this will be the beginning of another long run, so stay tuned for more renderer design discussions!
Jason Z

Going Native

Going Native - The Conference
This post has two purposes, both related to its title. First and foremost, today through Friday the 'Going Native 2013' conference is happening. If you aren't familiar with it, this is a C++ conference held by Microsoft in Redmond and it is devoted to all things C++. Many of the big names in C++ are speaking there, including Bjarne Stroustrup himself who gave the keynote today.

The conference itself is very low cost for something of its pedigree, but even if you don't attend in person it is streamed for free from the Channel 9 website (the link above has all the details). I have to tell you, the information held within some of these presentations is priceless, especially if you want to move beyond mid-range C++ usage. These guys are the titans of C++, and I haven't been disappointed in the content so far. If you have the time, go check it out. You can also take a look at last year's presentations (Going Native 2012), which are also available through the same interface for free.

Going Native - The Person
The second part of this post is more of a personal note than a development story. I'm sure you heard a while back that the DirectX MVP specialty was being phased out. At the time, I had recently been given my fourth MVP award for DirectX, and it was a serious bummer... Being an MVP has many benefits, including having access to some of the great engineers working on the technologies we use every day - so getting the news that my particular specialty was not going to be eligible any more was less than good news.

However, as life so often reminds me, there is usually a silver lining to any bad situation. The C++ MVP group was open to discussions with the existing DirectX MVPs, which naturally I was curious to learn more about. Boy am I happy that I had the opportunity to both discuss modern C++ (indeed, to learn of its existence) and to listen in on discussions from some other experts... Modern C++ is like learning a brand new language, complete with totally different programming paradigms, but wrapped in a very familiar syntax. It can be both powerful and simple, safe and efficient - and it is still the same language that I have been using for 10+ years. Except it is better :)

Since my new awakening to C++11/14, I have been feverishly consuming as much content as I possibly could on these new features and how they can be used in graphics programming. In addition, I have started to realize how outdated some of the designs are that I have used in Hieroglyph 3. So I have started to experiment with some heavy duty changes to some of the major systems. These changes aren't quite ready for primetime, but they are in the works. Since they are some big changes, I need to think about how I will support them in the context of Hieroglyph 3 being used for our book - but demonstrating some modern designs is important enough that I will figure out a good solution without nuking anyone's existing code bases built on Hieroglyph.

For some reason, I haven't heard much discussion in the graphics area about C++11/14 features. They are relatively new, but still should start to be used. So I'm going to be focusing on getting some examples out there, and continue learning as I go. I'm certainly no expert, but I'm learning fast and loving every second of it.

I don't know if I'll get re-awarded as a C++ MVP (I find out on October 1st...) but in either case, I'm happy to have rediscovered C++. Like I said, sometimes life throws you a curve ball - but you can still hit a curve ball :) So I hope you guys and girls are ready and willing to come along with me on this new journey, because I am more excited and motivated than I have been in a long time.
Jason Z
I am sure you have heard by now about the new BUILD 2013 conference and all of the goodies that were presented there. Of special interest to the graphics programmer are the new features defined in the Direct3D 11.2 specification level. I watched the "What's New in Direct3D 11.2" presentation, which provided a good overview of the new features. You can check out the video for yourself here.

Overall, they describe six new features that are worth discussing further:

  1. Trim API: Basically a required helper API that allows the driver to eliminate any 'helper' memory when an app is suspended, which lets your app stay in memory longer.
  2. Hardware Overlay: Light APIs for using a compositing scheme, allowing you to use different sized swap chains for UI and 3D scenes, which supposedly is free when supported in hardware (with an efficient software fallback).
  3. HLSL Shader Linking: This is supposed to let you create libraries of shader code, with a new compiler target of lib_5_0. This could be interesting for distributing lighting functions or modular shader libraries, but I would reserve judgment on how it works until I get to try it out.
  4. Mappable default buffers: Resources that you can directly map even if they are created with the default usage flag. This is something that people have been requesting for a long time, so it is really nice to get into the API.
  5. Low latency presentation API: More or less there is a way to ensure that you get one frame latency in presenting your content to the screen. This is pretty important in cases where the user is actively interacting directly with the screen and can notice any latency between their inputs and the rendered results.
  6. Tiled Resources: This is basically the same idea as a mega texture (from id Software) but it is supported at the API level. It seems like a great addition and I can't wait to try this out.

Overall, for a point release it does seem like a pretty good one. There are some new features to play with, and tiled resources in particular seem like a cool new capability that wasn't there before. There's only one catch... the new features are only available on Windows 8.1 - at least for the foreseeable future. Nobody knows if this will ever be back ported to Windows 7, so if you want to try out the new features you will have to get the preview install of Win8.1 and give it a shot.

So what do you guys think about this release? Do you like it, hate it, or somewhere in between? Have you thought of anything new that you could do with these features?
Jason Z
I received something today that is pretty unique, and I can honestly say that I haven't ever gotten anything of this sort before :) Let's see if you are able to notice something different about the copy of our D3D11 book that I received today:


That's right - the copy on the right has been translated cover to cover into Korean! At least that is what the publisher has told me - I have no idea how to read Korean, or to even tell if those characters are real... but I take them at their word!! Here is another shot closer up, also showing the binding:


I have known that the book would be translated for quite some time, but actually getting a copy in my hands was a pretty cool thing to see. As far as I know, this is the only translation and there aren't any others in the works, so I guess you are stuck with English or Korean for your reading pleasure :)
Jason Z

GPU Pro 4

I just received my copy of GPU Pro 4 today, which was a nice surprise. I had contributed a chapter on Kinect Programming with Direct3D 11, and it is really nice to see it in print. And of course, there are also lots of other interesting articles that I have been digging through as well.

In general, I find it really interesting to see the breadth and depth of topics covered in these types of books. For any given topic, you only get to write one chapter - which means you can't go too deep, or you risk losing the focus of the reader. However, since there is a wide mix of authors, it is quite common to see wildly varying topics in them. So I find it fun to read through, and get a general feel for what people are working on out there.

You can find details about the book on Wolfgang's blog page for GPU Pro, including some sample material from some of the chapters. And of course you can find the book on Amazon if you are interested in picking up a copy.

On a personal note, this book adds to my running tally of series that I have contributed to. I have been fortunate enough to contribute to the ShaderX series, Game Programming Gems series, the GameDev.net collection, a complete text on Direct3D 11 programming, an online book for Direct3D 10 programming, several online articles here on GameDev.net, and now the GPU Pro series too. It is a great time to be involved in the realtime rendering field, and I couldn't be happier contributing to it!
Jason Z
Last time I discussed how I am currently using pipeline state monitoring to minimize the number of API calls that are submitted to the Direct3D runtime/driver. Some of you were wondering if this is a worthwhile thing to try out, and were interested in some empirical test results to see what the difference is. Hieroglyph 3 has quite a number of different rendering techniques available to it in the various sample programs, so I went to work trying out a few different samples - both with and without pipeline state monitoring.

The results were a little bit surprising to me. It turns out that for all of the samples which are GPU limited, there is a statistically insignificant difference between the two - so regardless of the number of API calls, the GPU was the one slowing things down. This makes sense, and should be another point of evidence that trying to optimize before you need to is not a good thing.

However, for the CPU limited samples there is a different story to tell. In particular, the MirrorMirror sample stands out. For those of you who aren't familiar, the sample was designed to highlight the multi-threading capabilities of D3D11 by performing many simple rendering tasks. This is accomplished by building a scene with lots of boxes floating around three reflective spheres in the middle. The spheres perform dual paraboloid environment mapping, which effectively amplifies the amount of geometry to be rendered since they have to generate their paraboloid maps every frame. Here is a screenshot of the sample to illustrate the concept:


This exercises the API call mechanism quite a bit, since the GPU isn't really that busy and there are many draw calls to perform (each box is drawn separately instead of using instancing for this reason). It had shown a nice performance delta between single and multi-threaded rendering, but it also serves as a nice example for the pipeline state monitoring too. The results really speak for themselves. The chart below shows two traces of the frame time for running the identical scene both with and without the state monitoring being used to prevent unneeded API calls. Here, less is more since it means it takes less time to render each frame.


As you can see, the frame time is significantly lower for the trace using the state monitoring. So to interpret these results, we have to think about what is being done here. The sample is specifically designed to be an example of heavy CPU usage relative to the GPU usage. You can consider this the "CPU-Extreme" side of the spectrum. On the other hand, GPU bound samples show no difference in frame time - so we can call this the "GPU-Extreme" side of the spectrum.

Most rendering situations will probably fall somewhere in between these two situations. So if you are very GPU heavy, this probably doesn't make too much difference. However, once the next generation of GPUs comes out, you can easily have a totally different situation and become CPU bound. I think Yogi Berra once said - "it isn't a problem until it's a problem."

So overall, in my opinion it is worthwhile to spend the time and implement a state monitoring system. This also has other benefits, such as the fact that you will have a system that makes it easy to log all of your actual API calls vs. requested ones - which may become a handy thing if your favorite graphics API monitoring tools ever become unavailable... (yes, you know what I mean!). So get to it, get a copy of the Hieroglyph code base, grab the pipeline state monitoring classes and hack them up into your engine's format!
Jason Z
My last couple of commits to Hieroglyph 3 addressed a performance issue that most likely all graphics programmers that have made it beyond the basics have grappled with: Pipeline State Monitoring. This is the system used to ensure that your engine only submits the API calls that are really necessary in your rendered frame. This cuts down on any API calls that don't effectively add any value to your rendering workload, but still cost some time to execute anyway. This is a problem that I have worked through a number of times, and I am quite fond of the latest solution that I have arrived at. So let's talk about state monitoring!

More Precise Problem Statement
Before we dive into the solution that I am using, I would like to more clearly identify what the problem is that I am trying to solve. We will consider the process of one rendering pass, since any additional, more complicated rendering schemes can be broken down into multiple rendering passes. I define a rendering pass as all of the operations performed between setting your render targets (or more generally your Output Merger states) and the next time you set your render targets (i.e. the start of the next rendering pass). These could include a sequence like the following:

  1. Set render target view / depth stencil view
  2. Clear render targets
  3. Configure the pipeline for rendering
  4. Configure the input assembler for receiving pipeline input
  5. Call a pipeline execution API (such as Draw, DrawIndexed, etc...)
  6. Repeat steps 3-5 for each object that has to be rendered

After a rendering pass has been completed, the contents of the render targets have been filled by rendering all the objects in a scene. If this is the main view of a scene, you would probably do a 'present' call to copy the results into a window for display to the user. Steps 3 and 4 are each composed of a number of calls to the ID3D11DeviceContext interface that are used to set some states. Each of these API calls takes some time to perform - some more than others, but they are all consuming at least some time. Since we are involving the user application code, the D3D11 runtime, and the GPU driver, some of these calls are really time consuming.

This is a fairly straight-forward definition, but the devil (as usual) is in the details. Since step 6 has you repeating the previous three steps, you are most likely going to be repeating some of the same calls in subsequent pipeline (step 3) and input assembler (step 4) configuration steps. However, the pipeline state is an actual state machine. That means the states that you set are not reset or modified in any way when you execute the pipeline.

So our task is to take advantage of this state retention to minimize the amount of time spent in steps 3 & 4. If there are consecutive states which are setting the same values in the pipeline, we need to efficiently detect that fact and prevent our renderer from carrying out any additional calls that don't actually update the pipeline state in a useful way. Sounds easy enough, but it isn't really that easy to have a general solution to the problem.
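To put a number on the problem, here is a minimal sketch using a mock context. `MockContext` is a made-up stand-in and not the real ID3D11DeviceContext, and a single integer shader id stands in for all of the pipeline state. It shows that state persists across draws, and how many of the set calls in a straightforward loop actually change anything.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a device context; not the real ID3D11DeviceContext.
class MockContext {
public:
    void SetShader(int shaderId) { boundShader_ = shaderId; ++setCalls_; }
    // Drawing does not reset the bound state; it just uses whatever is bound.
    void Draw() { drawnWith_.push_back(boundShader_); }
    int SetCalls() const { return setCalls_; }
    const std::vector<int>& DrawnWith() const { return drawnWith_; }
private:
    int boundShader_ = 0;
    int setCalls_ = 0;
    std::vector<int> drawnWith_;
};

// Re-submits the shader for every object, as a straightforward renderer might.
int RenderNaively(MockContext& ctx, const std::vector<int>& objectShaders) {
    for (int shader : objectShaders) {
        ctx.SetShader(shader);
        ctx.Draw();
    }
    return ctx.SetCalls();
}

// Counts how many of those set calls actually changed the bound value.
int CountNecessarySets(const std::vector<int>& objectShaders) {
    int needed = 0;
    int bound = 0; // matches the mock's initial state
    for (int shader : objectShaders) {
        if (shader != bound) { bound = shader; ++needed; }
    }
    return needed;
}

void DemoRedundantCalls() {
    MockContext ctx;
    const std::vector<int> shaders{1, 1, 1, 2, 2, 1};
    assert(RenderNaively(ctx, shaders) == 6);   // one set call per object
    assert(CountNecessarySets(shaders) == 3);   // only three changed the state
}
```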

The God Solution
One could make the argument that your application code should be able to do this on its own. A rendering pass should be analyzed before you get to calling API functions and by design we would only make the minimum API calls that are needed to push out a frame. Technically this is true, but in practice I don't think it is realistic if your engine is going to support a wide variety of different lighting models, object types, and special rendering passes.

In Hieroglyph 3, each object in the scene carries its desired state information around with it, so that would be the state granularity level - an object. Other approaches would be to collect all similar objects and render them together (I guess that would be 'group' granularity). Whatever granularity you choose to write your rendering code in, it will probably not be at the scene level, where no rendering code resides at lower levels than the scene. For this reason, I discount the 'God' solution - it isn't practical to say that we will only submit perfect sequences of API calls. It isn't possible to know the exact order that every object will be rendered in for all situations in your scene, so this isn't going to work...

The Naive Solution
Another approach is to use the device context's various 'Get' methods to check the value of states before setting them. This might save some small amount of time if you save an expensive state change API from being called, but then again you are losing time for every 'Get' call that you make without preventing an unnecessary call... This one is too variable on the states being used, and can actually end up costing more time than not trying to reduce API calls at all!

State Monitoring Solution
At this point, we can assume that we are going to have to keep some 'model' of the pipeline state in system memory in order to know what the current values of the pipeline are without using API calls to query its state. In Hieroglyph 3, I created a class for each pipeline stage that represents its 'state'. For the programmable shader stages, this includes the following state information:

  1. Shader program
  2. Array of constant buffers
  3. Array of shader resource views
  4. Array of samplers

For the fixed function stages, they each have their own state class. By using an object to represent state, we can then create a class to represent a pipeline stage that holds one copy of its state. We'll refer to that state as the 'CurrentState'. At this point, we could say that any time we try to make an API call that differs from the value currently held in the CurrentState, then we would execute the call and update CurrentState.
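A minimal sketch of that first step, with a single integer standing in for a whole state object. `MockContext` and `RasterizerStageSketch` are hypothetical names for illustration, not the Hieroglyph 3 classes. The wrapper also counts requested calls, which gives you the requested vs. submitted comparison for free.

```cpp
#include <cassert>

// Hypothetical stand-in for the API; counts calls that reach the "driver".
struct MockContext {
    int submitted = 0;
    void SetRasterizerState(int /*stateId*/) { ++submitted; }
};

// Hypothetical stage wrapper holding a system-memory copy of the current state.
class RasterizerStageSketch {
public:
    explicit RasterizerStageSketch(MockContext& ctx) : ctx_(ctx) {}

    void SetState(int stateId) {
        ++requested_;
        if (hasValue_ && stateId == current_) return; // redundant: skip the API call
        ctx_.SetRasterizerState(stateId);
        current_ = stateId;
        hasValue_ = true;
    }
    int Requested() const { return requested_; }
private:
    MockContext& ctx_;
    int current_ = 0;
    bool hasValue_ = false; // nothing is known about the pipeline until the first set
    int requested_ = 0;
};

void DemoCurrentStateFilter() {
    MockContext ctx;
    RasterizerStageSketch stage(ctx);
    const int requests[] = {3, 3, 5, 5, 3};
    for (int id : requests) stage.SetState(id);
    assert(stage.Requested() == 5);
    assert(ctx.submitted == 3); // 3, then 5, then back to 3
}
```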

This is indeed an improvement, but it can actually be implemented a bit more efficiently. The issue here is that for any of the states that contain arrays, we could potentially call the 'Set' API multiple times when all of the values could be set in a single API call. In fact, from our list of steps above, we can actually collect all of the desired state changes right up until we are about to perform a draw call. If we do this, then we can optimize the number of calls down to a minimum. If we add a second state object to each pipeline stage, which we will call the 'DesiredState', then we have an easy way to collect these desired state changes.

However, the addition of a second state object means that for each draw call, we would have to compare the CurrentState and DesiredState objects. Some of these states are pretty large (with hundreds of elements in an array), so doing a full comparison before each draw call can become quite expensive, and would probably eclipse any gains from minimizing state changes...

You may have already guessed the solution - we link the two state objects and add some 'dirty' flags. Whenever a state is changed in the DesiredState object, it compares only that state with the CurrentState object. If they differ, then the flag is set. If they don't differ, we can potentially update the dirty flag to indicate that an update is no longer needed (saving an API call). Especially when working with the arrays of states, the logic for this update can be a little tricky - but it is possible. With this scheme, we can set only the needed state changes right before our draw call, effectively minimizing the number of API calls with a minimal amount of CPU work. We even have a nice object design to make it easy to manage the state monitoring.
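Here is a rough sketch of the linked state objects with dirty flags for an array of slots. Everything in it is an assumption made for illustration: the slot count, the `MockContext::SetResources` call (loosely modeled on the ranged 'Set' style of API), and the choice to flush one contiguous range from the first to the last dirty slot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSlots = 8; // assumed slot count for illustration

// Hypothetical API stand-in: one call can set a contiguous range of slots.
struct MockContext {
    int calls = 0;
    std::array<int, kSlots> slots{};
    void SetResources(std::size_t start, std::size_t count, const int* values) {
        ++calls;
        for (std::size_t i = 0; i < count; ++i) slots[start + i] = values[i];
    }
};

class ResourceSlotsSketch {
public:
    // Record a desired value; only this slot is compared against the current state.
    // Setting a slot back to its current value clears its dirty flag again.
    void SetDesired(std::size_t slot, int value) {
        assert(slot < kSlots);
        desired_[slot] = value;
        dirty_[slot] = (desired_[slot] != current_[slot]);
    }

    // Called right before a draw: submit one ranged call covering all dirty slots.
    void Flush(MockContext& ctx) {
        std::size_t first = kSlots, last = 0;
        for (std::size_t i = 0; i < kSlots; ++i) {
            if (dirty_[i]) {
                if (first == kSlots) first = i;
                last = i;
            }
        }
        if (first == kSlots) return; // nothing changed, so no API call at all
        ctx.SetResources(first, last - first + 1, &desired_[first]);
        for (std::size_t i = first; i <= last; ++i) {
            current_[i] = desired_[i];
            dirty_[i] = false;
        }
    }
private:
    std::array<int, kSlots> current_{};
    std::array<int, kSlots> desired_{};
    std::array<bool, kSlots> dirty_{};
};
```

Clean slots that sit between two dirty ones are re-sent with their unchanged values, which trades a little redundant data for a single call. Whether that trade is the right one depends on the API and the driver.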

State Monitoring in Hieroglyph 3
That was a long way of arriving at my latest solution. Up to this point, I was implementing individual state arrays and monitoring each one uniquely in each of the pipeline state classes. However, this is very error prone, since there are many similar states, but not all are exactly the same, and you end up with repeated code all over the place. So I turned to my favorite solution of late - templates. I created two templates: TStateMonitor and TStateArrayMonitor. These allow me to encapsulate the state monitoring for single values and arrays into the templates, and then my pipeline stage state objects only need to declare an appropriate template instantiation and link the Current and Desired states together. The application code can interact directly with these template classes (of only the desired state, of course) and you only need to tell the pipeline stage when you are about to make a draw call and that it needs to flush its state changes.
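For a feel of what a single-value monitor template could look like, here is a minimal sketch. It is not the actual TStateMonitor from Hieroglyph 3; `StateMonitorSketch`, its 'sister' link, and the `FlushIfDirty` helper are all names I made up for this example, and the only requirement on `T` is that it is copyable and comparable with `==`.

```cpp
#include <cassert>

// Hypothetical sketch; not the actual Hieroglyph 3 TStateMonitor.
template <typename T>
class StateMonitorSketch {
public:
    explicit StateMonitorSketch(T initial) : state_(initial) {}

    // Link a desired-state monitor to the monitor holding the current state.
    void SetSister(const StateMonitorSketch* sister) { sister_ = sister; }

    void SetState(T value) {
        state_ = value;
        // A monitor with no sister (the current-state side) never reports dirty.
        dirty_ = (sister_ != nullptr) && !(state_ == sister_->state_);
    }

    bool IsDirty() const { return dirty_; }
    const T& GetState() const { return state_; }
    void ClearDirty() { dirty_ = false; }
private:
    T state_;
    const StateMonitorSketch* sister_ = nullptr;
    bool dirty_ = false;
};

// Submits the desired value through the supplied callable only when it
// differs from the current value, then brings the two back in sync.
template <typename T, typename Submit>
bool FlushIfDirty(StateMonitorSketch<T>& desired,
                  StateMonitorSketch<T>& current,
                  Submit submit) {
    if (!desired.IsDirty()) return false;
    submit(desired.GetState());
    current.SetState(desired.GetState());
    desired.ClearDirty();
    return true;
}

void DemoStateMonitor() {
    StateMonitorSketch<int> current(0);
    StateMonitorSketch<int> desired(0);
    desired.SetSister(&current);

    int apiCalls = 0;
    auto submit = [&apiCalls](int) { ++apiCalls; };

    desired.SetState(5);
    desired.SetState(0);                       // changed back before the draw
    assert(!FlushIfDirty(desired, current, submit));
    desired.SetState(4);
    assert(FlushIfDirty(desired, current, submit));
    assert(apiCalls == 1);
}
```

Because the state type is a template parameter, the same sketch works whether `T` is an integer handle or a raw pointer to an API object.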

In addition, since they are template classes, you can always use them regardless of what representation of individual states are used. If your engine works directly with raw pointers of API objects, that is fine. If it works with integer references to objects, that's fine too. The templates make the design more nimble and able to adapt to future changes. I have to say, I am really happy with how the whole thing turned out...

So if you have made it this far, I would be interested to hear if you use something similar, or have any comments on the design. Thanks for reading!
Jason Z
The most recent commit to the Hieroglyph 3 repository has reduced my usage of the simple event system in the engine. I think the reasons behind this can apply to others, so I wanted to make a post here about it and try to share my experiences with the community.

A long time ago, I added an event manager to the Hieroglyph codebase. This was a really long time ago - something like 10 years ago... Anyway, it was added when I was still in my C++ infancy, and of course I chose to make the event manager a singleton. It was perfectly logical - why in the world would anyone want an event system where not every listener would receive all the messages it signed up for?

Naturally, my coding style has evolved along with time using the language. And naturally I developed a scenario where the event system would need to be replicated in several small mini-application systems (if you missed the last post, I am referring to my Glyphlets!). So I thought, I can just modify the event manager to not be a singleton anymore, and create them as standard objects for each glyphlet.

That would work fine, as long as I hadn't made any assumptions in the code that there was only one event manager. Unfortunately, I had - the event listeners would grab a reference to the singleton at their startup, so essentially all of the event based functionality broke in a seemingly incoherent mess :(

So to dig myself out, I started to review all of the classes that were using the event manager at all. As it turns out, most of them didn't really need to be using events, and I modified them accordingly to use other mechanisms to achieve their messaging. At this point, I am now down to only having applications, Glyphlets (which are mini applications), and the camera classes using the event system. With the breadth of the issue reduced, I can now eliminate the singleton interface method and deal with the minimized problem in a comprehensible way.

The moral of the story is this - if you think you need a singleton, then implement the class as a normal object and create a single instance of it to use. Enforce the singleton on yourself, but don't mechanize it through the class. This way, you still respect the class' potential for being used in multiple instances, and it can be very easy to expand later on. By using the singleton interface, you limit future design decisions without much gain - at least in my view, it is better to trust yourself to follow the access rules than to force yourself into one design later on down the road!
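As a small sketch of what that looks like in practice (the class names here are hypothetical and much simpler than the real Hieroglyph event system): the event manager is an ordinary class with no static accessor, and each listener is handed the manager it should register with instead of looking one up globally.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical event manager written as an ordinary class: no static instance.
class EventManagerSketch {
public:
    using Handler = std::function<void(int)>;
    void AddListener(Handler handler) { handlers_.push_back(std::move(handler)); }
    void Dispatch(int eventId) {
        for (auto& handler : handlers_) handler(eventId);
    }
private:
    std::vector<Handler> handlers_;
};

// A listener is handed the manager it should use instead of grabbing a global.
// It must outlive the manager's dispatching, since the handler captures 'this'.
class CameraSketch {
public:
    explicit CameraSketch(EventManagerSketch& events) {
        events.AddListener([this](int id) { lastEvent_ = id; });
    }
    CameraSketch(const CameraSketch&) = delete;            // the registered handler
    CameraSketch& operator=(const CameraSketch&) = delete; // points at this object
    int LastEvent() const { return lastEvent_; }
private:
    int lastEvent_ = -1;
};

void DemoSeparateManagers() {
    EventManagerSketch appEvents;      // the "single" instance for the application
    EventManagerSketch glyphletEvents; // a second instance costs nothing extra
    CameraSketch appCamera(appEvents);
    CameraSketch glyphletCamera(glyphletEvents);

    glyphletEvents.Dispatch(42);
    assert(glyphletCamera.LastEvent() == 42);
    assert(appCamera.LastEvent() == -1); // the other manager's listeners are untouched
}
```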