Chronicles of the Hieroglyph

HG4 and Object Lifetime Management Models

Posted 12 November 2013 · 949 views
HG4, C++11
As the title implies, I have been thinking quite a bit more about the object lifetime management models and how they will be used in Hieroglyph 4. In the past (i.e. Hieroglyph 3), I more or less used heap allocated objects as a default, and generally didn't use stack allocation all that much. The only real exception to this was when working within a very confined scope, such as within a small method or something along those lines. This had the unintentional side effect that I generally used pointers (or smart pointers) for many of the various objects floating around in my rendering engine.

A small background story may be in line here - I am a self taught C++ developer, and also a self taught graphics programmer. In general, I am a voracious reader and so when I was starting out in development, I picked up many of the habits that were demonstrated in various programming books - most of which were graphics programming related. This was fine at the time, and I certainly learned a lot over the years, but I never really took too much time to dive into C++ or some of the nuances of its deterministic memory model.

As I mentioned a couple posts ago, I have really been digging in deep into C++11/14, and trying to gain a deeper insight into why and how certain paradigms are considered good practice, while others are horribly bad practice. The primary problem with all of the pointers that I described above is that I was unwittingly defaulting to using reference semantics everywhere... As far as correctness goes, that isn't really such a big deal - you can of course write correct programs that use only reference semantics (C# and Java use reference semantics almost exclusively). However, C++ gives you a bit more freedom to choose how and when your objects are treated as references or values, so it is worthwhile to really consider how your classes will be used before you write them.

The obvious choice in reference vs. value semantics is determined when you declare the variables for your objects. If you use pointers or (to a lesser extent) references, then you are clearly choosing reference semantics. However, your actual class design itself also plays a big role in defining the semantics of how it gets used. All of the copy control methods (copy constructor, copy assignment operator, destructor, and the move constructor/move assignment operator) essentially prescribe how your class instances are moved around in a program. And as an engine author, you have to choose ahead of time how that should look. I think this single design choice has one of the biggest impacts on how you (and your users) work with your software.
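To make the role of the copy control methods concrete, here is a minimal sketch. The two types are hypothetical and are not taken from Hieroglyph; they only illustrate how the class definition itself decides whether instances can be copied freely or only moved.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical value type: the compiler-generated copy and move
// operations act member by member, so every copy is independent.
struct Material {
    std::string name;
    float roughness;
};

// Hypothetical move-only type: it owns a heap buffer exclusively, so
// copying is disabled and ownership can only be transferred.
class VertexData {
public:
    explicit VertexData(std::size_t count)
        : data_(new float[count]()), count_(count) {}

    VertexData(const VertexData&) = delete;
    VertexData& operator=(const VertexData&) = delete;

    VertexData(VertexData&& other) noexcept
        : data_(std::move(other.data_)), count_(other.count_) {
        other.count_ = 0;  // the moved-from object is left empty
    }

    VertexData& operator=(VertexData&& other) noexcept {
        if (this != &other) {
            data_ = std::move(other.data_);
            count_ = other.count_;
            other.count_ = 0;
        }
        return *this;
    }

    std::size_t size() const { return count_; }

private:
    std::unique_ptr<float[]> data_;  // released automatically on destruction
    std::size_t count_;
};

static_assert(std::is_copy_constructible<Material>::value,
              "Material behaves like a value");
static_assert(!std::is_copy_constructible<VertexData>::value,
              "VertexData can only be moved");
```

A user of `Material` can pass it around by value without thinking about ownership, while a user of `VertexData` is forced by the compiler to write `std::move` whenever ownership changes hands.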

Once again returning to the pointer default... This default is what most beginning C++ developers choose, and it works fine. However, once you start expanding to multithreaded programming, always using pointers can begin to complicate things. If you have the chance to use value semantics, making a copy of a value for a second thread becomes trivially easy - but if you are using reference semantics it is a bit more tricky. You can make a copy of your references, but all the references still point to the same instance - so multithreaded programming gets even more complex than it already is.
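The difference between copying a value and copying a reference can be shown without any threading code at all. This is a small sketch with a hypothetical `FrameSettings` type; imagine that the second copy in each function is the one handed to a worker thread.

```cpp
#include <cassert>
#include <memory>

// Hypothetical settings block that a worker thread might need.
struct FrameSettings {
    int width;
    int height;
};

// Value semantics: the worker's copy is a distinct object, so later
// changes made by the main thread cannot affect it.
inline bool valueCopyIsIndependent() {
    FrameSettings mainCopy{1280, 720};
    FrameSettings workerCopy = mainCopy;
    mainCopy.width = 1920;
    return workerCopy.width == 1280;
}

// Reference semantics: copying the pointer copies only the handle.
// Both handles refer to the same instance, so with real threads every
// access would need synchronization.
inline bool pointerCopyIsShared() {
    auto mainRef = std::make_shared<FrameSettings>(FrameSettings{1280, 720});
    std::shared_ptr<FrameSettings> workerRef = mainRef;
    mainRef->width = 1920;
    return workerRef->width == 1920 && mainRef.use_count() == 2;
}
```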

In addition, on modern processors, memory access is THE main bottleneck when trying to fully utilize a processing core. If you are using reference semantics, you are inherently less cache friendly than if you use value semantics (I know, I know, there are some cases where it is better to use reference semantics, but in general this is the exception and not the rule). Creating an array of value objects is simple - you just make it, and the objects are initialized how you indicate during their instantiation. They are destroyed when they go out of scope. On the other hand, creating an array of reference objects is more complex... Are they valid references? Can I dereference them now? When do they get initialized? When can they be destroyed? Did I already destroy them? Lifetime management is just plain easy with value semantics, and not as easy with references.
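As a rough illustration of the two kinds of arrays, consider the sketch below. The `Particle` type and both helper functions are made up for this example. The vector of values stores its elements contiguously and fully initialized, while the vector of pointers starts out holding null pointers that have to be checked on every access.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical element type with default member initializers.
struct Particle {
    float x = 1.0f;
    float y = 0.0f;
};

// Array of values: elements sit next to each other in memory and are
// all valid for as long as the vector itself is alive.
inline float sumValues(const std::vector<Particle>& particles) {
    float sum = 0.0f;
    for (const Particle& p : particles) {
        sum += p.x;
    }
    return sum;
}

// Array of references: each element is a separate heap allocation that
// may or may not exist yet, so every access needs a validity check.
inline float sumPointers(const std::vector<std::unique_ptr<Particle>>& particles) {
    float sum = 0.0f;
    for (const auto& p : particles) {
        if (p) {
            sum += p->x;
        }
    }
    return sum;
}
```

With `std::vector<Particle> values(4);` all four particles exist immediately, whereas `std::vector<std::unique_ptr<Particle>> refs(4);` only creates four empty handles.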

So to get started with HG4, I am taking some extra time to consider a general set of guidelines for object management with this new understanding in mind. Sometimes reference semantics are necessary (e.g. COM pointers force certain reference semantics, and Direct3D is full of COM...) so the real key is to figure out when you can use values, and when you should use references. It truly seems like a bit of an artistic process - the good C++ developers are very good at making these types of design choices. I still find it a bit taxing to come to a good solution, but when you get there, it sure does shine through and makes your API much easier to work with. Let's hope that I can get the initial design right, and then grow organically from there.

MVP Renewal++

Posted 01 October 2013 · 1,004 views
I found out today that I have indeed been re-awarded as a Visual C++ MVP this year :) That makes five years running, and I'm really happy that I will have the chance to continue on. There are literally tons of different concepts in modern C++ that I want to start writing about again, so we will have to see what shape that can take... hopefully it can be helpful to some of you out there.

To help demonstrate how the design of a rendering framework is evolving with the new features of C++, I will be updating Hieroglyph substantially. Because of how large the changes will be, I have decided to make a clean break and create Hieroglyph 4. This decision was not made lightly, as I have put many, many hours of work into Hieroglyph 3. In fact, I first posted Hieroglyph 3 on Codeplex back in February of 2010, and it had already been in development for months before that. As a result of our book, as well as lots of posts here on GameDev.net, there is actually a fairly substantial user base for Hieroglyph 3 (including my own use at my day job).

Due to that user base, I think it is prudent to take the next step and create the next version of the engine. This will let existing users of Hieroglyph 3 continue on without too much disruption, and then at some point down the road I will eventually have Hieroglyph 4 in a production ready state. This will also let me be more aggressive in my design changes, so I think it will be good for both cases. I will continue to apply the updates to Hieroglyph 3 when they make sense and don't significantly alter the API surface. That should keep HG3 from getting stale and allow me to maintain a before and after example set - that's the plan anyways :)

It always amazes me how much you learn over the course of a couple years. Since I wrote Hieroglyph 3, I have become much more knowledgeable about software engineering, software design, and I have also become much more aware of the importance of documentation and testing within the software realm. Hopefully I can apply all of this to the new engine going forward.

Anyways, I hope this will be the beginning of another long run, so stay tuned for more renderer design discussions!

Going Native

Posted 04 September 2013 · 605 views
Going Native - The Conference
This post has two purposes, both related to its title. First and foremost, today through Friday the 'Going Native 2013' conference is happening. If you aren't familiar with it, this is a C++ conference held by Microsoft in Redmond and it is devoted to all things C++. Many of the big names in C++ are speaking there, including Bjarne Stroustrup himself who gave the keynote today.

The conference itself is very low cost for something of its pedigree, but even if you don't attend in person it is streamed for free from the channel 9 website (the link above has all the details). I have to tell you, the information held within some of these presentations is priceless, especially if you want to move beyond mid-range C++ usage. These guys are the titans of C++, and I haven't been disappointed in the content so far. If you have the time, go check it out - you won't be disappointed. You can also take a look at last year's presentations (Going Native 2012), which are also available through the same interface for free.

Going Native - The Person
The second part of this post is more of a personal note than a development story. I'm sure you heard a while back that the DirectX MVP specialty was being phased out. At the time, I had recently been given my fourth MVP award for DirectX, and it was a serious bummer... Being an MVP has many benefits, including having access to some of the great engineers working on the technologies we use every day - so getting the news that my particular specialty was not going to be eligible any more was less than good news.

However, as life so often reminds me, there is usually a silver lining to any bad situation. The C++ MVP group was open to discussions with the existing DirectX MVPs, which naturally I was curious to learn more about. Boy am I happy that I had the opportunity to both discuss modern C++ (indeed, to learn of its existence) and to listen in on discussions from some other experts... Modern C++ is like learning a brand new language, complete with totally different programming paradigms, but wrapped in a very familiar syntax. It can be both powerful and simple, safe and efficient - and it is still the same language that I have been using for 10+ years. Except it is better :)

Since my new awakening to C++11/14, I have been feverishly consuming as much content as I possibly could on these new features and how they can be used in graphics programming. In addition, I have started to realize how outdated some of the designs are that I have used in Hieroglyph 3. So I have started to experiment with some heavy duty changes to some of the major systems. These changes aren't quite ready for primetime, but they are in the works. Since they are some big changes, I need to think about how I will support them in the context of Hieroglyph 3 being used for our book - but demonstrating some modern designs is important enough that I will figure out a good solution without nuking anyone's existing code bases built on Hieroglyph.

For some reason, I haven't heard much discussion in the graphics area about C++11/14 features. They are relatively new, but still should start to be used. So I'm going to be focusing on getting some examples out there, and continue learning as I go. I'm certainly no expert, but I'm learning fast and loving every second of it.

I don't know if I'll get re-awarded as a C++ MVP (I find out on October 1st...) but in either case, I'm happy to have rediscovered C++. Like I said, sometimes life throws you a curve ball - but you can still hit a curve ball :) So I hope you guys and girls are ready and willing to come along with me on this new journey, because I am as excited and motivated as I have been in a long time.

The Features of Direct3D 11.2

Posted 10 July 2013 · 2,593 views

I am sure you have heard by now about the new BUILD 2013 conference and all of the goodies that were presented there. Of special interest to the graphics programmer are the new features defined in the Direct3D 11.2 specification level. I watched the "What's New in Direct3D 11.2" presentation, which provided a good overview of the new features. You can check out the video for yourself here.

Overall, they describe six new features that are worth discussing further:
  • Trim API: Basically a required helper API that allows the driver to eliminate any 'helper' memory when an app is suspended, which lets your app stay in memory longer.
  • Hardware Overlay: Light APIs for using a compositing scheme, allowing you to use different sized swap chains for UI and 3D scenes, which supposedly is free when supported in hardware (with an efficient software fallback).
  • HLSL Shader Linking: This is supposed to let you create libraries of shader code, with a new compiler target of lib_5_0. This could be interesting for distributing lighting functions or modular shader libraries, but I would reserve judgment on how it works until I get to try it out.
  • Mappable default buffers: Resources that you can directly map even if they are created with the default usage flag. This is something that people have been requesting for a long time, so it is really nice to get into the API.
  • Low latency presentation API: More or less there is a way to ensure that you get one frame latency in presenting your content to the screen. This is pretty important in cases where the user is actively interacting directly with the screen and can notice any latency between their inputs and the rendered results.
  • Tiled Resources: This is basically the same idea as a mega texture (from id Software) but it is supported at the API level. It seems like a great addition and I can't wait to try this out.
Overall, for a point release it does seem like a pretty good one. There are some new features to play with, and especially the tiled resources seems like a cool new capability that wasn't there before. There's only one catch... the new features are only available on Windows 8.1 - at least for the foreseeable future. Nobody knows if this will ever be back ported to Windows 7, so if you want to try out the new features you will have to get the preview install of Win8.1 and give it a shot.

So what do you guys think about this release? Do you like it, hate it, or somewhere in between? Have you thought of any new techniques that you can implement with this new functionality?

Something Special Today...

Posted 21 May 2013 · 838 views

I received something today that is pretty unique, and I can honestly say that I haven't ever gotten anything of this sort before. Let's see if you are able to notice something different about the copy of our D3D11 book that I received today:

Attached Image

That's right - the copy on the right has been translated cover to cover into Korean! At least that is what the publisher has told me - I have no idea how to read Korean, or to even tell if those characters are real... but I take them at their word!! Here is another shot closer up, also showing the binding:

Attached Image

I have known that the book would be translated for quite some time, but actually getting a copy in my hands was a pretty cool thing to see. As far as I know, this is the only translation and there aren't any in the works, so I guess you are stuck with English or Korean for your reading pleasure.

GPU Pro 4

Posted 16 May 2013 · 697 views

I just received my copy of GPU Pro 4 today, which was a nice surprise. I had contributed a chapter on Kinect Programming with Direct3D 11, and it is really nice to see it in print. And of course, there are also lots of other interesting articles that I have been digging through as well.

In general, I find it really interesting to see the breadth and depth of topics covered in these types of books. For any given topic, you only get to write one chapter - which means you can't go too deep, or you risk losing the focus of the reader. However, since there is a wide mix of authors, it is quite common to see wildly varying topics in them. So I find it fun to read through, and get a general feel for what people are working on out there.

You can find details about the book on Wolfgang's blog page for GPU Pro, including some sample material from some of the chapters. And of course you can find the book on Amazon if you are interested in picking up a copy.

On a personal note, this book adds to my running tally of series that I have contributed to. I have been fortunate enough to contribute to the ShaderX series, Game Programming Gems series, the GameDev.net collection, a complete text on Direct3D 11 programming, an online book for Direct3D 10 programming, several online articles here on GameDev.net, and now the GPU Pro series too. It is a great time to be involved in the realtime rendering field, and I couldn't be happier contributing to it!

Pipeline State Monitoring - Results

Posted 29 March 2013 · 792 views

Last time I discussed how I am currently using pipeline state monitoring to minimize the number of API calls that are submitted to the Direct3D runtime/driver. Some of you were wondering if this is a worthwhile thing to try out, and were interested in some empirical test results to see what the difference is. Hieroglyph 3 has quite a number of different rendering techniques available to it in the various sample programs, so I went to work trying out a few different samples - both with and without pipeline state monitoring.

The results were a little bit surprising to me. It turns out that for all of the samples which are GPU limited, there is a statistically insignificant difference between the two - so regardless of the number of API calls, the GPU was the one slowing things down. This makes sense, and should be another point of evidence that trying to optimize before you need to is not a good thing.

However, for the CPU limited samples there is a different story to tell. In particular, the MirrorMirror sample stands out. For those of you who aren't familiar, the sample was designed to highlight the multi-threading capabilities of D3D11 by performing many simple rendering tasks. This is accomplished by building a scene with lots of boxes floating around three reflective spheres in the middle. The spheres perform dual paraboloid environment mapping, which effectively amplifies the amount of geometry to be rendered since they have to generate their paraboloid maps every frame. Here is a screenshot of the sample to illustrate the concept:

Attached Image

This exercises the API call mechanism quite a bit, since the GPU isn't really that busy and there are many draw calls to perform (each box is drawn separately instead of using instancing for this reason). It had shown a nice performance delta between single and multi-threaded rendering, but it also serves as a nice example for the pipeline state monitoring too. The results really speak for themselves. The chart below shows two traces of the frame time for running the identical scene both with and without the state monitoring being used to prevent unneeded API calls. Here, less is more since it means it takes less time to render each frame.

Attached Image

As you can see, the frame time is significantly lower for the trace using the state monitoring. So to interpret these results, we have to think about what is being done here. The sample is specifically designed to be an example of heavy CPU usage relative to the GPU usage. You can consider this the "CPU-Extreme" side of the spectrum. On the other hand, GPU bound samples show no difference in frame time - so we can call this the "GPU-Extreme" side of the spectrum.

Most rendering situations will probably fall somewhere in between these two situations. So if you are very GPU heavy, this probably doesn't make too much difference. However, once the next generation of GPUs comes out, you can easily have a totally different situation and become CPU bound. I think Yogi Berra once said - "it isn't a problem until it's a problem."

So overall, in my opinion it is worthwhile to spend the time and implement a state monitoring system. This also has other benefits, such as the fact that you will have a system that makes it easy to log all of your actual API calls vs. requested ones - which may become a handy thing if your favorite graphics API monitoring tools ever become unavailable... (yes, you know what I mean!). So get to it, get a copy of the Hieroglyph code base, grab the pipeline state monitoring classes and hack them up into your engine's format!

Pipeline State Monitoring

Posted 07 March 2013 · 1,502 views
D3D11, rendering
My last couple of commits to Hieroglyph 3 addressed a performance issue that most likely all graphics programmers who have made it beyond the basics have grappled with: Pipeline State Monitoring. This is the system used to ensure that your engine only submits the API calls that are really necessary in your rendered frame. This cuts down on any API calls that don't effectively add any value to your rendering workload, but still cost some time to execute anyway. This is a problem that I have worked through a number of times, and I am quite fond of the latest solution that I have arrived at. So let's talk about state monitoring!

More Precise Problem Statement
Before we dive into the solution that I am using, I would like to more clearly identify what the problem is that I am trying to solve. We will consider the process of one rendering pass, since any additional, more complicated rendering schemes can be broken down into multiple rendering passes. I define a rendering pass as all of the operations performed between setting your render targets (or more generally your Output Merger states) and the next time you set your render targets (i.e. the start of the next rendering pass). These could include a sequence like the following:
  1. Set render target view / depth stencil view
  2. Clear render targets
  3. Configure the pipeline for rendering
  4. Configure the input assembler for receiving pipeline input
  5. Call a pipeline execution API (such as Draw, DrawIndexed, etc...)
  6. Repeat steps 3-5 for each object that has to be rendered
After a rendering pass has been completed, the contents of the render targets have been filled by rendering all the objects in a scene. If this is the main view of a scene, you would probably do a 'present' call to copy the results into a window for display to the user. Steps 3 and 4 are each composed of a number of calls to the ID3D11DeviceContext interface that are used to set some states. Each of these API calls takes some time to perform - some more than others, but they are all consuming at least some time. Since we are involving the user application code, the D3D11 runtime, and the GPU driver, some of these calls are really time consuming.

This is a fairly straight-forward definition, but the devil (as usual) is in the details. Since step 6 has you repeating the previous three steps, you are most likely going to be repeating some of the same calls in subsequent pipeline (step 3) and input assembler (step 4) configuration steps. However, the pipeline state is an actual state machine. That means the states that you set are not reset or modified in any way when you execute the pipeline.

So our task is to take advantage of this state retention to minimize the amount of time spent in steps 3 & 4. If there are consecutive states which are setting the same values in the pipeline, we need to efficiently detect that fact and prevent our renderer from carrying out any additional calls that don't actually update the pipeline state in a useful way. Sounds easy enough, but it isn't really that easy to have a general solution to the problem.

The God Solution
One could make the argument that your application code should be able to do this on its own. A rendering pass should be analyzed before you get to calling API functions and by design we would only make the minimum API calls that are needed to push out a frame. Technically this is true, but in practice I don't think it is realistic if your engine is going to support a wide variety of different lighting models, object types, and special rendering passes.

In Hieroglyph 3, each object in the scene carries its desired state information around with it, so that would be the state granularity level - an object. Other approaches would be to collect all similar objects and render them together (I guess that would be 'group' granularity). Whatever granularity you choose to write your rendering code in, it will probably not be at the scene level, where no rendering code resides at lower levels than the scene. For this reason, I discount the 'God' solution - it isn't practical to say that we will only submit perfect sequences of API calls. It isn't possible to know the exact order that every object will be rendered in for all situations in your scene, so this isn't going to work...

The Naive Solution
Another approach is to use the device context's various 'Get' methods to check the value of states before setting them. This might save some small amount of time if you save an expensive state change API from being called, but then again you are losing time for every 'Get' call that you make without preventing an un-necessary call... This one is too variable on the states being used, and can actually end up costing more time than not trying to reduce API calls at all!
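The cost trade-off of the 'Get' approach can be seen with a simple call counter. The `MockContext` below is a hypothetical stand-in for a device context (it is not the D3D11 interface), and the sketch assumes for simplicity that a 'Get' and a 'Set' each count as one API call.

```cpp
#include <cassert>

// Hypothetical stand-in for a device context; it only counts calls.
struct MockContext {
    int blendState = 0;
    int apiCalls = 0;

    int getBlendState() {
        ++apiCalls;
        return blendState;
    }

    void setBlendState(int state) {
        ++apiCalls;
        blendState = state;
    }
};

// Naive filtering: ask the API for the current value before every set.
inline void naiveSet(MockContext& ctx, int desired) {
    if (ctx.getBlendState() != desired) {
        ctx.setBlendState(desired);
    }
}
```

When the state actually changes, `naiveSet` issues two calls instead of one, and when it does not change it still issues one. Whether that is a win depends entirely on how expensive the avoided 'Set' is compared to the 'Get'.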

State Monitoring Solution
At this point, we can assume that we are going to have to keep some 'model' of the pipeline state in system memory in order to know what the current values of the pipeline are without using API calls to query its state. In Hieroglyph 3, I created a class for each pipeline stage that represents its 'state'. For the programmable shader stages, this includes the following state information:
  • Shader program
  • Array of constant buffers
  • Array of shader resource views
  • Array of samplers
For the fixed function stages, they each have their own state class. By using an object to represent state, we can then create a class to represent a pipeline stage that holds one copy of its state. We'll refer to that state as the 'CurrentState'. At this point, we could say that any time we try to make an API call that differs from the value currently held in the CurrentState, then we would execute the call and update CurrentState.
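A minimal sketch of this CurrentState idea for a single state value might look like the following. The class names are hypothetical and are not the actual Hieroglyph 3 classes, the state is reduced to an integer id, and the cached value is assumed to match the device's initial state.

```cpp
#include <cassert>

// Hypothetical stand-in for a device context; it only counts calls.
struct CountingContext {
    int rasterizerState = 0;
    int apiCalls = 0;

    void setRasterizerState(int state) {
        ++apiCalls;
        rasterizerState = state;
    }
};

// A pipeline stage that keeps a system memory copy of its state and
// only forwards requests that would actually change it.
class RasterizerStage {
public:
    explicit RasterizerStage(CountingContext& ctx) : ctx_(ctx) {}

    void setState(int desired) {
        if (desired != currentState_) {
            ctx_.setRasterizerState(desired);
            currentState_ = desired;
        }
    }

private:
    CountingContext& ctx_;
    int currentState_ = 0;  // assumed to mirror the device's initial state
};
```

Setting the same value three times in a row results in a single API call, and no 'Get' call is ever needed.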

This is indeed an improvement, but it can actually be implemented a bit more efficiently. The issue here is that for any of the states that contain arrays, we could potentially call the 'Set' API multiple times when all of the values could be set in a single API call. In fact, from our list of steps above, we can actually collect all of the desired state changes right up until we are about to perform a draw call. If we do this, then we can optimize the number of calls down to a minimum. If we add a second state object to each pipeline stage, which we will call the 'DesiredState', then we have an easy way to collect these desired state changes.

However, the addition of a second state object means that for each draw call, we would have to compare the CurrentState and DesiredState objects. Some of these states are pretty large (with hundreds of elements in an array), so doing a full comparison before each draw call can become quite expensive, and would probably eclipse any gains from minimizing state changes...

You may have already guessed the solution - we link the two state objects and add some 'dirty' flags. Whenever a state is changed in the DesiredState object, it compares only that state with the CurrentState object. If they differ, then the flag is set. If they don't differ, we can potentially update the dirty flag to indicate that an update is no longer needed (saving an API call). Especially when working with the arrays of states, the logic for this update can be a little tricky - but it is possible. With this scheme, we can set only the needed state changes right before our draw call, effectively minimizing the number of API calls with a minimal amount of CPU work. We even have a nice object design to make it easy to manage the state monitoring.
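For a single state value, the linked Current/Desired pair with a dirty flag could be sketched like this. Again, the names are hypothetical and the state is reduced to an integer id. Requests are only recorded, and the one API call (if any) happens in `flush()`, which would be called right before a draw call.

```cpp
#include <cassert>

// Hypothetical stand-in for a device context; it only counts calls.
struct BlendContext {
    int blendState = 0;
    int apiCalls = 0;

    void setBlendState(int state) {
        ++apiCalls;
        blendState = state;
    }
};

class BlendStage {
public:
    explicit BlendStage(BlendContext& ctx) : ctx_(ctx) {}

    // Record a request. Only this one state is compared, and the flag is
    // cleared again if the request returns to the current value.
    void setDesired(int state) {
        desired_ = state;
        dirty_ = (desired_ != current_);
    }

    // Apply pending changes; intended to run right before a draw call.
    void flush() {
        if (dirty_) {
            ctx_.setBlendState(desired_);
            current_ = desired_;
            dirty_ = false;
        }
    }

private:
    BlendContext& ctx_;
    int current_ = 0;  // assumed to mirror the device's initial state
    int desired_ = 0;
    bool dirty_ = false;
};
```

Changing the desired state several times between two draw calls costs at most one API call, and changing it back to the current value costs none.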

State Monitoring in Hieroglyph 3
That was a long way of arriving at my latest solution. Up to this point, I was implementing individual state arrays and monitoring each one uniquely in each of the pipeline state classes. However, this is very error prone, since there are many similar states, but not all are exactly the same, and you end up with repeated code all over the place. So I turned to my favorite solution of late - templates. I created two templates: TStateMonitor<T> and TStateArrayMonitor<T>. These allow me to encapsulate the state monitoring for single values and arrays into the templates, and then my pipeline stage state objects only need to declare an appropriate template instantiation and link the Current and Desired states together. The application code can interact directly with these template classes (of only the desired state, of course) and you only need to tell the pipeline stage when you are about to make a draw call and that it needs to flush its state changes.
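To give a feel for what an array monitor template could look like, here is a simplified sketch. Only the name `TStateArrayMonitor` comes from the description above; the extra size parameter `N`, every member name, and the dirty range logic are assumptions made for this example and do not reproduce the real Hieroglyph 3 implementation. A single value `TStateMonitor<T>` would follow the same pattern with one value and one flag.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Sketch only: member names and the size parameter N are assumptions.
template <typename T, std::size_t N>
class TStateArrayMonitor {
public:
    explicit TStateArrayMonitor(T initial) { states_.fill(initial); }

    // Link the desired state monitor to its current state counterpart.
    void SetSister(TStateArrayMonitor* sister) { sister_ = sister; }

    // Record a requested value and keep the dirty range [first_, last_] tight.
    void SetState(std::size_t slot, T value) {
        assert(slot < N);
        states_[slot] = value;
        if (Differs(slot)) {
            first_ = dirty_ ? std::min(first_, slot) : slot;
            last_ = dirty_ ? std::max(last_, slot) : slot;
            dirty_ = true;
        } else if (dirty_) {
            // A slot went back to the current value: shrink from both ends.
            while (first_ < last_ && !Differs(first_)) ++first_;
            while (last_ > first_ && !Differs(last_)) --last_;
            if (!Differs(first_)) dirty_ = false;
        }
    }

    T GetState(std::size_t slot) const { return states_[slot]; }
    bool IsDirty() const { return dirty_; }
    std::size_t StartSlot() const { return first_; }
    std::size_t Count() const { return dirty_ ? last_ - first_ + 1 : 0; }
    const T* DirtyData() const { return &states_[first_]; }

    // Call after the API call was issued: mirror the dirty range into the
    // sister (the current state) and clear the flag.
    void Commit() {
        if (!dirty_) return;
        if (sister_) {
            std::copy(states_.begin() + first_, states_.begin() + last_ + 1,
                      sister_->states_.begin() + first_);
        }
        dirty_ = false;
    }

private:
    // Without a sister there is nothing to compare against, so every
    // request is treated as a change.
    bool Differs(std::size_t slot) const {
        return sister_ == nullptr || states_[slot] != sister_->states_[slot];
    }

    std::array<T, N> states_;
    TStateArrayMonitor* sister_ = nullptr;
    std::size_t first_ = 0;
    std::size_t last_ = 0;
    bool dirty_ = false;
};
```

When flushing, a pipeline stage would check `IsDirty()`, issue a single array style 'Set' call using `StartSlot()`, `Count()` and `DirtyData()`, and then call `Commit()`. Because `T` is a template parameter, the same code works whether the slots hold raw API pointers or integer ids.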

In addition, since they are template classes, you can always use them regardless of what representation of individual states is used. If your engine works directly with raw pointers of API objects, that is fine. If it works with integer references to objects, that's fine too. The templates make the design more nimble and able to adapt to future changes. I have to say, I am really happy with how the whole thing turned out...

So if you have made it this far, I would be interested to hear if you use something similar, or have any comments on the design. Thanks for reading!