Multithreaded Rendering

Posted by Jason Z, 07 July 2010 · 719 views


Multithreaded Rendering with D3D11


Lately I have been splitting my time between writing some new material for a forthcoming project and trying to understand the design changes needed to allow multithreaded rendering in D3D11. Since the latter is the more complicated one, I thought I would flesh out a few ideas here and talk a little bit about it.

In Hieroglyph, the renderer has always been a singleton-like class. It was not a strictly enforced singleton, but I did always create a static accessor method that would let anyone who had the renderer's header included get a reference to it. This always made rapid development easier, but over the years I found myself using the static method less and less - it just didn't seem like such a good idea anymore.

Now I know why - when you want to make changes to your designs, it is good to have the flexibility to try out different things - and singletons lock you in to a particular way of doing things. They aren't necessarily bad, but they certainly aren't good either... Especially when you want to do something crazy like let your engine think it has more than one renderer [grin].

Multithreading

My motivation for adding multithreading to the engine is to take advantage of the deferred rendering context (not to be mistaken for deferred rendering in general!). It allows a bunch of rendering calls to be made on a 'deferred' context, which records them into a command list to be replayed later on the immediate context. In theory, this should keep the CPU from holding up the GPU, provided your command lists are long enough and free of stalls.
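To make the record-then-replay idea concrete, here is a minimal sketch in portable C++. It is only a mock: `ImmediateContext`, `DeferredContext`, `CommandList`, `Execute` and `RecordAndReplay` are hypothetical stand-ins, not D3D11 types. In real D3D11 the corresponding steps are `ID3D11Device::CreateDeferredContext`, `ID3D11DeviceContext::FinishCommandList` and `ID3D11DeviceContext::ExecuteCommandList`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the immediate context: it just logs what it runs.
struct ImmediateContext {
    std::vector<std::string> log;
    void Draw(const std::string& what) { log.push_back("draw " + what); }
};

// A recorded batch of calls, playing the role of an ID3D11CommandList.
using CommandList = std::vector<std::function<void(ImmediateContext&)>>;

// Hypothetical stand-in for a deferred context: calls are recorded, not run.
class DeferredContext {
public:
    void Draw(std::string what) {
        pending_.push_back(
            [w = std::move(what)](ImmediateContext& ctx) { ctx.Draw(w); });
    }
    // Analogous to FinishCommandList: hand back the recording and start fresh.
    CommandList Finish() { return std::exchange(pending_, {}); }

private:
    CommandList pending_;
};

// Analogous to ExecuteCommandList on the immediate context.
void Execute(ImmediateContext& ctx, const CommandList& list) {
    for (const auto& cmd : list) cmd(ctx);
}

// Records two draws on the deferred context, then replays them.
std::vector<std::string> RecordAndReplay() {
    DeferredContext deferred;
    deferred.Draw("shadow casters");
    deferred.Draw("scene");
    CommandList list = deferred.Finish();

    ImmediateContext immediate;
    assert(immediate.log.empty());  // recording alone has executed nothing
    Execute(immediate, list);
    return immediate.log;
}
```

The point of the sketch is that the recording side never touches the immediate context, so in principle it can run on another thread while the immediate context is busy elsewhere.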

Hieroglyph also utilizes a design concept of RenderViews to encapsulate a complete scene rendering pass. Want to generate a shadow map? Use a render view. Want to generate an ambient occlusion buffer? Use a render view. As it turns out, this paradigm provides a convenient 'batch' of work to supply to a deferred context.

The tricky part is making the renderer (which by now is certainly a non-trivial class) mostly cloneable, with its context replaced by a deferred one. This renderer instance can then be passed into the 'render' function of a render view, and the view is none the wiser about whether it is rendering immediately or generating a command list. Sounds pretty good so far...
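A rough sketch of that arrangement, assuming a small renderer interface: every name here (`IRenderer`, `ImmediateRenderer`, `DeferredRenderer`, `ShadowMapView`) is made up for illustration, and Hieroglyph's actual classes are not shown in this post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface that a render view draws through.
class IRenderer {
public:
    virtual ~IRenderer() = default;
    virtual void Draw(const std::string& object) = 0;
};

// Submits work as soon as it is requested.
class ImmediateRenderer : public IRenderer {
public:
    void Draw(const std::string& object) override { submitted.push_back(object); }
    std::vector<std::string> submitted;
};

// Records work so it can be replayed on another renderer later.
class DeferredRenderer : public IRenderer {
public:
    void Draw(const std::string& object) override { recorded.push_back(object); }

    void ReplayOn(IRenderer& target) {
        assert(&target != this);  // replaying onto ourselves would never finish cleanly
        for (const auto& object : recorded) target.Draw(object);
        recorded.clear();
    }
    std::vector<std::string> recorded;
};

// A render view only sees the interface, so it cannot tell which kind it got.
class ShadowMapView {
public:
    void Render(IRenderer& renderer) const {
        renderer.Draw("occluder A");
        renderer.Draw("occluder B");
    }
};
```

Passing a `DeferredRenderer` to `ShadowMapView::Render` submits nothing until `ReplayOn` is called with the immediate renderer, and the view code is identical in both cases.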

Of course, the 'real' renderer would be in charge of managing these doppelganger renderers (I like the word doppelganger, so I'll use it for these mini-me renderers [grin]). In fact, I'm considering making a renderer interface and allowing the renderer to replace parts of itself in a new implementation. The tough part is that all of the resource and state references stored in the renderer must be accessible to all of the doppelgangers, while the parameter system, which connects those resources and states to textual names in the shaders, must be unique to each doppelganger, because it stores state that can't be shared between multiple render views simultaneously. Plus, putting locks around the parameter system is a non-starter, since it is one of the most heavily used parts of the renderer...

The solution? I'm going to create inner classes to hold the parameter systems and their methods, and allow them to be dynamically recreated and/or cloned. This should provide enough flexibility to use the interface technique I mentioned above, and to create a subclass of the renderer with only the functionality that I want to be used for deferred contexts... I still have a ways to go until I am ready for some trials, but I think this overall modification should be worthwhile.
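One way the split between shared resources and per-doppelganger parameters might look is sketched below. This is only an assumption about the shape of the design, not the inner-class layout itself: `ResourceStore`, `ParameterSystem`, `Renderer::Clone` and the integer resource ids are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Resources are created once and only read while rendering (hypothetical layout).
struct ResourceStore {
    std::map<std::string, int> textures;  // name -> resource id
};

// Mutable bindings from shader parameter names to resource ids.
class ParameterSystem {
public:
    void Set(const std::string& name, int resource) { bindings_[name] = resource; }

    // Returns -1 when the parameter has never been set.
    int Get(const std::string& name) const {
        auto it = bindings_.find(name);
        return it == bindings_.end() ? -1 : it->second;
    }

private:
    std::map<std::string, int> bindings_;
};

class Renderer {
public:
    explicit Renderer(std::shared_ptr<const ResourceStore> resources)
        : resources_(std::move(resources)) {
        assert(resources_ != nullptr);
    }

    // Doppelganger: shares the resource store, gets its own copy of the parameters.
    Renderer Clone() const { return *this; }

    ParameterSystem& Parameters() { return params_; }
    const ResourceStore& Resources() const { return *resources_; }

private:
    std::shared_ptr<const ResourceStore> resources_;  // shared, read-only
    ParameterSystem params_;                          // private to this renderer
};
```

Because each clone owns its parameter state outright, two render views can set the same parameter name to different resources at the same time without any locking, while the store behind `Resources()` stays a single shared object.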

Soon I will have a myriad of renderers running around in my engine [cool]!




Don't be surprised if multithreaded rendering doesn't result in as much of a gain as you're expecting or, even worse, lowers performance altogether, especially considering that on a multi-core machine, command lists are already dispatched for execution on multiple threads in the driver. Based on what I've seen, unless there's a significant amount of parallelism in your rendering pipeline, the cost substantially outweighs any benefits. The MultithreadedRendering11 sample that comes with the SDK, for instance, which is supposed to make good use of the API, runs two to three times slower on my quad core machine when choosing the MT render path as opposed to ST.

Anyway, keep us informed. I'd love to see how your implementation is going to end up.

Thanks for the heads up - although this is more of an experiment than an expected performance booster. I'm fairly certain that it will take a somewhat contrived scene to make any difference at all... (i.e. a bazillion objects in my scene graph...).

Also, I wouldn't make the assumption that if the sample isn't able to do it... you know what I mean... Let's see if I can do better. ;)

The performance of deferred contexts has always had me slightly confused, mainly due to the SDK sample and the fact that I haven't actually worked with them yet. That said, I've read some developers saying that they are great, though they didn't elaborate. Anyway Jason, I'm looking forward to reading about your findings, and it would be very interesting to maybe see some benchmarks. I'm guessing there are some criteria you have to meet before you start to see a speed up (a high number of draw calls? something else?)

That's my basic assumption, that you need to have something that would normally eat up a bunch of CPU time to make all of the draw calls. Still, with D3D11 it is hard to know which calls will really take lots of time and which are trivial. Also, with multiple cores running and accessing the scene at the same time, I would have to think that the CPU side would have a terrible cache coherency access pattern... That might help explain the issues mentioned above about the quad core not performing very well.

Even so, I want to build the tools and then try to find out where it can be useful.

Good point, I guess the only way to really find out is to actually do it. The idea I have been toying with for a bit is breaking scene rendering up into jobs (I already have a job system for other tasks, I just haven't touched my renderer yet), kicking those jobs off to threads, and letting them use a pool of deferred contexts to do things like render out the gbuffer, update shadow map a, update shadow map b, run post process filters, etc.

Curious to see how it performs. Probably no benefit at all for things like post processing, but it does help at least with breaking the code up nicely.
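The job idea above could be sketched roughly like this. It is a simplified assumption rather than a real job system: it spawns one `std::thread` per pass instead of drawing from a pool, a command list is just a vector of strings, and `RecordPass` and `RenderFrame` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Stand-in for a recorded command list (hypothetical; plain strings here).
using CommandList = std::vector<std::string>;

// One job: record every draw call of a single pass into its own list.
CommandList RecordPass(const std::string& pass, int drawCalls) {
    CommandList list;
    for (int i = 0; i < drawCalls; ++i)
        list.push_back(pass + " draw " + std::to_string(i));
    return list;
}

// Records all passes in parallel, then replays them in submission order.
std::vector<std::string> RenderFrame(const std::vector<std::string>& passes,
                                     int drawsPerPass) {
    assert(drawsPerPass >= 0);
    // One output slot per job, so the workers never write to the same element.
    std::vector<CommandList> lists(passes.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < passes.size(); ++i)
        workers.emplace_back(
            [&, i] { lists[i] = RecordPass(passes[i], drawsPerPass); });
    for (auto& worker : workers) worker.join();

    // Only this thread "executes", mirroring the single immediate context.
    std::vector<std::string> submitted;
    for (const auto& list : lists)
        submitted.insert(submitted.end(), list.begin(), list.end());
    return submitted;
}
```

Recording order across threads is arbitrary, but the replay order is fixed by the order of `passes`, which is what keeps dependent passes (a shadow map before the pass that samples it, say) correct.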
