DX11 multithreading - why bother?

Started by bvanevery. 36 comments, last by bvanevery 14 years ago
DX11 allows a graphics pipeline to be multithreaded. But why do I want to use my available CPU parallelism on feeding the graphics pipeline? Games have other jobs to do, like physics simulation and AI. Coarse parallelism would seem to do fine here, and the app would be easier to write, debug, and port to other platforms.

Maybe you say you want to use the GPU for physics simulation and AI, and so you need a tighter coupling between producer threads and the consuming graphics pipeline. Fine, but then you've locked yourself into DX11 HW. My 2.5 year old laptop is DX10 class HW, for instance. Also, your physics and AI code would be API-specific. Not only does this limit you to Microsoft platforms, but GPUs do not have the nicest set of programming languages and tools available. We put up with GPUs when we want things to be fast; they're pretty much a detriment to programmer productivity. What am I missing here? Does anyone have a compelling rationale for bothering with more tightly coupled multithreading?

Cynically, this seems like a way for Microsoft / NVIDIA / ATI to push perceived bells and whistles and sell HW "upgrades". Maybe they really can show a pure graphics benefit on high end HW with a lot of CPU cores. But most consumers don't have high end HW, and there's more to games than pure graphics. DX11 is way ahead of the installed base. Last I checked, consumers are only just now getting around to Vista / Windows 7 and DX10 class HW, and that took ~3 years. Do you want to waste all your time chasing around the top tier of game players? Some games have lost a lot of money doing that, like Crysis.

Also, the performance results I've seen on my midrange consumer HW are not compelling:

MultiThreadedRendering11 demo, D3D11
Vsync off (640x480), R8G8B8A8_UNORM_SRGB (MS1, Q0)
NVIDIA 8600M GT laptop, 256MB dedicated memory, driver 195.62 (this is DX10 class HW)
windowed, with mouse focus in window

~22 fps  Immediate
~20 fps  Single Threaded, Deferred per Scene
~21 fps  Multi Threaded, Deferred per Scene
~20 fps  Single Threaded, Deferred per Chunk
~18 fps  Multi Threaded, Deferred per Chunk

Methodology: I manually observed the demo window. I picked fps values that seem to occur most frequently. I went through all the settings twice, just in case some system process happened to slow something down. These values seem reasonably stable. I didn't worry much about fractions. I wouldn't regard a difference of ~1 fps as significant, as it's probably a 0.5 fps difference. ~2 fps is observable, however.

To the extent that multithreading matters at all, it seems to slow things down slightly. This demo does not make a compelling case for bothering with DX11 multithreading on midrange consumer HW. Does anyone have some code that demonstrates an actual benefit?
gamedesign-l pre-moderated mailing list. Preventing flames since 2000! All opinions welcome.
DX11 multithreading needs to be supported by the hardware, otherwise it's just a software fallback and it's slower that way than immediate mode, obviously. AFAIK no pre-DX11 card supports it.

http://msdn.microsoft.com/en-us/library/ff476893%28VS.85%29.aspx
Every time you implement a singleton, God kills a kitten. Please, think of the kittens!
The other point is: FPS is an old metric. It's fine for single-threaded games, but for multi-threading it only shows you how many frames the graphics device can render per second. In the background, the game can run at whatever speed it wants and perform much more complex work. The more complex a scene becomes, the more important multi-threading will be.
Multithreading the graphics pipeline is nothing new to DirectX. DX9 had some parallel ability that many game companies made use of.

Why go to all of the dev and test effort to make a parallel API if no one wants it? Well, people do want it; game companies want it. DX10 had no multithreading abilities, and many, many requests came in asking for it. So let's look at some of the reasons why.

Object creation is slow. It can stall your rendering thread any time your app discovers that it needs to create a new object. These calls are slow enough that MS, ATI, and NVIDIA all wrote white papers telling developers to avoid creating and destroying resources during the application runtime. The API supports multithreaded creates so that you can defer to the driver to pick the best times to create objects -- for instance when it has a few spare cycles -- allowing your rendering thread to continue its real work: getting stuff drawn to the screen.
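
For illustration, here is a minimal sketch of that pattern, assuming D3D11's free-threaded device. The PendingTexture struct and hand-off container are invented for this example, and synchronization with the render thread is omitted:

```cpp
#include <d3d11.h>
#include <vector>

// Hypothetical upload record, used only for this sketch.
struct PendingTexture
{
    D3D11_TEXTURE2D_DESC desc;
    D3D11_SUBRESOURCE_DATA initialData;
};

// Runs on a streaming/loader thread. ID3D11Device creation calls are
// free-threaded, so this can overlap with the render thread's work on
// the immediate context.
void CreatePendingTextures(ID3D11Device* device,
                           std::vector<PendingTexture>& queue,
                           std::vector<ID3D11Texture2D*>& ready)
{
    for (PendingTexture& p : queue)
    {
        ID3D11Texture2D* tex = nullptr;
        if (SUCCEEDED(device->CreateTexture2D(&p.desc, &p.initialData, &tex)))
            ready.push_back(tex); // hand off to the render thread (locking omitted)
    }
}
```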

Next, DX11 supports deferred contexts. These allow multiple threads to build command lists at the same time, and let the DX runtime perform validation on separate threads in advance. DX10 was an API redesign where one of the many goals was to reduce the CPU overhead of API calls. CPU overhead was a huge problem for DX9 -- many game companies were limited in what they could get on screen just because the API ate too much CPU. DX10 reduced that cost significantly, in some places by a factor of 10-100. However, there are some calls that were difficult to trim down because the validation was necessary, or perhaps the driver had a lot of work to do. Being able to build command lists on separate CPU threads allows some of that work to take place in parallel and in advance of actually trying to draw the data. Several game studios are already taking advantage of deferred contexts and are seeing improvements in performance, even when using the CPU fallback for lack of driver support.
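
To make the pattern concrete, here is a rough sketch (not any studio's actual code) of recording on deferred contexts and replaying on the immediate context; the draw calls are stubbed out and error handling is trimmed:

```cpp
#include <d3d11.h>

// Runs on a worker thread: record commands into a deferred context and
// return the finished command list.
void RecordWorker(ID3D11Device* device, ID3D11CommandList** outList)
{
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // Issue state changes and draws exactly as you would on the immediate context:
    // deferred->IASetVertexBuffers(...);
    // deferred->VSSetShader(...);
    // deferred->DrawIndexed(...);

    // FALSE: don't bother restoring the deferred context's prior state.
    deferred->FinishCommandList(FALSE, outList);
    deferred->Release();
}

// Runs on the render thread: replay every recorded list in submission order.
void SubmitFrame(ID3D11DeviceContext* immediate, ID3D11CommandList** lists, int count)
{
    for (int i = 0; i < count; ++i)
    {
        immediate->ExecuteCommandList(lists[i], FALSE);
        lists[i]->Release();
    }
}
```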

So, DX9 would allow roughly 2000 API calls per frame before the API would become a bottleneck; DX10 is around 12000, and DX11 should be even higher when using deferred contexts. These are call limits based on using the whole API to do actual work, not just calling some API like SetPrimitiveTopology() x number of times. The trouble is that studios are trying to put more and more stuff on the screen and would surely take advantage of anything that could be provided performance-wise.

Plus, your engine has to do a lot of CPU work anyway to make draw calls. It has to build matrix transformations, sort objects and draw calls, and make all sorts of decisions on what to draw and how to draw it. All of this could be done in parallel with big wins -- provided that your app actually has enough work to do that these things become bottlenecks.
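
As a trivial illustration of that kind of per-object CPU work split across threads (the names and data layout are invented for the example; nothing here touches the D3D API yet):

```cpp
#include <algorithm>
#include <thread>
#include <vector>

struct Vec3 { float x, y, z; };
struct Object { Vec3 position; float sortKey; };

// Each worker computes sort keys (here, squared distance to the camera)
// for its own slice of the object list.
void BuildRange(std::vector<Object>& objs, size_t begin, size_t end, Vec3 cam)
{
    for (size_t i = begin; i < end; ++i)
    {
        float dx = objs[i].position.x - cam.x;
        float dy = objs[i].position.y - cam.y;
        float dz = objs[i].position.z - cam.z;
        objs[i].sortKey = dx * dx + dy * dy + dz * dz;
    }
}

void BuildFrameData(std::vector<Object>& objs, Vec3 cam)
{
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    size_t chunk = (objs.size() + n - 1) / n;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n; ++t)
    {
        size_t begin = t * chunk;
        size_t end = std::min(objs.size(), begin + chunk);
        if (begin < end)
            workers.emplace_back(BuildRange, std::ref(objs), begin, end, cam);
    }
    for (auto& w : workers) w.join();
}
```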

A consumer won't need high end hardware to take advantage of multithreading. It's all about preventing the GPU from being starved of data to crunch.

I don't think there are many drivers out yet that fully support the multithreading APIs. This feature requires a lot of effort to get right and is a huge test burden -- but they will come out eventually.

The DX10.1 feature level supports hardware multithreading. This means that there is a reasonably sized slice of hardware out there already that can support this stuff once drivers arrive.
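
If you want to know where a given machine stands, the runtime can tell you whether the driver accelerates these features or whether they will be emulated. A minimal sketch, with error handling trimmed:

```cpp
#include <d3d11.h>
#include <cstdio>

// Queries whether the installed driver natively supports concurrent
// resource creation and command lists, or whether the D3D11 runtime
// will emulate them in software.
void ReportThreadingSupport(ID3D11Device* device)
{
    D3D11_FEATURE_DATA_THREADING caps = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D11_FEATURE_THREADING,
                                              &caps, sizeof(caps))))
    {
        std::printf("Concurrent creates: %s\n",
                    caps.DriverConcurrentCreates ? "driver" : "emulated");
        std::printf("Command lists:      %s\n",
                    caps.DriverCommandLists ? "driver" : "emulated");
    }
}
```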

AAA games take 2-4 years to develop -- about the time span you pointed out it takes to adopt a new technology. Interesting how that works.

Vendor lock-in is not an insurmountable problem for developers. The reality is that there are lots of game engines that wrap the graphics system in a layer so that they can run on Xbox, or PC, or PlayStation. These problems have been solved over and over again and are just part of reality. These same game engines have multithreaded deferred contexts built in because it makes a difference. DX11 gives them a way to map their engine API more closely to the hardware, which results in a bigger win. There's no reason why an API should really lock you to any vendor if you layer your software. You want to support someone else? Then target them too.

GPU tools and languages have been getting better and better over the years. Sure, they're not as good as native tooling, but they're getting there. With the spreading use of DX, compute, CUDA, OpenCL, etc., more and more people invest in GPU technologies, which means the whole infrastructure continues to improve. Lack of perfect tools shouldn't stop you from leveraging the amazing power of the GPU -- I admit there are areas of debugging that are still frustrating, but they will get better. People with a lot of practice writing shaders can actually get a lot done. It's not Python, but it's also not asm.

A new API or hardware rev will always be ahead of the install base at launch time. This is not new.

Not all of the available APIs are needed by every developer. Multithreading probably falls into one of those categories of optimization -- why do it if you don't have a problem? Granted, multithreading normally requires a lot more forethought in code design, but I guarantee that if you're not seeing a win, it's because you're not running a scenario that it was designed to fix -- which is CPU and DX API bottlenecking.
Quote:Original post by bvanevery
why do I want to use my available CPU parallelism on...
Why *don't* you want to use available parallelism on *everything*?
The game I'm writing at the moment is based on a SPMD (single program multiple data) type architecture, where essentially the same code is executed on every thread, with each thread processing a different range of the data. Every thread does physics together, then they all do AI together, then they all do rendering together, etc...
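
For anyone unfamiliar with the pattern, here is a minimal sketch of that phase-lockstep structure (my own illustration, not the actual game's code; uses C++20 std::barrier, with the phase bodies stubbed out and only a single frame shown):

```cpp
#include <barrier>
#include <thread>
#include <vector>

constexpr int kThreads = 4;
std::barrier sync(kThreads);

// Each phase processes only this thread's slice of the data.
void UpdatePhysics(int thread, int threadCount)       { /* ... */ }
void UpdateAI(int thread, int threadCount)            { /* ... */ }
void BuildRenderCommands(int thread, int threadCount) { /* ... */ }

void FrameWorker(int thread)
{
    UpdatePhysics(thread, kThreads);
    sync.arrive_and_wait();   // everyone finishes physics before AI starts
    UpdateAI(thread, kThreads);
    sync.arrive_and_wait();   // everyone finishes AI before rendering starts
    BuildRenderCommands(thread, kThreads);
    sync.arrive_and_wait();
}

int main()
{
    std::vector<std::thread> pool;
    for (int t = 0; t < kThreads; ++t)
        pool.emplace_back(FrameWorker, t);
    for (auto& t : pool)
        t.join();
}
```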
Well, there is a point in that you don't have to use or adopt multi-threading if the situation doesn't require it. Just be flexible and pick the best-suited tools/options/solutions for your project.
Quote:Original post by darkelf2k5
DX11 multithreading needs to be supported by the hardware, otherwise it's just a software fallback and it's slower that way than immediate mode, obviously. AFAIK no pre-DX11 card supports it.

http://msdn.microsoft.com/en-us/library/ff476893%28VS.85%29.aspx


That's not a HW support issue, that's a driver support issue. Theoretically, a DX11 multithreading application architecture should benefit a DX10 class card, if the drivers have been updated. In practice, I don't know if IHVs have updated their drivers, or will update them. It's quite possible that they'll be cheap bastards and expect people to just buy DX11 HW. If that happens in practice, then DX11 multithreading will have no benefit whatsoever on older HW.

I suppose I'll have to check my own driver. NVIDIA's support of older laptop HW has been notoriously poor. They dumped the problem in OEMs' laps for some silly reason. For quite some time, their stock drivers refused to install on laptops; you had to get your driver from the OEM. Of course, the OEMs don't care about updating their drivers very often, so you end up with really old drivers that don't have current features and fixes. Only recently did NVIDIA start to offer a stock driver that will work on laptops. There is still a disconnect as far as their most current drivers go; for instance, the recently released OpenGL 3.3 driver will not install by default on my laptop. I have been getting around these problems using laptopvideo2go.com, a website that adds .inf files to enable the drivers on laptops. This doesn't help the general deployment situation, however.
gamedesign-l pre-moderated mailing list. Preventing flames since 2000! All opinions welcome.
Quote:Original post by Pyrogame
The other point is: FPS is an old metric. It's fine for single-threaded games, but for multi-threading it only shows you how many frames the graphics device can render per second.


There is no readout for "CPU load" in the MultiThreadedRendering11 demo. This is unfortunate as it would be useful diagnostic information. That's part of why I asked if anyone had code that demonstrates an actual benefit.

Quote:In the background, the game can run at whatever speed it wants and perform much more complex work. The more complex a scene becomes, the more important multi-threading will be.


I think you may have missed the point. You don't need DX11 multithreading to do multithreading in your app. You can have an AI thread, a physics thread, or whatever. Your multithreading architecture will be simpler to write and debug, and it will not be tied to DX11.
gamedesign-l pre-moderated mailing list. Preventing flames since 2000! All opinions welcome.
Quote:Original post by Hodgman
Quote:Original post by bvanevery
why do I want to use my available CPU parallelism on...
Why *don't* you want to use available parallelism on *everything*?


Because the debugging will drive you nuts.

Because it can easily become premature optimization.
gamedesign-l pre-moderated mailing list. Preventing flames since 2000! All opinions welcome.
A current high end desktop CPU has 8 hardware threads, and that number is only going to rise in the future. What possible reason could MS have for not improving multithreaded support? Coarse parallelism in games is okay up to 4 threads, maybe 6. Moving past that will require us to move beyond the rather naive approach of one graphics thread.
Quote:Original post by bvanevery
Quote:Original post by Hodgman
Quote:Original post by bvanevery
why do I want to use my available CPU parallelism on...
Why *don't* you want to use available parallelism on *everything*?


Because the debugging will drive you nuts.
Jeez, it's not like these are problems never tackled before. People in other segments of software have been dealing with these issues for ages.
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.

