DX11 Batching and minimizing DIPs and state changes


It has come to my attention that although general rendering performance guidelines suggest minimizing draw call counts and render state changes, most big AAA titles do only very minimal optimization in this area, or none at all. For example, I ran PIX on a few popular games to see what's going on:

Mass Effect - ~1000 draw calls per frame, 20,000(!) render state changes, and ~1000 DrawPrimitiveUP calls per frame
Civilization V (DX11) - 3000+ draw calls, a thousand or so pixel/vertex shader sets, 500+ texture sets, etc. per frame on average

What's up with this nonsense? Why does no one even bother to optimize this?

Maybe they already optimized what they could...
Maybe they decided that performance was good enough and further optimizing was not worth the effort...
Maybe they were quickly approaching a deadline...

You can only minimise so much before the work involved in doing so becomes overly onerous and you get into diminishing returns. Maybe these titles genuinely do have ~1000 unique object states per frame, and therefore can't go any lower? 1000 draw calls is really quite low for a large complex scene, so I certainly wouldn't call it "very minimal or no optimizations". "No optimizations" in a 1,000,000 triangle scene would be 1,000,000 draw calls, after all.

[quote name='mhagain' timestamp='1298305365' post='4777071']
You can only minimise so much before the work involved in doing so becomes overly onerous and you get into diminishing returns. Maybe these titles genuinely do have ~1000 unique object states per frame, and therefore can't go any lower? 1000 draw calls is really quite low for a large complex scene, so I certainly wouldn't call it "very minimal or no optimizations". "No optimizations" in a 1,000,000 triangle scene would be 1,000,000 draw calls, after all.
[/quote]

Not true.

Look at this (ignore the gray stuff): [url="http://img202.imageshack.us/img202/2112/greybar.jpg"]http://img202.images...112/greybar.jpg[/url]

Each yield icon (the green/gold coins on the map) gets its own draw call(!), and each hex grid overlay (the white border around EACH hex) gets a draw call, despite the fact that you could do that with one draw call even without using instancing - and with simpler code! With all this, I think that screenshot is producing well over 10k DIPs, and it's NOT fast.

I'm not a pro graphics engineer, but with ten minutes of thought you could render all the yield icons in two draw calls and one texture set (they are using DX11, FFS!), and even more obviously, you could overlay the hex grid with one draw call, or none at all, if you just sampled it when drawing the terrain...

My thought is that they just don't give a damn... Same thing for my beloved Blizzard: in WoW, when drawing terrain chunks, instead of doing a simple test like "if (stage0Texture != textureToSet)", they just don't bother, ignoring the fact that WoW is heavily CPU-bound and targeted at a wide range of hardware, low-to-mid end included (it was released in 2004!). And I tested it - I built the same thing, and this simple optimization gave an 11% performance gain for about 20 lines of extremely simple code.


So my thought is: if you are an indie developer making your game, don't even try to optimize this stuff. No one else is doing it and they're getting by fine, even when targeting low-to-mid end hardware. This kind of optimization has proven to be of little value, so don't listen to the 101 texts saying batch batch batch, omg draw calls, etc. Just write your game flexibly, and start worrying only if you exceed about 3k draw calls in an average scene. ;P

I don't think there are noobs working for the large companies. I'm pretty sure they know they could have done better, but when deadlines are approaching and the publisher keeps the pressure up and there are other tasks with higher priority AND you know that your game will sell shitloads either way...

I hope your advice to indies to not give a f*** was just sarcastic. :)

[quote name='Semei' timestamp='1298476839' post='4777994']
Each yield icon (the green/gold coins on the map) gets its own draw call(!), and each hex grid overlay (the white border around EACH hex) gets a draw call, despite the fact that you could do that with one draw call even without using instancing - and with simpler code! With all this, I think that screenshot is producing well over 10k DIPs, and it's NOT fast.
My thought is that they just don't give a damn... Same thing for my beloved Blizzard: in WoW, when drawing terrain chunks, instead of doing a simple test like "if (stage0Texture != textureToSet)", they just don't bother, ignoring the fact that WoW is heavily CPU-bound and targeted at a wide range of hardware, low-to-mid end included (it was released in 2004!). And I tested it - I built the same thing, and this simple optimization gave an 11% performance gain for about 20 lines of extremely simple code.[/quote]The simple explanation is that all those coins on the map were implemented by a gameplay programmer, not a graphics programmer. Furthermore, graphics programming was most likely outsourced to an engine developer like Emergent.
So -- designer asks gameplay to put coins on the map, gameplay uses the engine to do so. Performance is "ok", so no one delves deeper into the flaws of the engine. The engine's graphics programmer works in a different building and isn't directly exposed to the horrible abuse his modules are receiving from negligent gameplay code, so he doesn't improve his interfaces.

It's very easy to imagine how perfectly optimised code [i]doesn't[/i] get produced in the real world, where time and money are quite limited.[quote name='Semei' timestamp='1298476839' post='4777994']
So my thought is: if you are an indie developer making your game, don't even try to optimize this stuff. No one else is doing it and they're getting by fine, even when targeting low-to-mid end hardware. This kind of optimization has proven to be of little value, so don't listen to the 101 texts saying batch batch batch, omg draw calls, etc. Just write your game flexibly, and start worrying only if you exceed about 3k draw calls in an average scene. ;P[/quote]Yeah.... nah.
I'm working on a console game atm where half the CPU time is consumed by rendering tasks. I would love to perform the optimisations you're on about (we could probably reduce draw calls by 10x), [b]but frankly the schedule doesn't allow time for it[/b]. On the sequel I'll definitely be doing some of this optimising so that we can do more stuff with our CPU, other than waste it on GPU command buffer generation.

Thanks Hodgman!

This is indeed an issue even when you don't have a strict time limit but are planning to release an indie game in a timely fashion - and maybe it's even worse, because you get the impression that your schedule is OK while you are wasting tons of time on the least important details and not focusing on the right parts of the code. This, and some other aspects of indie game development, are the main reasons why so many games get abandoned unfinished.

I have seen articles about general guidelines for finishing your hobby game in a timely fashion (or finishing it at all), but they seem to lack any connection between real code and coding priorities. I have not seen a text saying: you are probably better off skipping DIP/state change optimizations, as they are a huge time sink and will delay your project without giving much in return. Instead, there is a lot of information about how DIPs are the root of all evil and such :D Also, the primary focus for most people when it comes to code is FAST! - and not fast in the sense of development time, but in terms of milliseconds.

Someone really should put together some guidelines on this that are down to earth and based on practical, real-world examples and experience: basically, how to write code that is "fast" (development time) and "fast" (execution time), and what a good balance looks like in REAL situations. One way of doing this would be a performance impact index, based on some reference CPU and GPU, saying: "System to minimize DIPs/state changes - general guide: Avoid", "Easy-to-use art-game pipeline - general guide: High Focus", etc.

[quote name='Semei' timestamp='1298638363' post='4778894']
(I) have not seen a text saying: you are probably better off skipping DIP/state change optimizations, as they are a huge time sink and will delay your project without giving much in return.
[/quote]

That's probably because it isn't really true. If you are working on a game as an indie developer and you factor in a system from the start to handle this, it isn't a particularly massive time sink at all. My scene manager took about an hour to code up right at the start of my project and I've barely touched it in the last two years.

As Hodgman says, when working in a large team and when using a third party library for rendering, the task does then become far more complex. Maybe it was cheaper to buy a binaries-only licence for the rendering engine so such optimisations aren't possible? Maybe several developers have argued passionately at meetings to be allowed to perform this optimising but have been vetoed by project managers more concerned about features in time for deadlines? Maybe the business model is based on X percent sales to above and beyond a certain target hardware, making the optimisation useless [i]in this specific business context[/i]?

We simply don't know, but to assume that the developers either 1) didn't know about this or 2) didn't care is unfounded based on the evidence we have.

[quote name='Semei' timestamp='1298638363' post='4778894']
Also, the primary focus for most people when it comes to code is FAST! - and not fast in the sense of development time, but in terms of milliseconds.
[/quote]

Disagree. Primary focus is 1) does it work acceptably to return investment and 2) will it be finished in time. Optimising for speed of execution is only necessary, and indeed worthwhile, if either of the two above factors are compromised.

[quote name='Aardvajk' timestamp='1298640981' post='4778903']
Disagree. Primary focus is 1) does it work acceptably to return investment and 2) will it be finished in time. Optimizing for speed of execution is only necessary, and indeed worthwhile, if either of the two above factors are compromised.
[/quote]

Well, not in indie game development - I don't think greed is the primary concern there ;D It seems like project managers and publishers are the real root of evil, and yeah, why not - their concern is money, not the game. That's why so many games nowadays have crap code and fail at gameplay...

[quote name='Aardvajk' timestamp='1298640981' post='4778903']
If you are working on a game as an indie developer and you factor in a system from the start to handle this, it isn't a particularly massive time sink at all. My scene manager took about an hour to code up right at the start of my project and I've barely touched it in the last two years.
[/quote]

This is bull. If that were true for real game projects, then the developers would have done it without even asking - hey, it's one hour - but as you can see, they DIDN'T.
Civ5, as far as I know, was not outsourced for graphics, and I think you missed the hex grid drawing system I mentioned - its implementation is both time-consuming and hardware-dependent, so it's either pure stupidity or a code management issue, where one developer is given a certain task, is used to making it as generic as possible, and has no clue how the game works as a whole. It's worth mentioning that the Civ5 developers were bragging about the cool DX11 features they used, like multithreading to improve performance, so either performance WAS a concern, or they pretended it was to sound more techy and lure in gamers who are fanatical about high-end graphics. I vote for the second.

[quote name='Semei' timestamp='1298644496' post='4778921']
It seems like project managers and publishers are the real root of evil, and yeah, why not - their concern is money, not the game. That's why so many games nowadays have crap code and fail at gameplay...
[/quote]

Project managers and publishers are the reason that game developers can earn a living, feed their families and keep a roof over their heads while creating the successful games you are criticising.

[quote name='Semei' timestamp='1298644496' post='4778921']
This is bull. If that were true for real game projects, then the developers would have done it without even asking - hey, it's one hour - but as you can see, they DIDN'T.
[/quote]

You are missing my point - it was trivial for me because I am a solo developer working without a third-party graphics engine and without deadlines. In a larger team with competing priorities, decisions about where to invest time are not based on performance but on shipping a finished product that works [i]well enough[/i].

I don't really understand where you are coming from - all the example games you criticise are commercially successful. They therefore work sufficiently well within their own parameters, so any further optimisation would be worthless.

[quote name='Aardvajk' timestamp='1298646903' post='4778937']
I don't really understand where you are coming from - all the example games you criticise are commercially successful. They therefore work sufficiently well within their own parameters, so any further optimisation would be worthless.
[/quote]

That is the whole point of this thread: the misdirected guidance about minimizing state changes and DIPs is everywhere, and lots of developers read it and work their asses off to follow it, while the real situation suggests leaving it alone as a general rule. [b]No:[/b] "[i]Batch, batch, batch[/i]". [b]Yes:[/b] "Whatever." - that's the real rule of thumb. It's not [i]good[/i], but it [i]works[/i].

I'm primarily concerned here with people who are learning and making their own way into creating games, and I think this is a major aspect that is taught very wrongly, far from reality. So if you are an indie/student/hobbyist, this might be of use to you, as weird as the suggestion might sound.

[quote name='Semei' timestamp='1298648325' post='4778949']
That is the whole point of this thread: the misdirected guidance about minimizing state changes and DIPs is everywhere, and lots of developers read it and work their asses off to follow it, while the real situation suggests leaving it alone as a general rule. [b]No:[/b] "[i]Batch, batch, batch[/i]". [b]Yes:[/b] "Whatever." - that's the real rule of thumb. It's not [i]good[/i], but it [i]works[/i].

I'm primarily concerned here with people who are learning and making their own way into creating games, and I think this is a major aspect that is taught very wrongly, far from reality. So if you are an indie/student/hobbyist, this might be of use to you, as weird as the suggestion might sound.
[/quote]

Sorry, yes, I do see where you are coming from in that case. Certainly far too much emphasis is put on optimisation in a lot of online resources and the phrase "Premature optimisation is the root of all evil" is very common on these forums.

My apologies if I misunderstood you.

[quote name='Semei' timestamp='1298648325' post='4778949']
That is the whole point of this thread: the misdirected guidance about minimizing state changes and DIPs is everywhere, and lots of developers read it and work their asses off to follow it, while the real situation suggests leaving it alone as a general rule. [b]No:[/b] "[i]Batch, batch, batch[/i]". [b]Yes:[/b] "Whatever." - that's the real rule of thumb. It's not [i]good[/i], but it [i]works[/i].

I'm primarily concerned here with people who are learning and making their own way into creating games, and I think this is a major aspect that is taught very wrongly, far from reality. So if you are an indie/student/hobbyist, this might be of use to you, as weird as the suggestion might sound.
[/quote]

The problem is you CAN'T compare AAA development with hobbyist development.

Yes, you can ignore "batch, batch, batch" IFF you can afford to. Most hobby projects probably can; however, if you are planning from the ground up and really intend to push things, then you can't.

So, yes, those games you cited could probably get away with that level of drawing; it was considered 'fast enough' and was, if anything, probably a victim of gameplay coders, artists and designers.

In our AAA game we have worked very hard to reduce our draw calls and generally batch things together, because due to design and art issues our scene complexity shot up massively in a short period of time. While our level of batching before was fine for the game, once this change kicked in we had to work very hard to reduce draw calls (while swearing at art and design, and trying to explain that they can't have the framerate they want while pushing that many objects into the world).

So yes, 'batch batch batch' is very important if you need, or indeed want, to push the hardware as best you can.

It's not 'bad' advice, it's just advice which you need to be pragmatic about, much like any advice given.
