Peachy keen

taming huge draw command counts


Recommended Posts

Okay, as many of you know, I'm just learning D3D as a graphics API, and in certain situations my draw command count is just getting out of control. I understand that I can't render the whole screen in one call :P, but it seems like so many things require their own call, and the call count is starting to feel crazy. It would be really silly if a 2D game of mine lagged on my modern machine, and I've heard I've got roughly 300 graphics calls per frame to work with. So here are some of my situations; if anyone can give me a little advice on how to squeeze these calls down in the most optimized way, I'd really appreciate it.

A: Particle systems. Obviously I have to squeeze at least one emitter into each call to draw a particle system, but with each particle moving separately, it doesn't seem possible to just leave the heavy lifting to the matrix transform. Dynamic vertex buffers? Are these practical to use? [I hear they are slower, but 'slower' is a very loose term.] In that case I could just update each position by hand instead of leaving it up to the matrix, and keep the matrix at the origin of the emitter.

B: LOTS OF LITTLE, MOTIONLESS THINGS! Shell casings, bullet holes in the walls, burn marks on the ground, blood trails, overturned rocks, and blown-out electrical components. General environmental damage! This stuff was a breeze in 2D: just blit it all together, store it, and reuse it each frame [as soon as a bullet casing comes to rest, draw it onto the ground and leave it there, keeping track of its position so I can redraw it when it re-enters view]. But it seems like I can't draw all these little things in bunches, since they are all separate and have different textures, etc. Can this just not be done with the kind of flexibility it had in 2D? It seems silly to run a separate draw command for every single bullet hole, burn mark, and doodad on the ground. Any suggestions?

C: Large numbers of like things. Does anybody know a proper way to optimize the display of a large number of the exact same type of thing? Say I have 50 infantry on screen, and they all use effectively the same texture map [in different slides of animation], all the same size, all the same everything except for position and animation frame [and doubtless many will share the same animation frame].

I always feel so imprisoned by graphics APIs :P. They always seem to be the weakest part of my projects [or at least I make them the weakest], since they have such a black-box feel to them. :/ Should I just stop worrying SO MUCH about the draw command count and just make my game, rather than trying to squeeze out every last draw command? Argh. [I'm literally losing sleep over this stuff; it feels like it's crippling my game every time I make a call that, in the back of my mind, I feel could have been merged into another draw command.]

Dynamic VBs and combined texture maps should get you most of the way there. Hardware instancing helps (if you have it).

First of all, you can do thousands of drawing calls without too much problem. So I'd say that you should worry a bit less. If you write your code such that it's easy to replace certain rendering sections, then you can get a working game first and then profile it and optimise what needs to be optimised.

Anyway:

A) Dynamic vertex buffers should work reasonably well. For a large number of particles they are certainly faster than drawing each separately.

B) For bullet holes, etc. you can typically use what you did in 2D: render them into a render target and overlay that over the ground.

C) You can use instancing on hardware that supports it (Radeon 9500 and up and GeForce 6x00 and up). Whether that's applicable will depend on your animation system. I think it's easier to use this for objects that move but don't change shape.
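For reference, here's a rough sketch of what the D3D9 instancing path looks like. The buffer names, vertex declaration, and per-instance layout are placeholders for whatever your engine uses, and you'd also need a vertex shader that reads and applies the per-instance data:

// Stream 0 holds the shared mesh; stream 1 holds one small record per instance
// (e.g. position and animation frame). Both streams are described by 'instancedDecl'.
device->SetVertexDeclaration(instancedDecl);
device->SetStreamSourceFreq(0, D3DSTREAMSOURCE_INDEXEDDATA | instanceCount);
device->SetStreamSourceFreq(1, D3DSTREAMSOURCE_INSTANCEDATA | 1u);
device->SetStreamSource(0, meshVB,     0, sizeof(MeshVertex));
device->SetStreamSource(1, instanceVB, 0, sizeof(InstanceData));
device->SetIndices(meshIB);

// One call draws every instance; the vertex shader positions each one.
device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0,
                             vertsPerMesh, 0, trianglesPerMesh);

// Restore normal (non-instanced) stream frequencies afterwards.
device->SetStreamSourceFreq(0, 1);
device->SetStreamSourceFreq(1, 1);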

Just to tell you about my own experience with this:
I 'recently' started writing my own little engine for a 2.5D action game, and I, too, was worried about draw calls. I was having bad dreams about them and spending my time thinking up ways to get around certain draw calls, and because I was thinking about so many things, I wasn't actually coding very much. After implementing some of my plans to avoid draw calls, I had to admit they just didn't work out: my clever plan ended up several times slower than simply drawing every single mesh with its own draw call.

I'm not saying you shouldn't think about it, but don't waste day and night on it.
Just as ET3D said: implement the stuff you need and profile it afterwards. If it's too slow, optimize it.

That said, I'll take a shot at your problems:
- For many objects (same texture and material, just different positions) with few polygons, I'd say dynamic VBs are a good way to go.
- If you have loads of little objects, they will probably use really small textures. Put those small textures together into one bigger texture (one 512x512 texture can hold 64 64x64 textures, for example). That way you can render more of those little objects with one draw call (in conjunction with dynamic VBs, for example).
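To make that concrete, here's a minimal sketch of the UV math for such an atlas, assuming 64x64 tiles packed into a 512x512 texture; the struct and function names are just placeholders:

struct AtlasRect { float u0, v0, u1, v1; };

// Map a tile index (0..63) to its UV rectangle inside the 512x512 atlas.
AtlasRect GetTileUVs(int tileIndex)
{
    const int   tilesPerRow = 512 / 64;        // 8 tiles per row
    const float tileSize    = 64.0f / 512.0f;  // one tile's size in UV space
    AtlasRect r;
    r.u0 = (tileIndex % tilesPerRow) * tileSize;
    r.v0 = (tileIndex / tilesPerRow) * tileSize;
    r.u1 = r.u0 + tileSize;
    r.v1 = r.v0 + tileSize;
    return r;
}

// Write r.u0..u1 / r.v0..v1 into each quad's texture coordinates instead of 0..1;
// every quad that uses this atlas can then share the same texture and draw call.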

Perhaps that helped,
good luck!

Combining little textures into big textures can be a HUGE time saver, as others have suggested. Particle systems can be frustrating because, as you've noticed, you can't use the matrix transforms without drawing each particle separately. Instead you need a large dynamic vertex buffer (they might be slower than static buffers, but they are still FAST if you get the locking flags right). Since you can make your dynamic vertex buffer pretty large (tens of thousands of verts or more), you will be using the NOOVERWRITE flag on almost every batch and thus avoid a lot of pipeline stalls. The algorithm is simple: keep adding things into the buffer until it's either full or the next item requires a new texture or render state.

The key exploit here is that a new texture or render state will break your batch, but using new texture COORDINATES will NOT break the batch. So go for the largest possible texture sizes that your target platform can handle efficiently, and combine textures to that size (including textures that might not necessarily be related in any way).

Another advantage to combining textures is that while the entire texture should probably be a power of 2, the internal textures need not be. So you can finally make that 90x90 sprite you've always wanted but had to sacrifice quality (64x64) or memory (128x128) to get. Of course, you'll have to figure out some way to use the remaining space efficiently, but don't overengineer...

The real trick thus becomes coding your game in such a way that your particles, bullet casings, and other things that share a single large texture are rendered more or less consecutively so that you don't have to break the batch. Ideally after you render the batch you never have to go back to that texture again, but anything close is going to be good enough. Consider sorting your objects if there's no good way to naturally line them up.
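Here's a rough sketch of that batching loop in D3D9, assuming one big dynamic VB of non-indexed quads; SpriteVertex, MAX_BATCH_VERTS, and the globals are placeholders for whatever your engine defines:

UINT cursor = 0;                       // next free vertex in the dynamic VB
UINT batchStart = 0;                   // first vertex of the batch being built
IDirect3DTexture9* batchTexture = NULL;

// Draw whatever has accumulated since the last flush.
void FlushBatch(IDirect3DDevice9* dev, IDirect3DVertexBuffer9* vb)
{
    UINT count = cursor - batchStart;
    if (count == 0) return;
    dev->SetTexture(0, batchTexture);
    dev->SetStreamSource(0, vb, 0, sizeof(SpriteVertex));
    dev->DrawPrimitive(D3DPT_TRIANGLELIST, batchStart, count / 3);  // 2 tris per quad
    batchStart = cursor;
}

// Append one quad (6 verts); only a texture change or a full buffer breaks the batch.
void AddQuad(IDirect3DDevice9* dev, IDirect3DVertexBuffer9* vb,
             IDirect3DTexture9* tex, const SpriteVertex verts[6])
{
    if (tex != batchTexture) { FlushBatch(dev, vb); batchTexture = tex; }

    DWORD flags = D3DLOCK_NOOVERWRITE;          // append behind what the GPU is reading
    if (cursor + 6 > MAX_BATCH_VERTS)           // buffer full: wrap around
    {
        FlushBatch(dev, vb);
        cursor = batchStart = 0;
        flags = D3DLOCK_DISCARD;                // driver hands us a fresh buffer, no stall
    }

    void* dst = NULL;
    vb->Lock(cursor * sizeof(SpriteVertex), 6 * sizeof(SpriteVertex), &dst, flags);
    memcpy(dst, verts, 6 * sizeof(SpriteVertex));
    vb->Unlock();
    cursor += 6;
}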

Thanks for the replies, all. This looks good, and as usual I now have a whole lot more work ahead of me than I did when I posted :P [like writing a texture combiner]. I figure I can get it done in three passes: tiles, garbage on the ground/particles [which are all really tiny textures, or possibly even something I can just use diffuse colors for], and big moving things.

I kinda didn't want to use a dynamic buffer for everything, because everything I read says how slow it is, but if it's 'the way', then so be it [I'm still thinking of trying to push my terrain through with a matrix transform, if only because of how many points are involved]. Something else though: if I use this method, do I lock an index buffer as well and fill that up too? Or do I take a 'while I'm here I might as well do it all by hand' approach [which wouldn't be hard, since it's just 2D and all the graphics are textured quads, so I can reuse the verts really easily]? It also seems like a lot of the math the GPU would otherwise take off my hands ends up being done in software [hundreds of matrix transforms, though in the case of terrain a lot of them are easy ones].


One question though, and it's somewhat unrelated: should I space out my draw commands over the frame if possible, or does the GPU handle it just fine if I send them all one after another? Is there any difference?

You can definitely have more than one vertex buffer, and you should in your case. Dynamic vertex buffers are FAST (you can easily get 100s of fps with tons of sprites on decent hardware).

But, like you said, the terrain (and other things that don't need to change position very often, and that can benefit from the GPU's ability to apply a matrix transform prior to rendering) probably belongs in a static vertex buffer.
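For reference, a rough sketch of how the two kinds of buffer might be created; the sizes, vertex structs, and FVF codes here are placeholders:

IDirect3DVertexBuffer9* spriteVB = NULL;    // rewritten constantly: dynamic, default pool
device->CreateVertexBuffer(MAX_BATCH_VERTS * sizeof(SpriteVertex),
                           D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY,
                           SPRITE_FVF, D3DPOOL_DEFAULT, &spriteVB, NULL);

IDirect3DVertexBuffer9* terrainVB = NULL;   // filled once, left alone: static, managed pool
device->CreateVertexBuffer(terrainVertCount * sizeof(TerrainVertex),
                           D3DUSAGE_WRITEONLY,
                           TERRAIN_FVF, D3DPOOL_MANAGED, &terrainVB, NULL);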

To answer your question: no, you don't need to space your drawing calls out. It's rare that the GPU is actually doing the drawing right when you tell it to; it's probably busy with the last frame, or waiting to queue up enough commands, etc., so you shouldn't worry about the timing of your DrawPrimitive calls.

Also, yes, if you DO use an index buffer alongside the vertex buffer, you will have to lock it as well. A lot of the code is more or less duplicated when using an index buffer alongside a vertex buffer (the locking, filling, and SetStreamSource/SetIndices calls are all similar). If you use one, make it a dynamic index buffer alongside the dynamic vertex buffer; you'll have to update it constantly to tell DX which vertices to render this frame, so a static index buffer won't do.

My suggestion is not to worry about the index buffer part until you understand vertex buffers and get it working. They add a significant amount of complexity and almost no immediately visible speedup.

If you make yourself a nice texture combiner with the features YOU need you will save yourself tons of time in the long run. I use mine many times every day...

I think spacing out your draw calls is unnecessary, too. However, the DX SDK says this in the documentation for EndScene:

Quote:

There should be at most one BeginScene/EndScene pair between any successive calls to present (either IDirect3DDevice9::Present or IDirect3DSwapChain9::Present). BeginScene should be called once before any rendering is performed, and EndScene should be called once after all rendering for a frame has been submitted to the runtime. To enable maximal parallelism between the CPU and the graphics accelerator, it is advantageous to call EndScene as far ahead of calling present as possible.


So you could try to put the call to Present() as far away from EndScene() as you can, e.g.:

BeginScene()
// all your drawing stuff here
EndScene()
// Do all your game logic, physics and what-not here
Present()



About the index buffers:
If you know that the vertex buffer belonging to an index buffer will only ever contain vertices for quads, you can reuse the same index buffer over and over again: the positions of the vertices may change, but the order in which they are used won't. So you could fill a static index buffer once with indices for the maximum number of quads you'd ever render in one call, and reuse it for all your quad rendering.
Even if you don't want to render vertices starting from the beginning of the vertex buffer, you can still reuse the index buffer: use the BaseVertexIndex parameter of DrawIndexedPrimitive() to offset the indices so they match the vertices you want to render.
So I think you could render all your quads with the same index buffer if you lay out the vertex buffer carefully each frame.
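A minimal sketch of that idea follows; MAX_QUADS and the vertex layout are placeholders, and 16-bit indices cap one batch at 16384 quads:

// Filled once at startup, reused every frame.
IDirect3DIndexBuffer9* quadIB = NULL;
device->CreateIndexBuffer(MAX_QUADS * 6 * sizeof(WORD), D3DUSAGE_WRITEONLY,
                          D3DFMT_INDEX16, D3DPOOL_MANAGED, &quadIB, NULL);

WORD* idx = NULL;
quadIB->Lock(0, 0, (void**)&idx, 0);
for (WORD q = 0; q < MAX_QUADS; ++q)   // 4 vertices and 6 indices per quad
{
    WORD base = q * 4;
    idx[q * 6 + 0] = base + 0;  idx[q * 6 + 1] = base + 1;  idx[q * 6 + 2] = base + 2;
    idx[q * 6 + 3] = base + 2;  idx[q * 6 + 4] = base + 1;  idx[q * 6 + 5] = base + 3;
}
quadIB->Unlock();

// Per frame: write 4 vertices per quad into the VB, then draw them all in one call.
device->SetIndices(quadIB);
device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                             0,                  // BaseVertexIndex into the VB
                             0, quadCount * 4,   // range of vertices referenced
                             0, quadCount * 2);  // two triangles per quad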

If you use a static index buffer, you're still going to have to update things every time a new particle appears and/or an old one dies, etc. I think it's much easier to use a dynamic IB alongside the dynamic VB so you don't have to worry about frame coherence at all. For a typical 2D engine I don't see this costing any performance at all compared to a static IB.
