DrawPrimitive, fill-rate and performance

Started by
8 comments, last by Schrompf 17 years, 1 month ago
Hello, I'm seeing serious performance hits when drawing meshes and using particle systems.

First of all, I'm drawing many small primitives (from 200 to about 500 faces each), each one in its own static vertex buffer. The maximum complexity reaches about 400,000 faces and about 200,000 vertices. There are 100+ DrawPrimitive calls (DrawSubset of ID3DXMesh). Should I create a large dynamic vertex buffer, transform the vertices myself and send this data using one DrawPrimitive call instead? My thinking is that this will reduce DrawPrimitive overhead but will use much more bandwidth. Has anyone done experiments with this? Any suggestions? I should mention that my primitives are non-textured but use specular lighting and have normal normalization enabled (because I use scaling).

The second performance drop appears when drawing many small billboards or a few big ones. I create the aligned quads myself and order them from back to front. Alpha blending is used. This really kills the performance. My guess is that fill-rate is the limitation here, but is the video card really that weak at alpha blending? With about 2,000 small billboards (visible size about 4-5 pixels both horizontally and vertically) my GeForce 7600 GS experiences serious frame-rate drops, while an ATI Radeon X550 almost dies at less than 4 FPS.

Am I simply limited by hardware or is there any room for optimization? How should I confront both problems?
You can try looking into the instancing sample in the DirectX SDK Sample Browser. All those draw calls are talking to your display driver and switching context all over the place. There's an easy way to tell if you're fill limited: just don't look at any of the objects. If your frame rate goes up, you're fill limited. If it doesn't, the bottleneck is elsewhere. Nothing will enter your pixel shaders, but everything still needs to go through your vertex shaders. Don't forget to disable culling for this test, so that the objects are still submitted!
NVPerfHUD is a great tool for determining where your bottlenecks are.

Emil Jonssonvild
You can pool static vertex buffers together; you don't need to switch over to a fully dynamic vertex buffer in order to do that. And, no, I wouldn't imagine that you'd want to do the transforms CPU-side. I suppose you could dynamically build an index buffer based on which elements you want to draw. You wouldn't necessarily reduce your DIP (DrawIndexedPrimitive) call count, but if all you do between the calls is change vertex shader constants, you might see a performance improvement.
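
To make the pooling idea above a bit more concrete, here is a minimal CPU-side sketch of how several small meshes could be packed into one shared vertex array and one shared index array, recording where each mesh landed. The `Vertex` layout and the names `Mesh`, `PooledRange`, `MeshPool` and `BuildPool` are made up for illustration and are not from this thread; the actual upload to a static vertex/index buffer is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout: position + normal (the meshes are untextured).
struct Vertex {
    float px, py, pz;
    float nx, ny, nz;
};

// One small source mesh with its own local, 0-based indices (triangle list).
struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

// Where a mesh ended up inside the shared arrays.
struct PooledRange {
    std::uint32_t baseVertex;      // first vertex of this mesh in the pool
    std::uint32_t startIndex;      // first index of this mesh in the pool
    std::uint32_t vertexCount;
    std::uint32_t primitiveCount;  // number of triangles
};

struct MeshPool {
    std::vector<Vertex> vertices;        // would be uploaded once to one static VB
    std::vector<std::uint16_t> indices;  // would be uploaded once to one static IB
    std::vector<PooledRange> ranges;     // one entry per source mesh
};

// Appends every mesh to the shared arrays. Indices are copied unchanged
// (still local to their mesh); the per-mesh baseVertex is meant to be
// supplied as the base vertex offset at draw time.
MeshPool BuildPool(const std::vector<Mesh>& meshes) {
    MeshPool pool;
    for (const Mesh& m : meshes) {
        assert(m.indices.size() % 3 == 0 && "expects a triangle list");
        PooledRange r;
        r.baseVertex = static_cast<std::uint32_t>(pool.vertices.size());
        r.startIndex = static_cast<std::uint32_t>(pool.indices.size());
        r.vertexCount = static_cast<std::uint32_t>(m.vertices.size());
        r.primitiveCount = static_cast<std::uint32_t>(m.indices.size() / 3);
        pool.vertices.insert(pool.vertices.end(), m.vertices.begin(), m.vertices.end());
        pool.indices.insert(pool.indices.end(), m.indices.begin(), m.indices.end());
        pool.ranges.push_back(r);
    }
    return pool;
}
```

With the buffers set once, each object would then need only a constant update followed by a draw over its range, along the lines of `DrawIndexedPrimitive(D3DPT_TRIANGLELIST, r.baseVertex, 0, r.vertexCount, r.startIndex, r.primitiveCount)`. Keeping the indices local is what lets 16-bit indices keep working for meshes of a few hundred faces even when the pool itself grows past 65,535 vertices.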

I can't really imagine that drawing fewer than 50,000 pixels is choking a modern card on fill or bandwidth. I'm doing multiple full-screen post effect passes on a 7800 at 1920x960 (so about 2 million pixels each), and not having any problems with frame rate. So, I would expect that it's more a matter of how you submit the particles.

As mentioned, NVPerfHUD is quite handy for figuring out what's going on, with some specific tools for helping determine where your bottlenecks are.
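
The back-of-the-envelope fill estimate used above can be written down as a tiny helper. This is only a rough sketch: it counts every billboard pixel once, ignores multisampling and clipping at the screen edge, and the function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Pixels written by `count` screen-aligned quads of the given on-screen size.
// With alpha blending every quad pixel is blended, so overlap counts in full.
std::uint64_t BillboardPixels(std::uint64_t count, std::uint64_t widthPx, std::uint64_t heightPx) {
    return count * widthPx * heightPx;
}

// The same workload expressed as a fraction of one full-screen pass.
double FullScreenEquivalents(std::uint64_t pixels, std::uint64_t screenW, std::uint64_t screenH) {
    assert(screenW > 0 && screenH > 0);
    return static_cast<double>(pixels) / static_cast<double>(screenW * screenH);
}
```

For 2,000 billboards of 5x5 pixels, `BillboardPixels(2000, 5, 5)` gives 50,000 pixels, which is under 3% of a single 1920x960 full-screen pass. That is the reason fill-rate looks like an unlikely culprit at that particle count.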
First of all, thanks everybody for the answers!

I've resolved the particle issue after finding out that it was not related to fill-rate: I was using a singly linked list, which had a very expensive delete operation, and switching to a doubly linked list for particles eliminated the problem. I can still "feel" the fill-rate weakness when drawing around 20,000 particles, though. I also forgot that I've been using D3DMULTISAMPLE_4_SAMPLES all the time. [smile]
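
For readers hitting the same problem, here is a small sketch of per-frame particle removal on a doubly linked list. It uses `std::list` as a stand-in for the hand-written list described above, and the `Particle` fields and the `UpdateParticles` name are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical particle state; the real fields are not given in the thread.
struct Particle {
    float life;        // remaining lifetime in seconds
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
};

// Advances all particles by dt and removes the ones whose lifetime ran out.
// Returns how many particles were removed.
std::size_t UpdateParticles(std::list<Particle>& particles, float dt) {
    assert(dt >= 0.0f);
    std::size_t removed = 0;
    for (auto it = particles.begin(); it != particles.end(); ) {
        it->life -= dt;
        if (it->life <= 0.0f) {
            // std::list is doubly linked, so erasing at an iterator is O(1):
            // no walk from the head is needed to find the previous node.
            it = particles.erase(it);
            ++removed;
        } else {
            it->x += it->vx * dt;
            it->y += it->vy * dt;
            it->z += it->vz * dt;
            ++it;
        }
    }
    return removed;
}
```

A singly linked list that has to search for the predecessor on every delete turns this loop into quadratic work, which matches the slowdown described. A packed array with swap-and-pop removal is another common choice, but it changes particle order and would need the back-to-front sort to run afterwards.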

As for the geometry, at the moment, in the most complex scene, I am getting about 2,000 DrawPrimitive calls and a maximum of two million faces in the scene (half of them are most likely culled out), giving me some 20 FPS with multisampling enabled. This raises a question: is 1,000,000 faces and 2,000 DrawPrimitive calls too much?

Using instancing is pretty much a no go, since I'm not targeting shader model 3, and using dynamic buffers would require a lot of bandwidth for so many vertices.

The reason why I am using lots of primitives is that my entire game is made of procedural geometry. [cool]
Quote:Original post by Lifepower
is 1,000,000 faces and 2,000 DrawPrimitive calls too much?

IMO, not necessarily too much, but probably as close as possible to the maximum you can afford unless you target your game at high-end graphics cards only. Which raises the question: what is your target hardware?

Quote:Original post by Lifepower
Using instancing is pretty much a no go


Instancing is possible on older hardware. Try looking at Technique 2: Shader Instancing (with Draw Call Batching) in the instancing sample in the DirectX SDK. 2000 is a lot of draw calls in my opinion.
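
As a rough illustration of why shader instancing with batching cuts the call count, the sketch below works out how many instances fit into one draw call and how a set of instances splits into batches. It assumes each instance needs a fixed number of vertex shader constant registers (for example 3 for a 4x3 world matrix) and that all instances in a batch share the same geometry, which may not hold for fully procedural scenes. The register budget and the names `MaxInstancesPerBatch`, `SplitIntoBatches` and `Batch` are assumptions for the example, not values taken from the SDK sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One draw call covering a contiguous run of instances.
struct Batch {
    std::size_t firstInstance;
    std::size_t instanceCount;
};

// How many instances fit into the constant registers left over after the
// registers reserved for view/projection, lights and so on.
std::size_t MaxInstancesPerBatch(std::size_t totalRegisters,
                                 std::size_t reservedRegisters,
                                 std::size_t registersPerInstance) {
    assert(registersPerInstance > 0);
    assert(reservedRegisters <= totalRegisters);
    return (totalRegisters - reservedRegisters) / registersPerInstance;
}

// Splits instanceCount instances into batches of at most maxPerBatch each.
std::vector<Batch> SplitIntoBatches(std::size_t instanceCount, std::size_t maxPerBatch) {
    assert(maxPerBatch > 0);
    std::vector<Batch> batches;
    for (std::size_t first = 0; first < instanceCount; first += maxPerBatch) {
        batches.push_back(Batch{first, std::min(maxPerBatch, instanceCount - first)});
    }
    return batches;
}
```

Under these assumed numbers (256 registers, 16 reserved, 3 per instance), 80 instances fit into one call, so 2,000 identical objects would take 25 draw calls instead of 2,000. The cost is a vertex buffer holding that many copies of the mesh, each vertex tagged with its instance index.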
You can also check out the instancing demo by Humus (available on this page). Considering the hardware you're running on, you should be able to use shader model 3 instancing. All ATI SM2 cards support it, though you need to enable it in a special way, and the D3D debug runtime will flag it as an error (since it's an SM3 feature).

BTW, as mentioned, ATI's and NVIDIA's performance tools could be quite useful for finding out what your bottleneck is.
One more thing: are your vertex buffers optimized for the vertex cache? This is a tremendous speed-up; you can gain up to 30% in render speed. Have a look at ID3DXMesh::OptimizeInplace (I think that is what it is called); there should be a flag for vertex cache optimization (D3DXMESHOPT_VERTEXCACHE). It may not help for billboards, but it will help for static meshes for sure. It is an immense speed-up if you use it wisely.
Only if your app is indeed vertex-processing bound, though. If not, you won't see a difference at all. And in my experience almost no apps are vertex bound. It doesn't hurt, though, and it can be done offline, so I recommend doing it anyway.
----------
Gonna try that "Indie" stuff I keep hearing about. Let's start with Splatter.
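
To check offline whether vertex cache optimization changes anything for a given mesh, one option is to estimate the average cache miss ratio (ACMR, vertex shader runs per triangle) of its index buffer before and after reordering. The sketch below does this with a simple FIFO cache model. Real hardware caches differ in size and replacement policy, so treat the result as a rough indicator only; `ComputeAcmr` is a name made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simulates a FIFO post-transform vertex cache over a triangle-list index
// buffer and returns misses per triangle. The best case approaches 0.5 for
// large regular meshes; 3.0 means no vertex was ever reused from the cache.
double ComputeAcmr(const std::vector<std::uint32_t>& indices, std::size_t cacheSize) {
    assert(cacheSize > 0);
    assert(!indices.empty() && indices.size() % 3 == 0);
    std::deque<std::uint32_t> cache;  // front = oldest entry
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(cache.begin(), cache.end(), idx) != cache.end()) {
            continue;  // hit: a FIFO cache does not reorder entries on use
        }
        ++misses;
        cache.push_back(idx);
        if (cache.size() > cacheSize) {
            cache.pop_front();  // evict the oldest entry
        }
    }
    return static_cast<double>(misses) / static_cast<double>(indices.size() / 3);
}
```

For the two triangles `{0, 1, 2, 2, 1, 3}`, a 16-entry cache gives 4 misses, or 2.0 per triangle, while a 1-entry cache gives 5 misses, or 2.5 per triangle. Comparing the value before and after an optimizer pass shows whether the reordering did anything, though it only pays off on screen if the app is vertex bound, as noted above.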

