most efficient general rendering strategies for new GPUs

92 comments, last by maxgpgpu 11 years, 9 months ago
Re: Draw calls.

This is, and remains, an important issue.
Draw calls, while cheaper than in D3D9, can still suck up CPU power depending on what you are doing. If you are updating buffers and using them in a draw call, the driver has to shuffle memory under the hood, copy things around, and update other things. Avoiding them to the extreme of the OP is a bit crazy, but even so you should be careful, as they can suck away CPU time pretty quickly.

With D3D11, using multi-threaded deferred contexts (something GL doesn't have), you top out, on something like an i7, at ~15,000 draw calls per frame if you want to maintain 60fps. BF3 tops out at around 7,500 per frame, if memory serves.

In short; don't go crazy, keeping your draw calls down remains a good thing due to driver overhead.

(For reference using a very CPU heavy test loop and performing 50,000 draw calls per frame; a 2.6Ghz i7 with a NV GTX470 GPU can't even clear 30fps using 6 cores to render. An X360, using the same code base and same test, will happily do 60fps. This is purely CPU overhead causing the problem and most of the time is the driver doing work to shuffle data around. Clearly this wouldn't play out in a real game situation but there is still reason to be concerned about CPU cost.)

Re: varying verts per instance

You CAN do this... although you probably shouldn't.
However, you don't do it via traditional instancing; instead you use the geometry shader to create the extra vertex data. This comes at a cost, as the output of a GS has to be serialised correctly, which can introduce significant bottlenecks on the GPU.

Generally, unless you have very little work on the GPU and are totally rammed on the CPU, you won't want to do this; instead, just take the hit of an extra draw call per model type. Chances are you aren't going to have that many models that require this anyway, so it's not going to be a huge CPU cost and you avoid a large GPU cost.

[quote name='maxgpgpu' timestamp='1340132594' post='4950675']
I should probably do my homework rather than ask this silly question, but... here goes anyway. Feel free to ignore it.

Can instancing support in the latest generation of GPUs make it possible to pervert instancing in the following way? Assume we do instancing the regular way, so this question only refers to a large set of unique objects [that exist only once in the game/simulation]. Can the number of vertices be a per-instance variable? What I'm wondering here is whether it might be possible to consider all these diverse objects as separate instances of some general amorphous object.

In the instancing I'm familiar with, every instance has the same number of vertices. This is for jobs like rendering a crapload of leaves on trees, and the per-instance data tells, for each leaf: position, orientation, color (for example). However, if the per-instance data can include the # of vertices and maybe a couple more items, perhaps every object with any number of vertices could be rendered with instancing. That sounds wacko off hand, but then effectively instanceID means objectID, so instanceID can double as the index into an array of general local-to-view transformation matrices.

This probably exceeds the flexibility of the instancing mechanism, but then again, maybe it doesn't. Any comments?


I doubt it - if the number of verts needs to change then it seems a reasonably good bet that the texture also needs to change (otherwise your texcoords would be out of whack) so you're looking at a separate draw call anyway.
[/quote]
My vertex structures contain a textureID field that indexes into the texture array, so that's not a killer even now. Obviously this is more-or-less necessary in my current scheme, which renders unlimited objects in a single draw call --- each object can have different textures, different normalmaps, different specularmaps, etc. My 64-byte vertex structure is running low on free bits at this point, so my alternative is to eliminate those textureID fields [and the matrixID field] and replace them with an objectID field that indexes into a texture to get all the information that could possibly be needed (at the expense of an extra texture fetch per vertex). Unfortunately, that eliminates one nice feature of the current scheme --- the ability to specify texture, normalmap, and specularmap on a per-vertex basis, not just per-object.
The moral of the story is that draw calls are still not free, but you don't need to pathologically avoid them as much as you did before. 7.5k calls in a shipping AAA title would certainly have given everyone the horrors not so long ago. "Going crazy" can work in both directions...

Indexing into a texture array is a decent way of avoiding changes and keeping calls down, but it adds the constraint that all of your textures must be the same size. You're not going to use the same texture size for a small pebble or for a particle as you use for a brick wall, I hope. Aiming for the entire scene in a single call also constrains you to using the same material properties for all of your objects. If you're happy with that tradeoff, then sure, go for it, but it really reduces your setup's general-purpose utility. You can't even do something as simple as enable alpha blending for a window but keep it disabled for everything else. That makes the objective something more of theoretical interest than practical utility.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.


[quote]
The moral of the story is that draw calls are still not free, but you don't need to pathologically avoid them as much as you did before. 7.5k calls in a shipping AAA title would certainly have given everyone the horrors not so long ago. "Going crazy" can work in both directions...

Indexing into a texture array is a decent way of avoiding changes and keeping calls down, but it adds the constraint that all of your textures must be the same size. You're not going to use the same texture size for a small pebble or for a particle as you use for a brick wall, I hope. Aiming for the entire scene in a single call also constrains you to using the same material properties for all of your objects. If you're happy with that tradeoff, then sure, go for it, but it really reduces your setup's general-purpose utility. You can't even do something as simple as enable alpha blending for a window but keep it disabled for everything else. That makes the objective something more of theoretical interest than practical utility.
[/quote]

Well, we still have more than one texture unit to work with. I assume 4 texture units, and hopefully GPUs won't ever drop below that number.

What I do is pack 4 to hundreds of textures (or normalmaps, heightmaps, etc.) into each texture, more or less "texture atlas" style. So my tcoords for a given object don't range from 0.000 to 1.000 on each axis; they span some tiny fraction of that range. Of course, my approach means I can't create repeating textures (for tiled floors and such) by letting the tcoords extend far below 0.000 and far above 1.000.

Clearly I need to rethink my balancing act. You guys are probably correct that only putting local-coordinates into the VBOs for "large moving objects" is not the optimal tradeoff. But some comments seem to indicate that going whole-hog the opposite direction isn't very good either.

In many games and simulations, the [vast/large/substantial] majority of objects are fixed. These objects probably render at the same speed with either local or world coordinates in the VBOs, everything else being done the same. Probably the simplest test I can perform is to break my draw calls up, one per object in each VBO, and perform frustum tests on each. I can certainly measure how much CPU time that adds. Unfortunately, I'm not very proficient at figuring out the impact on GPU execution time.

This topic is closed to new replies.
