
Draw 2 faces vs CULL_NONE speed

This topic is 3597 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I'm instancing trees (up to several thousand) using a vertex shader (ID3DXEffect), DrawIndexedPrimitive and an array of positions passed to the shader. Is it faster to have a VB with 2 faces back-to-back with opposing normals and let the pipeline cull the back face, or to set up the VB with 1 face (with a normal in either direction) and use D3DCULL_NONE? I'm afraid I don't understand the pipeline well enough to reason this one out. The shader does the lighting and can be set up for either case.
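
To make the two options concrete, here is a minimal CPU-side sketch of the geometry each one would put in the VB. The names (Vertex, QuadMesh, appendQuad, makeSingleSided, makeDoubleSided) and the quad dimensions are assumptions for illustration, not the layout actually used in this project, and the sketch builds plain arrays rather than real Direct3D buffers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout: position plus normal.
struct Vertex {
    float px, py, pz;
    float nx, ny, nz;
};

struct QuadMesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

// Append one upright quad in the XY plane. The "back" quad reuses the same
// positions but negates the normal and reverses the index order (winding).
void appendQuad(QuadMesh& mesh, float halfWidth, float height, bool front) {
    assert(mesh.vertices.size() + 4 <= 65536);  // must fit 16-bit indices
    const float nz = front ? 1.0f : -1.0f;
    const auto base = static_cast<std::uint16_t>(mesh.vertices.size());
    mesh.vertices.push_back({-halfWidth, 0.0f,   0.0f, 0.0f, 0.0f, nz});
    mesh.vertices.push_back({ halfWidth, 0.0f,   0.0f, 0.0f, 0.0f, nz});
    mesh.vertices.push_back({ halfWidth, height, 0.0f, 0.0f, 0.0f, nz});
    mesh.vertices.push_back({-halfWidth, height, 0.0f, 0.0f, 0.0f, nz});
    const std::uint16_t frontOrder[6] = {0, 1, 2, 0, 2, 3};
    const std::uint16_t backOrder[6]  = {0, 2, 1, 0, 3, 2};
    const std::uint16_t* order = front ? frontOrder : backOrder;
    for (int i = 0; i < 6; ++i) {
        mesh.indices.push_back(static_cast<std::uint16_t>(base + order[i]));
    }
}

// Option 1: one face, intended to be drawn with culling disabled.
QuadMesh makeSingleSided(float halfWidth, float height) {
    QuadMesh mesh;
    appendQuad(mesh, halfWidth, height, true);
    return mesh;
}

// Option 2: two back-to-back faces, intended to be drawn with culling on.
QuadMesh makeDoubleSided(float halfWidth, float height) {
    QuadMesh mesh;
    appendQuad(mesh, halfWidth, height, true);
    appendQuad(mesh, halfWidth, height, false);
    return mesh;
}
```

In this sketch the double-sided mesh carries exactly twice the vertices and indices of the single-sided one, which is the cost being weighed against the cull state.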

Have you tested it?

Intuitively, I would say the non-culling option is faster, and it would save some GPU memory (though the saving might be marginal, depending on the complexity). With the two-face setup, only one side of a polygon should be visible at any given time, but you might still introduce z-fighting between surfaces at close-to-zero distance. Having two submeshes could also make shaders or other advanced effects more difficult later on, as the composition of vertices is less than straightforward.

Thanks for the reply.

I've made a couple of crude timing tests. It appears the non-culling method is faster.

My thought is "The fastest triangles are the ones not drawn" and, having thought about it a little more, it seems a no-brainer. With the non-culling method, only 1 face enters the pipeline (as you say, less GPU mem) and always gets drawn. With the culling method, 2 faces enter the pipeline and one gets culled.

(Actually, the VB I'm using is 2 non-culled faces, one perpendicular to the x axis and the other perpendicular to the z axis, but the argument applies to each face.)

If it is true that with D3DCULL_NONE, no cull test is performed, then there's the additional savings of 1 cull test per triangle.

It also (just) occurred to me that, with the shader reversing the vertex normal (if necessary) to oppose the view direction for lighting, the only extra GPU processing is checking whether the vertex normal opposes the view direction and reversing it if dot(view_dir, normal) > 0. It seems like that must be much faster than a cull test.
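
A minimal sketch of that normal flip, written as plain C++ rather than shader code. Vec3, dot, negate and opposeViewDirection are hypothetical names, and the convention that view_dir points from the eye toward the vertex is an assumption; with the opposite convention the comparison would be reversed.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 negate(Vec3 v) {
    return {-v.x, -v.y, -v.z};
}

// Return the normal oriented so that it opposes the view direction.
// Assumption: viewDir points from the eye toward the vertex, so a normal
// with dot(viewDir, normal) > 0 faces away from the viewer and is reversed.
Vec3 opposeViewDirection(Vec3 normal, Vec3 viewDir) {
    const Vec3 result = (dot(viewDir, normal) > 0.0f) ? negate(normal) : normal;
    assert(dot(viewDir, result) <= 0.0f);  // postcondition of the flip
    return result;
}
```

The per-vertex cost in this sketch is one dot product, one comparison and at most one negation.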

z-fighting doesn't appear to be a problem (nothing I've been able to see), probably because, when drawing two back-to-back faces (with opposing normals), one gets culled unless the view direction is *exactly* perpendicular to the normals. In any case, having only 2 perpendicular faces avoids the problem.
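
The reasoning about back-to-back faces can be sketched with the usual screen-space winding test. This is only an illustration: Vec2, Facing, signedArea2, classify and survivesCulling are hypothetical names, and which sign counts as "front" depends on the API and cull mode, so the code leaves that as a parameter.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

enum class Facing { Positive, Negative, EdgeOn };

// Twice the signed area of the projected triangle (a, b, c).
// The sign flips when the vertex order is reversed.
float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

Facing classify(Vec2 a, Vec2 b, Vec2 c) {
    const float area2 = signedArea2(a, b, c);
    if (area2 > 0.0f) return Facing::Positive;
    if (area2 < 0.0f) return Facing::Negative;
    return Facing::EdgeOn;  // zero area: the triangle is seen exactly edge-on
}

// A triangle is kept if it covers some area and its winding is not the
// culled one. `culled` stands in for the cull mode (an assumption here).
bool survivesCulling(Facing facing, Facing culled) {
    assert(culled != Facing::EdgeOn);
    return facing != Facing::EdgeOn && facing != culled;
}
```

Two back-to-back triangles share positions but have reversed vertex order, so their signed areas have opposite signs and exactly one survives. Seen exactly edge-on, both have zero area and neither covers any pixels.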

I would, as you've also observed, imagine the non-culling approach to be far better.

On current and future GPUs, memory bandwidth is likely to become the main stalling point. That is, ALU power is increasing faster than available bandwidth, so adding a few extra instructions (or a simple branch on the sign of the dot product) is likely to be inconsequential compared with effectively doubling the amount of geometry sent down the pipe.

Have you searched around for "two sided lighting" yet? It was quite popular in the early days of shaders, so there may be some good whitepapers lying around that discuss this in more detail...

hth
Jack

Thanks for the further comments, particularly on ALU capabilities increasing. An excellent consideration.

In addition, your mention of "double" the geometry made me realize that my description of 1 vs. 2 faces entering the pipeline is better stated as 1000 vs. 2000 faces (for each of 50-100 terrain sectors in my case!) entering the pipeline. At even a microsecond per triangle, that's a MILLIsecond per sector, and tens of milliseconds per frame across all the sectors! Valuable time, indeed!
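
A rough sketch of that estimate, taking the figures above at face value: 1000 extra faces per sector, 50-100 sectors, and a purely hypothetical cost of one microsecond per face (not a measured number). The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Extra faces submitted per frame when every face is duplicated.
std::uint64_t extraFacesPerFrame(std::uint64_t extraFacesPerSector,
                                 std::uint64_t sectors) {
    return extraFacesPerSector * sectors;
}

// Added time in milliseconds at an assumed fixed cost per face.
double extraMilliseconds(std::uint64_t extraFaces, double microsecondsPerFace) {
    assert(microsecondsPerFace >= 0.0);
    return static_cast<double>(extraFaces) * microsecondsPerFace / 1000.0;
}
```

Under those assumptions, extraFacesPerFrame(1000, 50) is 50,000 faces, which extraMilliseconds turns into 50 ms per frame at one microsecond each; with 100 sectors it is 100 ms. A single sector alone accounts for 1 ms.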

I did look into some two sided lighting papers (you're correct about them being "early" solutions). They implied or stated that twice the geometry was required, and they appeared to be aimed at curved surfaces where both sides of a surface may be visible at any particular time. One paper mentioned a cull-face register, implying that the time saved came from not performing a cull test(!) while still providing proper lighting.

I'm not an experienced shader programmer by any stretch of the imagination so my assumptions may be completely wrong. With GPU advancements, twice the geometry may no longer be required.

If my assumptions are correct, with my simplistic tree VB comprised of flat geometry, only 1 face of each would be visible at a time and the need for two-sided lighting may be moot.

