
Overhead of Using Degenerate Triangles


Volzotan

Hi everyone,

 

I have been wondering how big the overhead of using degenerate triangles with indexed triangle lists is.

 

I've asked around on the NVIDIA DevZone forums, but haven't gotten a reply:

 

https://devtalk.nvidia.com/default/topic/534530/general-graphics-programming/overhead-for-degenerate-triangles/?offset=2#3761674

 

I also found this old GameDev.net thread, which was interesting but didn't give me a definitive answer:

 

http://www.gamedev.net/topic/221281-render-cost-of-degenerate-triangles/

 

 

The situation is the following:

- Render a list of indexed triangles

- Do some fancy stuff in a vertex shader

- After the vertex shader runs, some vertices will end up at the same world position, meaning some triangles become degenerate

 

 

My Question:

Assume I knew beforehand which triangles will become degenerate and could exclude them from rendering.

How big do you think the speedup would be in that case?

 

Please note that the indices of the coinciding joint vertices may still differ, so the GPU cannot discard these triangles before the vertex processing stage; it still has to transform every vertex before it can tell which triangles are degenerate. By the way, does the GPU even detect degeneracy in that case? Does anyone have a reference where a GPU manufacturer explains how the culling of degenerate triangles works, and at which pipeline stage it is applied?
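
To illustrate what I mean, here is a made-up minimal example (indices chosen arbitrarily):

    // Hypothetical index buffer for an indexed triangle list.
    // Vertices 2 and 3 are distinct entries in the vertex buffer, but our
    // vertex shader happens to map both to the same world position.
    unsigned int indices[] = {
        0, 1, 2,   // triangle A: fine
        3, 4, 5,   // triangle B: fine
        2, 3, 4,   // degenerate only AFTER the shader collapses 2 and 3;
                   // from the indices alone it looks perfectly valid
    };

Only the last kind interests me here; a triangle with a literally repeated index (e.g. 2, 2, 4) could in principle be rejected before any vertex is shaded.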

 

Any help is appreciated. Thanks a lot in advance!

 

Best,

 

Volzotan

 

 

 

 

samoth

Rendering is almost always fragment shader or ROP bound (or geometry shader bound, if there's enough output), and only very rarely vertex shader bound. And as mhagain points out, fragment shader work does not apply here: a degenerate triangle covers no pixels, so it is culled before rasterization and never generates fragments.

Therefore, you can pretty much consider the cost "zero".

This is all the more true because degenerate triangles typically reuse the vertex indices of adjacent non-degenerate triangles. Those vertices are transformed once, go into the post-transform cache, and get reused; you would have had to transform them once anyway.

 

So the only real cost is three extra indices in the index buffer per degenerate triangle, which is negligible in terms of both memory and bandwidth.
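
To sketch this (indices made up), the degenerate triangle below introduces no new vertices at all; the only new data is its three indices:

    unsigned int indices[] = {
        0, 1, 2,   // triangle A
        1, 2, 3,   // triangle B
        2, 2, 3,   // degenerate: all three indices were already transformed
                   // and sit in the post-transform cache, so zero new VS work
    };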

C0lumbo

Quote (Volzotan): "Please note that the indices of the corresponding joint vertices might still be different, so the GPU should not be able to discard triangles before the vertex processing stage, meaning that it still has to transform each vertex before finding out which triangle is degenerate."

 

If you have an unusual situation where this is particularly common, then you might find you're better off using a triangle list instead of a triangle strip. Triangle lists can be optimized to make better use of the post-transform cache and can end up faster than strips even on fairly strip-friendly geometry, so if you have some special case which makes your geometry strip-unfriendly then I would imagine you'll get better performance with a triangle list. I'd echo the other posters though, that in the grand scheme of things it's unlikely to make much difference.
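
For what it's worth, reordering a triangle list for the post-transform cache is a one-liner with a suitable tool; here is a sketch using the open-source meshoptimizer library (any Forsyth-style optimizer works the same way, and the wrapper function is just for illustration):

    #include <meshoptimizer.h>

    // Reorder an indexed triangle list so vertex references are clustered,
    // improving the post-transform cache hit rate. The function permits
    // in-place operation, so destination and source may be the same array.
    void optimizeTriangleList(unsigned int* indices, size_t indexCount,
                              size_t vertexCount)
    {
        meshopt_optimizeVertexCache(indices, indices, indexCount, vertexCount);
    }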

LorenzoGatti

Another point of view is to account for the wasted vertex processing. If every vertex, degenerate or important, gets the same moderate amount of processing (transformation, a few interpolations, texture lookups, etc.), then adding x% useless vertices to the real geometry is an x% increase in vertex load, which is effectively free as long as vertex processing is not your bottleneck.

Anything you do to avoid processing degenerate geometry needs to cost less than that x% of clean geometry processing to have any chance of being useful; testing every triangle for degeneracy at runtime, even if the cost of rebuilding buffers could be avoided, appears to be out of the question.
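
To put rough numbers on that (all figures made up purely for illustration):

    // Back-of-the-envelope: how much could skipping useless vertices save?
    double frameMs     = 16.0;  // total frame time
    double vertexMs    = 6.0;   // portion spent on vertex processing
    double uselessFrac = 0.10;  // 10% of the vertices are degenerate
    double maxSavingMs = vertexMs * uselessFrac;  // = 0.6 ms upper bound
    // Any per-frame degeneracy test must cost well under 0.6 ms to pay off.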

mhagain

The main thing, though, is that it's a completely pointless exercise. The GPU is already going to do this anyway, so all the proposal involves is repeating calculations that will already be done. There may be a minor saving in bandwidth and vertex processing (although the degenerate verts are quite likely to already be in the cache anyway, so the latter saving is nowhere near as big a deal as one might think), but at the expense of having to rebuild the vertex/index buffers.

Volzotan

Hey,

thanks for your quick and detailed replies!

 

 

Quote: "Now, about that theoretical gain, one huge disadvantage of this that now arises is that it's no longer possible for you to keep any model data in static vertex buffers. Instead you're going to need to re-send all of your scene geometry to the GPU every frame. Of course you could implement some caching schemes to avoid a resend, but that's more work and will result in uneven performance between the frames where you do send and the frames where you don't."

 

I'm not going to do such fancy things. Please excuse me for not being able to go into detail here, but let me clarify this a little bit:

 

The situation is simply that we know beforehand that, for our particular case, after a certain transformation in the vertex shader, let's say 50% of the triangles become degenerate, and that those triangles sit at the end of our vertex and index buffers. So we can simply exclude them from the draw call, at essentially zero overhead.
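
In OpenGL terms the draw call change is trivial; a sketch (nonDegenerateIndexCount stands for our precomputed cutoff and is not a real variable from our code):

    // All indices belonging to degenerate triangles were sorted to the end
    // of the index buffer during preprocessing; just draw fewer indices.
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBuffer);
    glDrawElements(GL_TRIANGLES, nonDegenerateIndexCount,
                   GL_UNSIGNED_INT, nullptr);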

 

Of course, having the data re-organized this way (only once, during preprocessing!) instead of ordering it with a cache optimizer implies a certain overhead itself, as it potentially limits cache performance.

 

Please let us also assume that our application is vertex bound, e.g. because we have a large laser-scanned model that is tessellated very regularly with many small triangles, and we use a moderately sized viewport, rather than a high-resolution viewport with optimized low-poly game models.

 

So, if I get you right, I can still expect a performance gain (-> vertex bound, 50% less vertex processing) by limiting my draw call to non-degenerate triangles, but in order to evaluate whether it's worth the effort, I have to compare my method with its re-organized data layout against a cache-optimized variant that renders all triangles and uses the GPU to discard degenerate ones, right? :-)

mhagain

OK, that makes more sense and sounds about right.  If it's purely a preprocessing step and everything is arranged accordingly, then yeah, you're going to get better performance under the constraints you mentioned.

 

Another option you may consider is to make further use of indexing. In that case you retain the cache-optimized vertex buffer but keep two index buffers around, one of which simply omits the indices for the degenerates. There would be some extra memory overhead from the second index buffer, but it could be a worthwhile performance gain.
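
A minimal sketch of that idea in OpenGL (buffer and variable names are made up):

    // One shared, cache-optimized vertex buffer; two index buffers, where
    // reducedIndices simply omits the triangles known to become degenerate.
    GLuint ib[2];
    glGenBuffers(2, ib);

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ib[0]);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER,
                 fullIndices.size() * sizeof(unsigned int),
                 fullIndices.data(), GL_STATIC_DRAW);

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ib[1]);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER,
                 reducedIndices.size() * sizeof(unsigned int),
                 reducedIndices.data(), GL_STATIC_DRAW);

    // Per draw call, bind whichever index set applies:
    bool skipDegenerates = true;
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ib[skipDegenerates ? 1 : 0]);
    glDrawElements(GL_TRIANGLES,
                   (GLsizei)(skipDegenerates ? reducedIndices.size()
                                             : fullIndices.size()),
                   GL_UNSIGNED_INT, nullptr);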

Volzotan

Okay, here's a first result. Rendering 77% of the triangles (ignoring the last 23%, which are degenerate) gives me 110 fps, versus the 90 fps I get when rendering 100% of the triangles. So yes, there is indeed a speedup.

 

However, I still have to compare that particular example against the cache-optimized alternative mhagain also mentioned.

