
Overhead of Using Degenerate Triangles



Hi everyone,

 

I have been wondering how big the overhead of using degenerate triangles with indexed triangle lists is.

 

I've been asking around on the NVIDIA DevZone, but haven't gotten any reply:

 

https://devtalk.nvidia.com/default/topic/534530/general-graphics-programming/overhead-for-degenerate-triangles/?offset=2#3761674

 

I also came across this old GameDev thread, which was interesting but didn't give me a definite answer:

 

http://www.gamedev.net/topic/221281-render-cost-of-degenerate-triangles/

 

 

The situation is the following:

- Render a list of indexed triangles

- Do some fancy stuff in a vertex shader

- After the vertex shader, some vertices will be located at the same world position, meaning there will be some degenerate tris

 

 

My Question:

Assume I knew beforehand which triangles will become degenerate and could exclude them from rendering.

How big do you think the speedup would be in that case?

 

Please note that the vertices which end up coinciding might still have different indices, so the GPU should not be able to discard these triangles before the vertex processing stage; it still has to transform each vertex before it can tell which triangles are degenerate. By the way, does the GPU even detect this in that case? Does anyone have a reference where a GPU manufacturer explains how the filtering of degenerate triangles works and when it is applied?
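Just to make the idea concrete, this is roughly the kind of CPU-side pre-filtering I mean; the becomesDegenerate predicate is hypothetical and specific to our transformation, so treat this as a sketch:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Sketch: build a filtered index buffer that leaves out the triangles we know
// (from offline analysis) will collapse after the vertex shader runs.
// "becomesDegenerate" is a hypothetical, application-specific predicate.
std::vector<std::uint32_t> filterKnownDegenerates(
    const std::vector<std::uint32_t>& indices,
    const std::function<bool(std::uint32_t, std::uint32_t, std::uint32_t)>& becomesDegenerate)
{
    std::vector<std::uint32_t> filtered;
    filtered.reserve(indices.size());
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3)
    {
        if (!becomesDegenerate(indices[i], indices[i + 1], indices[i + 2]))
        {
            filtered.push_back(indices[i]);
            filtered.push_back(indices[i + 1]);
            filtered.push_back(indices[i + 2]);
        }
    }
    return filtered;
}
```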

 

Any help is appreciated. Thanks a lot in advance!

 

Best,

 

Volzotan

 

 

 

 


Rendering is almost always fragment shader or ROP bound (or geometry shader bound, if there's enough output), and only very rarely vertex shader bound. Fragment shader work, as pointed out by mhagain, doesn't apply here, since degenerate triangles cover no area and produce no fragments.

Therefore, you can pretty much consider the cost "zero".

This is all the more true because degenerate triangles reuse the vertex indices of adjacent non-degenerate triangles, which means those vertices are transformed, go into the post-transform cache, and get reused. You transform them once, and you would have needed to transform them once anyway.

 

So the only real cost is 3 extra indices in the index buffer, which is negligible in terms of both memory and bandwidth.
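To illustrate where those extra indices come from, here's a sketch of the usual way two strips get stitched together with degenerates; just an illustration, and the extra duplicate sometimes needed for winding is glossed over:

```cpp
#include <cstdint>
#include <vector>

// Sketch: stitch strip B onto strip A by repeating the last index of A and the
// first index of B. This creates a few degenerate triangles across the seam,
// but every index involved also belongs to a real triangle, so those vertices
// end up in the post-transform cache anyway. (An additional duplicate may be
// needed to preserve winding; omitted here for brevity.)
void appendStripWithDegenerates(std::vector<std::uint32_t>& stripA,
                                const std::vector<std::uint32_t>& stripB)
{
    if (stripA.empty() || stripB.empty())
    {
        stripA.insert(stripA.end(), stripB.begin(), stripB.end());
        return;
    }
    stripA.push_back(stripA.back());    // repeat last index of A
    stripA.push_back(stripB.front());   // repeat first index of B
    stripA.insert(stripA.end(), stripB.begin(), stripB.end());
}
```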

Edited by samoth


Please note that the vertices which end up coinciding might still have different indices, so the GPU should not be able to discard these triangles before the vertex processing stage; it still has to transform each vertex before it can tell which triangles are degenerate.

 

If you have an unusual situation where this is particularly common, then you might find you're better off using a triangle list instead of a triangle strip. Triangle lists can be optimized to make better use of the post-transform cache and can end up faster than strips even on fairly strip-friendly geometry, so if you have some special case which makes your geometry strip-unfriendly then I would imagine you'll get better performance with a triangle list. I'd echo the other posters though, that in the grand scheme of things it's unlikely to make much difference.
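For reference, unrolling a strip into an indexed triangle list is straightforward, and you can drop the strip's built-in degenerates while you're at it. A rough sketch (the naming and winding convention here are mine, adapt as needed):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: unroll a triangle strip into an indexed triangle list, skipping
// triangles that are degenerate by construction (repeated indices). The
// resulting list can then be reordered by any vertex-cache optimizer.
std::vector<std::uint32_t> stripToList(const std::vector<std::uint32_t>& strip)
{
    std::vector<std::uint32_t> list;
    for (std::size_t i = 0; i + 2 < strip.size(); ++i)
    {
        std::uint32_t a = strip[i], b = strip[i + 1], c = strip[i + 2];
        if (a == b || b == c || a == c)
            continue;                      // skip degenerates used for stitching
        if (i % 2 == 0)                    // strips reverse winding on odd triangles
        {
            list.push_back(a); list.push_back(b); list.push_back(c);
        }
        else
        {
            list.push_back(b); list.push_back(a); list.push_back(c);
        }
    }
    return list;
}
```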


Another point of view is to account for the wasted vertex processing. If every vertex, degenerate or not, gets the same moderate amount of processing (transforms, a few interpolations, texture lookups, etc.), then adding x% useless vertices to the real geometry is an x% load increase, which up to a certain point (i.e. until vertex processing becomes the bottleneck) is free.

Anything you do to avoid processing degenerate geometry needs to cost less than that x% of clean geometry processing to have a chance of being useful; testing every triangle for degeneracy at runtime, even if the cost of rebuilding buffers could be avoided, seems out of the question.
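To put rough numbers on it (assuming vertex cost scales linearly with vertex count): if a fraction f of the total vertex work goes to triangles that end up degenerate and the pass really is vertex bound, removing them can at best give a speedup of 1 / (1 - f), i.e. about 1.05x for f = 5% and 2x for f = 50%. Whether any filtering scheme is worth its own cost depends heavily on f.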


The main thing, though, is that it's a completely pointless exercise. The GPU is already going to do this anyway, so all the proposal involves is repeating calculations that will already be done. There may be a minor saving in bandwidth and vertex processing (although the degenerate verts are quite likely to already be in the cache anyway, so the latter saving is nowhere near as big a deal as one might think), but at the expense of having to rebuild the vertex/index buffers.


Hey,

thanks for your quick and detailed replies!

 

 

Now, about that theoretical gain, one huge disadvantage of this that now arises is that it's no longer possible for you to keep any model data in static vertex buffers.  Instead you're going to need to re-send all of your scene geometry to the GPU every frame.  Of course you could implement some caching schemes to avoid a resend, but that's more work and will result in uneven performance between the frames where you do send and the frames where you don't.

 

I'm not going to do such fancy things. Please excuse me for not being able to go into detail here, but let me clarify this a little bit:

 

The situation is simply that we know beforehand that, for our particular case, after a certain transformation in the vertex shader, let's say 50% of the triangles become degenerate, and those triangles happen to sit at the back of our vertex and index buffers. So we could simply ignore them in our draw call, with zero overhead.
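In other words, something like this (OpenGL-flavoured sketch with made-up names; the same idea works with any API that takes an index count in the draw call):

```cpp
#include <GL/glew.h>   // or whichever loader/headers you use; a GL context is assumed

// Sketch: if the indices of the triangles that will become degenerate are
// grouped at the end of the index buffer, skipping them is just a smaller
// count in the draw call. No buffer rebuilding, no per-frame CPU work.
void drawNonDegeneratePrefix(GLuint indexBuffer, GLsizei nonDegenerateTriangleCount)
{
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBuffer);
    glDrawElements(GL_TRIANGLES,
                   nonDegenerateTriangleCount * 3,  // first N triangles only
                   GL_UNSIGNED_INT,
                   nullptr);                        // offset 0 into the bound index buffer
}
```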

 

Of course, having the data re-organized this way (only once, during preprocessing!) instead of ordering it with a cache optimizer implies a certain overhead itself, as it potentially limits cache performance.

 

Please let us also assume that our application is vertex bound, e.g. because we have a large laser-scanned model that is tessellated very regularly with many small triangles, and we use a moderately sized viewport, instead of a high-resolution viewport and optimized low-poly game models.

 

So, if I get you right, I can still expect a performance gain (-> vertex bound, 50% less vertex processing) by limiting my draw call to non-degenerate triangles, but in order to evaluate whether it's worth the effort, I have to compare my method with its re-organized data layout against a cache-optimized variant that renders all triangles and uses the GPU to discard degenerate ones, right? :-)

Edited by Volzotan


OK, that makes more sense and sounds about right.  If it's purely a preprocessing step and everything is arranged accordingly, then yeah, you're going to get better performance under the constraints you mentioned.

 

Another option you might consider is to keep the cache-optimized vertex buffer but maintain two index buffers, one of which simply omits the indices for the degenerates. There would be some extra memory overhead from the second index buffer, but it could be a worthwhile performance gain.
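Something along these lines (GL-flavoured sketch, names made up):

```cpp
#include <GL/glew.h>   // or whichever loader/headers you use; a GL context is assumed
#include <cstdint>
#include <vector>

// Sketch: one shared, cache-optimized vertex buffer, two index buffers.
// The "full" index list draws everything, the "filtered" one omits the
// triangles known to become degenerate. Bind whichever suits the draw call.
GLuint createIndexBuffer(const std::vector<std::uint32_t>& indices)
{
    GLuint ibo = 0;
    glGenBuffers(1, &ibo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER,
                 static_cast<GLsizeiptr>(indices.size() * sizeof(std::uint32_t)),
                 indices.data(),
                 GL_STATIC_DRAW);
    return ibo;
}

// Usage idea:
//   GLuint fullIbo     = createIndexBuffer(fullIndices);
//   GLuint filteredIbo = createIndexBuffer(filteredIndices);
//   ...bind fullIbo or filteredIbo before glDrawElements, as needed.
```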


Okay, here's a first result. Rendering with 77% of the triangles (by ignoring the last 23%, which are degenerate) gave me 110 fps instead of 90 fps (which I get if I render 100% of the triangles). So yes, there's indeed a speedup.
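(For reference: 90 fps is about 11.1 ms per frame and 110 fps about 9.1 ms, i.e. roughly a 1.22x speedup, against a theoretical ceiling of about 1 / 0.77 ≈ 1.30x if the pass were purely vertex bound and cost scaled linearly with triangle count.)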

 

However, I still have to compare that particular example against the cache-optimized alternative mhagain also mentioned.

