Prozak

What's the deal with Triangle Strips...

I know questions about triangle strips and frame rate gains come up often on GameDev, but I wanted the definitive take on it all.

I've used nVidia's NvTriStrip library to convert my triangle list into a triangle strip. (BTW, if you want me to write up a small tutorial on how to use this lib, PM me.)

At home, on my Radeon 9600 XT, I got a small speed increase of about 9% in frames per second. On my work computer, which has a very bland graphics card, I got a very noticeable speed increase of 34%. But on my laptop, a Compaq Presario (very non-game-oriented, I know), I got a 40% speed decrease. On my dad's laptop, the frame rate remained exactly the same...

So, as you can see, the results vary wildly, from a 40% speed loss to a 34% speed gain... Is there any way for me to detect in advance whether tri-strips will benefit the game on a certain machine? Is there a formula for this? Does a certain set of extensions have to be present for them to make a difference? And on the latest GPUs, such as the GeForce 6800 and 7800 series, do triangle strips still have something to bring to the table?

I know that triangle strips, due to their format, allow a model's vertices to be transformed only once, and that due to their smaller footprint they travel from local memory to the graphics card faster, but I've also read somewhere that some cards, upon receiving a triangle strip, expand it into a list internally...

So, what's the deal with triangle strips these days?...

The gain comes from the reduced amount of data you send, so you save a few transformations here and there.


To get the largest performance increase you need to work with indexed triangle strips. That way the graphics card can transform the vertex buffer and doesn't have to apply one and the same transformation to duplicate vertices.


A good stripification gives strips with approximately 1.1 to 1.2 vertices per triangle.
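
To put that ratio in perspective, here is a minimal sketch of the arithmetic, assuming plain strips with no degenerate stitching. A strip of n triangles needs n + 2 vertices, so a mesh split into several strips costs two extra vertices per strip. The function names are made up for illustration.

```cpp
#include <cstddef>

// Total number of vertices (or indices) submitted when `triangles`
// triangles are drawn as `strips` separate triangle strips.
// Each strip of n triangles needs n + 2 vertices.
std::size_t stripVertexCount(std::size_t triangles, std::size_t strips) {
    return triangles + 2 * strips;
}

// Average vertices submitted per triangle (a plain triangle list gives 3.0).
// Assumes triangles > 0.
double verticesPerTriangle(std::size_t triangles, std::size_t strips) {
    return static_cast<double>(stripVertexCount(triangles, strips)) /
           static_cast<double>(triangles);
}
```

With 1000 triangles, 100 strips (10 triangles each on average) come out at 1.2 vertices per triangle and 50 strips at 1.1, compared with 3.0 for a plain triangle list.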


Have a look at the tunneling algorithm. It lets you connect several small strips into one large strip, which reduces the API overhead even further.

I am working on a tri-strip seminar for university, so once it's finished I can translate it to English and post it. This will take some time, though: it may be half a year until I hold the seminar, and I may not publish it before that date.

How many vertices were you benchmarking with, and what was your fps before using tri-strips?

Also, for the laptop which lost speed, does it use the default drivers? If so, have you thought about upgrading the drivers? If it has an NVidia or ATI graphics card, look into the Omega drivers.

Tri-strips simply reduce the amount of data transferred to the card, which is obviously a benefit. I don't know anything about that lib you mentioned. Is it a pre-process? Doing it every frame can't be good :)

dmatter:
The gains and losses I posted above were all relative to the before and after framerate, before being Triangle Lists, and after being Triangle Strips.

My Presario laptop has all the latest drivers, but its graphics capabilities still suck.. :(

What are the Omega Drivers? Are those specially for laptops or something?

NvTriStrip is an nVidia library specially designed to convert Triangle Lists into Triangle Strips. I coded a mesh post-load filter that (after any mesh is loaded, duh!) generates the Triangle Strips for that particular mesh if it was loaded in Triangle List format, so it's something that is done once per mesh, at run time.

It's worth noting that even without indexing your triangle strips there's also a minor performance increase, in that you're only processing one new vertex per triangle instead of three (the other two having been processed as part of previous triangles). So if you've got complex vertex shaders it's a win.

prozak: are you setting the cache size correctly when you run NvTriStrip on the different machines? I daresay the performance drop on your laptop comes from NvTriStrip trying to force things into cache-friendly order, even though there's no vertex cache.

superpig: I may be wrong, but I thought it only mattered whether the vertex was in the cache. If that's the case I don't understand how indexing / not indexing makes a difference. Care to enlighten me please? :¬)

Superpig's right. The GPU's vertex cache could help for tri lists, but for strips the vertices are only ever going to be processed once anyway. My last comment was more of a question for Basiror.

The ideal geometry format is probably a strip-ordered indexed list. The post-T&L cache will make sure you get the benefits that strips provide. Lists will help make sure that you only have one draw call and don't need degenerate stitching.
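
As a rough sketch of what a strip-ordered indexed list could look like, the function below expands one indexed strip into list indices while keeping the triangles in strip order. `stripToList` is a hypothetical helper, not part of NvTriStrip. It assumes 16-bit indices and the usual convention that every second triangle of a strip has its winding flipped.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Expand an indexed triangle strip into an indexed triangle list.
// Triangles stay in strip order, so consecutive triangles still share
// two indices and remain friendly to the post-transform cache.
std::vector<std::uint16_t> stripToList(const std::vector<std::uint16_t>& strip) {
    std::vector<std::uint16_t> list;
    for (std::size_t i = 0; i + 2 < strip.size(); ++i) {
        std::uint16_t a = strip[i];
        std::uint16_t b = strip[i + 1];
        std::uint16_t c = strip[i + 2];
        if (a == b || b == c || a == c) continue;  // skip degenerate triangles
        if (i % 2 == 1) std::swap(a, b);           // odd triangles: restore winding
        list.push_back(a);
        list.push_back(b);
        list.push_back(c);
    }
    return list;
}
```

The list uses three indices per triangle instead of roughly one, but with 16-bit indices that cost is small, and any degenerate stitching triangles in the strip simply disappear.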

Quote:
Original post by superpig
It's worth noting that even without indexing your triangle strips there's also a minor performance increase, in that you're only processing one new vertex per triangle instead of three (the other two having been processed as part of previous triangles). So if you've got complex vertex shaders it's a win.


Exactly what do you mean by "Indexed"? And what is the difference between indexed and non-indexed?

I guess that instead of having a vertex pool, and Faces, one could just send the vertices over to the card using the same sequence as the TriStrip sequence...

If you send a plain vertex array for your triangle strip, you have to send some vertices twice (depending on your mesh). Really good stripifications have a lot of duplicate vertices, and since the hardware doesn't know which vertices are duplicates, it has to perform the transformation for each duplicate, which adds up to quite a lot.

I have some examples here on paper where they reduce the number of strips from 117 to 4 for a closed mesh. You can imagine how many duplicates that means for complex meshes.

And your indices can be unsigned shorts, so the additional memory bandwidth is actually negligible.

When I bought my PCIe graphics card I was told that modern systems don't even use the full bandwidth of AGP 4x, and that was half a year ago, so the number of transformations your graphics card has to perform should be the major bottleneck.

Of course you need to take care of the cache size, but optimizing tri-strips for cache size is the fourth of four optimization points.


Quote:
Original post by Prozak
Quote:
Original post by superpig
It's worth noting that even without indexing your triangle strips there's also a minor performance increase, in that you're only processing one new vertex per triangle instead of three (the other two having been processed as part of previous triangles). So if you've got complex vertex shaders it's a win.


Exactly what do you mean by "Indexed"? And what is the difference between indexed and non-indexed?

I guess that instead of having a vertex pool, and Faces, one could just send the vertices over to the card using the same sequence as the TriStrip sequence...


Indexed strips just store the array index of each vertex in your vertex array, instead of the vertex itself.

And today indexed strips are one of the fastest ways to process geometry.

Aye. You want to index your tri strips because then you can splice together all strips that share material properties and render them in a single call, by inserting degenerates between strips. You can do that without indexing, but then you're wasting time running full vertex shaders on degenerate vertices; if you're indexing, then you can re-use end points / start points on strips for your degenerates and never waste time transforming them needlessly (because they've been transformed for an actual triangle).
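
The splicing described above could be sketched like this, assuming 16-bit indexed strips. `appendStrip` is a hypothetical helper: it repeats the last index of the previous strip and the first index of the next one, plus one extra repeat when the previous strip has an odd number of indices, so the next strip keeps its winding order.

```cpp
#include <cstdint>
#include <vector>

// Append `next` to `joined` so both can be drawn as one triangle strip.
// The repeated indices only produce zero-area (degenerate) triangles.
void appendStrip(std::vector<std::uint16_t>& joined,
                 const std::vector<std::uint16_t>& next) {
    if (next.empty()) return;
    if (!joined.empty()) {
        // The next strip must start at an even position to keep its winding.
        const bool oddLength = joined.size() % 2 == 1;
        joined.push_back(joined.back());                 // repeat previous end point
        if (oddLength) joined.push_back(joined.back());  // parity fix
        joined.push_back(next.front());                  // repeat next start point
    }
    joined.insert(joined.end(), next.begin(), next.end());
}
```

Because the stitching only repeats indices that real triangles already use, an indexed draw can pick those vertices up from the post-transform cache instead of running the vertex shader again.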

I index to reduce duplicates. You could have two neighbouring strips spanning three rows of vertices; without indexing, the center row, for example, would be transformed twice.
I don't know if you usually store the vertices you submit to the graphics card in a list and perform a duplicate check, but I think that would be pointless, because indexing solves the issue and should perform much better.
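
The three-row case can be counted with a small sketch. It assumes a regular grid whose vertices are numbered row by row, with each strip zigzagging between two adjacent rows; `rowStrip`, `countVertices` and `Counts` are illustrative names only.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Indices of the strip that zigzags between grid row `row` and row `row + 1`.
// Vertex (r, c) of a grid with `cols` columns has index r * cols + c.
std::vector<std::uint16_t> rowStrip(unsigned row, unsigned cols) {
    std::vector<std::uint16_t> strip;
    for (unsigned c = 0; c < cols; ++c) {
        strip.push_back(static_cast<std::uint16_t>(row * cols + c));
        strip.push_back(static_cast<std::uint16_t>((row + 1) * cols + c));
    }
    return strip;
}

struct Counts {
    std::size_t submitted;  // vertices sent when the strips are not indexed
    std::size_t unique;     // vertices stored once in an indexed vertex buffer
};

Counts countVertices(const std::vector<std::vector<std::uint16_t>>& strips) {
    std::set<std::uint16_t> seen;
    std::size_t submitted = 0;
    for (const auto& strip : strips) {
        submitted += strip.size();
        seen.insert(strip.begin(), strip.end());
    }
    return Counts{submitted, seen.size()};
}
```

For a grid with 5 columns, `countVertices({rowStrip(0, 5), rowStrip(1, 5)})` reports 20 submitted vertices against 15 unique ones: without indexing, the 5 vertices of the center row are sent and transformed twice. Whether an indexed draw really transforms them only once depends on the post-transform cache.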

Here is a PDF about tri-strips:
http://wscg.zcu.cz/wscg2005/Papers_2005/Short/J29-full.pdf

Thanks for explaining that, Basiror. Kinda makes sense now. Indexing would definitely reduce the amount of vertex data that needs to be sent to the GPU. But wouldn't the vertices shared by the two strips still need to be transformed twice? AFAIK the vertex cache is very small, ~10 vertices.

I don't think so.

Have a look here:

http://developer.nvidia.com/object/GPU_D3D_Misconceptions.html


Point 6:
6. What is the vertex cache and should I care?
The GeForce 256 has a 10 vertex cache which lies after the transformation and lighting engines and before the setup engine. This means that data which has already been transformed and lit can be accessed more rapidly than data which needs to be fetched anew. The cache allows higher peak polygon throughput rates and is of most importance when the main load is on the lighting engine. The vertex cache only applies if you use indexed data.



That 10-vertex cache dates back to the GeForce 256 days.

I don't think it's still around 10 vertices; that would be too small for modern GPUs.

http://developer.nvidia.com/object/geforce3_faq.html
It says the GeForce3 has a 24-entry cache, but I can't find any info on newer GPUs :(
For the vertex cache to help for two adjacent strips, you'd probably need a cache 1-2 orders of magnitude larger.
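
How the cache size interacts with two adjacent strips can be estimated with a small simulation. This is only a sketch: it models the post-transform cache as a plain FIFO, which is an assumption and may not match the replacement policy of any particular GPU, and `countTransforms` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Number of vertices that have to be transformed for an index stream,
// given a FIFO post-transform cache with `cacheSize` entries.
std::size_t countTransforms(const std::vector<std::uint16_t>& indices,
                            std::size_t cacheSize) {
    std::deque<std::uint16_t> fifo;
    std::size_t misses = 0;
    for (std::uint16_t idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end()) {
            continue;  // cache hit: the transformed vertex is reused
        }
        ++misses;      // cache miss: the vertex is transformed (again)
        fifo.push_back(idx);
        if (fifo.size() > cacheSize) fifo.pop_front();
    }
    return misses;
}
```

For two strips over a grid with 3 rows and 4 columns (the index stream 0,4,1,5,2,6,3,7 followed by 4,8,5,9,6,10,7,11, which has 12 unique vertices), this model reports 16 transforms with a 4-entry cache and 12 with a 10-entry cache. The cache has to hold roughly one whole strip of vertices before the shared row comes around again, so long strips need a much larger cache.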

In response to the unanswered "What are the Omega Drivers?"

They're basically third-party hacked/tweaked video card drivers that might let you overclock a card, or might have tweaks to optimize some stuff. You never really know what all the tweaks might do, though. I used them once when the official drivers had a flaw that crashed a certain game at a certain point (rare situation), and went back to the official drivers a short while later because I really didn't notice a speed increase.
