LPVOID_CH

Terrain: Triangles or TriangleStrips?

Hi all, I'm improving my terrain engine. I know that triangle strips are handled faster on the GPU than triangle lists, but with triangle strips I can't represent my terrain with only one index buffer: I need an index buffer for each terrain row and have to issue a draw call for each row in my program. With triangle lists I can put everything in one vertex/index buffer and draw the whole terrain with a single call. Does anyone have experience with which of the two is faster? I couldn't find anything about this on the web. Thanks.

You can use degenerate triangles that have two vertices with the same index to patch multiple strips together. Be sure to test if you actually get any speed up though. It's probably going to be quite small.
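To make the stitching idea concrete, here is a minimal sketch that builds one strip index list for a whole grid. It assumes a row-major vertex layout (index = row * cols + col); the function names are made up for illustration and are not from any engine or API mentioned in this thread. Each stitch repeats the last index of one row strip and the first index of the next, which adds an even number of indices and so keeps the winding order consistent.

```python
def grid_strip_indices(cols, rows):
    """Indices of ONE triangle strip covering a grid of cols x rows vertices.

    Assumes vertices are stored row-major: index = row * cols + col.
    """
    indices = []
    for r in range(rows - 1):
        if r > 0:
            # Repeat the first vertex of this row strip (degenerate stitch).
            indices.append(r * cols)
        for c in range(cols):
            indices.append(r * cols + c)        # vertex on the upper row
            indices.append((r + 1) * cols + c)  # vertex on the lower row
        if r < rows - 2:
            # Repeat the last vertex of this row strip (degenerate stitch).
            indices.append((r + 1) * cols + cols - 1)
    return indices


def count_triangles(strip):
    """Return (real, degenerate) triangle counts for a strip index list."""
    real = degenerate = 0
    for a, b, c in zip(strip, strip[1:], strip[2:]):
        if a == b or b == c or a == c:
            degenerate += 1  # zero-area triangle, produces no pixels
        else:
            real += 1
    return real, degenerate


print(grid_strip_indices(3, 3))
print(count_triangles(grid_strip_indices(3, 3)))
```

For a 3x3 vertex grid this gives 14 indices: 8 real triangles plus 4 degenerate ones at the single stitch between the two row strips.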

OK, that would be an idea, but I don't like things like degenerate vertices. I'm porting my engine to XNA, and I don't know how XNA handles degenerate vertices.
Thanks for the suggestion.

Quote:
Original post by LPVOID_CH
I know that triangle strips are handled faster on the GPU than triangle lists.
Wrong. The best transform ratio (vertices actually transformed / triangle count) you can get with tri-strips is asymptotically near 1.00, whereas with tri-lists you can get the ratio down to 0.52.

If it's not obvious by now, this means that you can make the card render 1000 tris with only 520 vertex shader transforms.
Compare this to 1008 transforms for tri-strips or 3000 transforms for non-indexed tri-lists.
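A quick sanity check of those numbers, as a small sketch. The function names are hypothetical. The post does not say how 1008 was derived; a single non-indexed strip of 1000 triangles needs 1002 vertices, and 1008 is what you get if the 1000 triangles are split across four non-indexed strips, which is only an assumption made here to reproduce the figure.

```python
def transforms_nonindexed_list(tris):
    """Every triangle submits three fresh vertices."""
    return 3 * tris


def transforms_nonindexed_strips(tris, strips=1):
    """Each strip pays two start-up vertices, then one vertex per triangle."""
    return tris + 2 * strips


def transforms_indexed(tris, ratio):
    """Transforms implied by a given transform ratio (transforms per triangle)."""
    return round(tris * ratio)


print(transforms_nonindexed_list(1000))        # non-indexed list
print(transforms_nonindexed_strips(1000, 1))   # one long non-indexed strip
print(transforms_nonindexed_strips(1000, 4))   # four strips (assumed split)
print(transforms_indexed(1000, 0.52))          # cache-friendly indexed list
```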

Of course, you must order the vertices so that they stay in the cache as long as possible and you can reuse as many of them as possible.

Then again, that's easy to do for terrains, but not so easy for an arbitrary 3D mesh (there are some utilities from the card vendors, though). However, these days, when you can easily push up to 10M tris per scene, it's not really needed that much, unless you _REALLY_ want to push as many vertices on screen as possible. With basic LOD, you can easily get a really nice, dense terrain with a long view distance in under 1M tris, which is nothing these days.

You'll hurt performance MUCH more if you start using more vertex streams for your terrain. Try to aim for fewer than 3-4, even at the cost of duplicated data.

Quote:

Wrong. The best transform ratio (vertices actually transformed / triangle count) you can get with tri-strips is asymptotically near 1.00, whereas with tri-lists you can get the ratio down to 0.52.


Well, I've always read in books and articles that tri-strips are faster than tri-lists. If this is true, I'll go ahead with tri-lists.
I don't use LOD because the LOD calculation on the CPU is slower than caching whole terrain patches in video memory. My engine is also meant for first-person shooters, so you can't see as far into the distance as in a flight sim.

Quote:

For big terrain patches, you could use a Hilbert curve for this.


Can you explain your approach in a bit more detail or point me to an article about it?

Quote:
Original post by joe1024
The best transform ratio (vertices actually transformed / triangle count) you can get with tri-strips is asymptotically near 1.00, whereas with tri-lists you can get the ratio down to 0.52.

How did you get this? I can understand that if the cache is large enough to hold the vertices of one row, it won't make any difference whether you use strips or lists; you get about 0.5 either way. But if the cache is not that big, I'd really like to know how you can easily arrange a list to get a ratio of 0.52, no matter how big the grid.
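The row argument can be checked with a toy model. The sketch below simulates a FIFO post-transform vertex cache and counts transforms per triangle for a plain row-by-row indexed triangle list. The FIFO policy, the cache sizes and the function names are all assumptions chosen for illustration; real hardware may behave differently.

```python
from collections import deque


def grid_list_triangles(cols, rows):
    """Indexed triangle list for a cols x rows vertex grid, row by row."""
    tris = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            a = r * cols + c      # upper-left corner of the quad
            b = a + 1             # upper-right
            d = a + cols          # lower-left
            e = d + 1             # lower-right
            tris.append((a, d, b))
            tris.append((b, d, e))
    return tris


def transform_ratio(tris, cache_size):
    """Vertex transforms per triangle with a FIFO post-transform cache."""
    cache = deque(maxlen=cache_size)  # appending to a full deque drops the oldest
    misses = 0
    for tri in tris:
        for index in tri:
            if index not in cache:
                misses += 1          # cache miss: the vertex is transformed again
                cache.append(index)
    return misses / len(tris)


big = transform_ratio(grid_list_triangles(17, 17), cache_size=64)
small = transform_ratio(grid_list_triangles(65, 65), cache_size=8)
print(f"17x17 grid, 64-entry cache: {big:.3f}")
print(f"65x65 grid,  8-entry cache: {small:.3f}")
```

In this model, when the cache comfortably holds a row, every vertex is transformed exactly once, so the ratio is vertices / triangles = 289 / 512, about 0.56 for the 17x17 grid. When a row no longer fits, the upper row of each pass has been evicted by the time it is needed again and the ratio climbs to just above 1.0.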

Quote:
Original post by LPVOID_CH
OK, that would be an idea, but I don't like things like degenerate vertices. I'm porting my engine to XNA, and I don't know how XNA handles degenerate vertices.
Thanks for the suggestion.


It handles them very well. I've implemented a terrain engine in XNA that supports both triangle lists and triangle strips without much trouble. By the way, I've profiled the engine and haven't found any noticeable difference between the two approaches. But I'm using only two vertex buffers and one index buffer for the whole terrain...

Quote:
Well, I've always read in books and articles that tri-strips are faster than tri-lists.
Of course. All literature for beginners should say that, because otherwise it would have to start explaining things in more detail, which would just confuse or scare off beginners. You can't realistically expect to learn the advanced tricks from a place that introduces you to the basic concept of indexing, can you?
Plus, not everybody knows about it. I've personally seen many people who weren't aware of this "trick", although they were quite competent otherwise.


Quote:
I'm porting my engine to XNA, and I don't know how XNA handles degenerate vertices.
What does that have to do with XNA? You're confusing the API with basic 3D concepts. Check the underlying architecture of Xenos to see the similarities.

Quote:
Original post by SnotBob
How did you get this? I can understand that if the cache is large enough to hold the vertices of one row, it won't make any difference whether you use strips or lists; you get about 0.5 either way. But if the cache is not that big, I'd really like to know how you can easily arrange a list to get a ratio of 0.52, no matter how big the grid.
Personally, I've only reached a ratio of 0.56 with my own pattern. Only later did I read a thread here on gamedev where one guy showed a way to get a ratio of 0.52. However, I must admit I couldn't be bothered to implement his approach because, frankly, whether it is 0.52 or 0.56 doesn't matter at all. Either way it's roughly a 2x improvement over indexing that isn't cache-friendly. 5% more or less - I couldn't care less.

As to the other part of the question - it doesn't matter whether the row fits in the cache, since my indexing pattern isn't based on rows at all. It has a completely different shape (no, not even a Hilbert curve), but I can easily traverse a whole terrain chunk with it.

I should probably write a paper on it, since I haven't seen it anywhere on the net. But since a 4% faster approach was eventually discovered, I couldn't be bothered to devote time to the paper.

[Edited by - VladR on June 4, 2008 2:19:34 AM]

Ok, I think I get it now. I wonder why I've not come across this before. Even a simple zig-zag pattern will get a ratio of about 0.75. It seems to me that a simple but reasonably good approach is to take a suitably thick strip of Hilbert-curve-like curls and tile the whole grid with those.
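One simple reading of the "thick strip" idea can be tried in the same kind of toy FIFO model: walk the grid in vertical bands narrow enough that a band's row stays in the cache. To be clear, this sketch is neither the Hilbert-like curls suggested above, nor VladR's undisclosed pattern, nor cache priming; the band width, the 16-entry FIFO and the function names are assumptions for illustration only.

```python
from collections import deque


def banded_triangles(cols, rows, band_quads):
    """Triangle list that walks the grid in vertical bands band_quads quads wide."""
    tris = []
    for start in range(0, cols - 1, band_quads):
        stop = min(start + band_quads, cols - 1)
        for r in range(rows - 1):
            for c in range(start, stop):
                a = r * cols + c                      # upper-left of the quad
                b, d, e = a + 1, a + cols, a + cols + 1
                tris.append((a, d, b))
                tris.append((b, d, e))
    return tris


def transform_ratio(tris, cache_size):
    """Vertex transforms per triangle with a FIFO post-transform cache."""
    cache = deque(maxlen=cache_size)  # appending to a full deque drops the oldest
    misses = 0
    for tri in tris:
        for index in tri:
            if index not in cache:
                misses += 1
                cache.append(index)
    return misses / len(tris)


row_order = banded_triangles(65, 65, band_quads=64)  # one band = plain row order
banded = banded_triangles(65, 65, band_quads=4)      # 16 narrow bands
print(f"row order: {transform_ratio(row_order, 16):.3f}")
print(f"banded:    {transform_ratio(banded, 16):.3f}")
```

For this 65x65 grid the banded order should come out at roughly 0.63 transforms per triangle against just over 1.0 for plain row order. The remaining overhead comes from the column shared by neighbouring bands, which is transformed twice, so wider bands (as wide as the cache allows) get closer to 0.5.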

Quote:
Original post by joe1024
The best transform ratio (vertices actually transformed / triangle count) you can get with tri-strips is asymptotically near 1.00, whereas with tri-lists you can get the ratio down to 0.52.

That's just plain wrong. You're comparing a naive implementation of tri-strips to a highly optimised version of tri-lists. There's no reason you can't apply to tri-strips many of the techniques that make tri-lists fast, such as ordering triangles to maximise cache use and cache priming. (See figure 3 in the original geometry clipmaps paper for one example.)

With the rate at which GPU performance improves relative to memory, the cost of reading the index list might well become the limiting factor for highly optimised versions of both lists and strips.

If you want to experiment, you could run the OptimizedMesh sample from the DirectX SDK. It lets you try various kinds of batch.

I just tested it on a GeForce 6200, and it achieved ~57M triangles per second with "list" and ~50M triangles per second with "single strip" (remember to turn vsync off). I guess there could be some dodgy programming in OptimizedMesh that somehow favours lists, but on the face of it, it seems to be a reasonable test.

I would speculate that having an index array doesn't appreciably slow down modern video cards (as long as it's in video memory) and in fact leads to slightly better performance when used with triangle lists, because there's no need for degenerate triangles. You still need to optimize the mesh in order to take advantage of the vertex cache, of course.

Quote:
Original post by Promit
The absolute fastest setup for regular grid terrain is an indexed triangle list or strip (it really doesn't matter) with cache priming. End of story.

Absolutely - there are just too many other variables that make any slight differences between them meaningless. I just wanted to correct the impression that strips were automatically much worse than lists.

Quote:
Original post by Promit
The absolute fastest setup for regular grid terrain is an indexed triangle list or strip [...] End of story.
But you can't get a better transform ratio than 1.01 (or 1.05, it doesn't matter) with strips, now can you? With ideally sorted indices, your tri-list could give you a ratio of ~0.52.
I don't think the card checks whether each vertex of the current strip is in the cache or not. To the card, it's just vertex data.

As for the size of the index buffer - I can see that you could surely find some pathological cases where a very big IB becomes the issue.

Quote:
Original post by VladR
But you can't get a better transform ratio than 1.01 (or 1.05, it doesn't matter) with strips


It depends on how many times the same vertex is used for different triangles. Yes, if each vertex is only used once, then the theoretical lower bound on the ACMR (average cache miss ratio) is 1. Whether a vertex has already been transformed and is in the hardware FIFO (with its attributes) is tracked by the index of the vertex, and this can be determined very quickly. That is why indexed strips/lists are so useful, and both strips and lists can take advantage of indexed vertices.

At least this is how I understand it, please correct me if I misunderstood it :)

Quote:
Original post by VladR
Quote:
Original post by Promit
The absolute fastest setup for regular grid terrain is an indexed triangle list or strip [...] End of story.
But you can't get a better transform ratio than 1.01 (or 1.05, it doesn't matter) with strips, now can you?

Yes you can. See Hugues Hoppe's paper for examples. In the case of regular grids for terrains it's even easier - see figure 3 of this paper.
Quote:
As for the size of the index buffer - I can see that you could surely find some pathological cases where a very big IB becomes the issue.

Strips have an almost factor-of-three advantage over lists in the number of indices, although that is reduced by degenerate triangles, which are necessary for cache-optimised strips.
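The "almost factor of three" can be checked by counting indices for a regular grid. The sketch below assumes strips stitched with two extra indices per join and, for the banded variant, that the grid width in quads divides evenly by the band width. The function names and the 257x257 / 16-quad example are made up for illustration.

```python
def list_index_count(cols, rows):
    """Indexed triangle list: two triangles (six indices) per quad."""
    return 6 * (cols - 1) * (rows - 1)


def strip_index_count(cols, rows):
    """One strip per row of quads, joined by two extra indices per stitch."""
    return 2 * cols * (rows - 1) + 2 * (rows - 2)


def banded_strip_index_count(cols, rows, band_quads):
    """Strips laid out in vertical bands; assumes (cols - 1) % band_quads == 0."""
    bands = (cols - 1) // band_quads
    per_band = strip_index_count(band_quads + 1, rows)
    return bands * per_band + 2 * (bands - 1)  # two more indices per band join


n_list = list_index_count(257, 257)
n_strip = strip_index_count(257, 257)
n_banded = banded_strip_index_count(257, 257, 16)
print(n_list, n_strip, round(n_list / n_strip, 2))
print(n_list, n_banded, round(n_list / n_banded, 2))
```

With full-width row strips the list needs close to three times as many indices. Splitting the grid into narrower, more cache-friendly bands adds stitches and repeats the shared columns, so the advantage shrinks, which is the trade-off described above.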

Quote:
Original post by dave j
Strips have an almost factor-of-three advantage over lists in the number of indices, although that is reduced by degenerate triangles, which are necessary for cache-optimised strips.


This is very true. But the size of the index buffer rarely matters and I think most people feel it's much easier to work with lists in their algorithms.

Quote:
Original post by ndhb
This is very true. But the size of the index buffer rarely matters and I think most people feel it's much easier to work with lists in their algorithms.

Oh, I agree, most of the time it's more effort than it's worth. But there are some cases where it is relatively easy to define cache-friendly indices for strips: regular grids for terrains, for instance, which is where this thread started.

With cache priming, each vertex in your terrain will be transformed only once. (Ideally, that is. Since the cache has a size limit, you'll be forced to repeat some edge vertices, but the cost of that is negligible.) You can't really do any better than that.

I think that the confusion here is due to the fact that some people are talking about indexed triangle strips, and others are talking about NON-indexed triangle strips. The 1.0 limit applies for the latter, not the former.
