
# Convert triangle list to triangle strip

## 17 posts in this topic

Hi, I hope this is the right forum for this question! I'm a little confused about this whole thing, so I apologise in advance for any stupidity!

The scenario is this: I'm writing a game engine and have been working on the mesh export tools for 3dsmax. Max provides mesh data in a triangle list format, and I want to turn that list into a triangle strip for dumping out to disk. So my question is, where can I find out how to do this?

Further info: I've been researching and experimenting with the NvTriStrip library as a possible solution, and it has confused me somewhat. What do the 2 main conversion functions actually do? Can I use the NvTriStrip library to convert my triangle list to a triangle strip?

All I need here is a reference. If anyone knows of a definitive reference on this, I'll buy it if necessary. I've tried searching the web with little luck :-( Also, if anyone with experience of the NvTriStrip library could offer me a pointer, that would help too. I tried looking at the NVidia sample, but the data is somewhat hidden in the D3D mesh structure with little to no documentation for me to reference, and there is very little documentation on the library itself. All I want to know is _what precisely does it do_?

Ok, I'd better go take a lie-down now, I'm so exasperated ;-) Any and all help, comments, suggestions, links etc. are encouraged!

##### Share on other sites
Out of interest, why do you want to do this? I was under the impression that indexed triangle lists were generally more efficient than [indexed] triangle strips anyway.

##### Share on other sites
Exactly. Indexed TriStrips can be 50% less efficient than TriLists, especially when rendering terrain, and the lists cost little additional memory (considering today's cards have 1GB of VRAM).
Just spend the VRAM and you'll save yourself the trouble of degenerate vertices (which are necessary when connecting strips), and you can use the post-transform vertex cache efficiently.

##### Share on other sites
Basically there's only one function you need from NvTriStrip:

GenerateStrips(...)

You pass it only the indices of your vertices (extract the vertex indices from the 3DS faces) and the number of indices you passed.

Additionally, you provide a pointer to a PrimitiveGroup array and a pointer to an integer; these receive the strips (PrimitiveGroups) and the number of strips.

If you enable stitching, i.e. you want the individual strips merged by adding degenerate triangles, you get only one PrimitiveGroup, i.e. one long strip.
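To make the stitching idea concrete, here is a minimal, self-contained sketch of how separate strips can be joined with degenerate triangles and then expanded back into a list. It does not use NvTriStrip and is not its implementation; `stitchStrips` and `stripToList` are hypothetical helper names, and the parity padding is just one common way to keep the winding order consistent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of NvTriStrip): joins several strips into one
// long strip by repeating indices, which produces zero-area triangles.
std::vector<std::uint16_t> stitchStrips(
    const std::vector<std::vector<std::uint16_t>>& strips) {
    std::vector<std::uint16_t> out;
    for (const auto& s : strips) {
        if (s.size() < 3) continue;          // not a triangle, skip it
        if (!out.empty()) {
            out.push_back(out.back());       // repeat last index of previous strip
            out.push_back(s.front());        // repeat first index of next strip
            if (out.size() % 2 != 0) {
                out.push_back(s.front());    // pad so the winding parity is kept
            }
            assert(out.size() % 2 == 0);     // next strip starts at an even slot
        }
        out.insert(out.end(), s.begin(), s.end());
    }
    return out;
}

// Hypothetical helper: expands a strip into a triangle list, dropping the
// degenerate triangles and flipping every odd triangle to restore winding.
std::vector<std::uint16_t> stripToList(const std::vector<std::uint16_t>& strip) {
    std::vector<std::uint16_t> list;
    for (std::size_t i = 0; i + 2 < strip.size(); ++i) {
        std::uint16_t a = strip[i], b = strip[i + 1], c = strip[i + 2];
        if (a == b || b == c || a == c) continue;  // degenerate, draws nothing
        if (i % 2 != 0) std::swap(a, b);           // odd triangles are flipped
        list.insert(list.end(), {a, b, c});
    }
    return list;
}
```

Stitching `{0,1,2,3}` and `{4,5,6}` this way gives one 9-index strip, of which 4 triangles are degenerate. That overhead is the cost of drawing everything in a single call.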

NvTriStrip is also able to remap/reorder the indices to improve the spatial locality of the vertices in order to better employ vertex caching for performance gains.

NvTriStrip does a good job but unfortunately it's slow. However, there is another library called TriStripper which is faster and almost as good (in terms of number of strips) but doesn't support stitching of triangle strips.

Hope that helps.

Edit:
Ah, I was too slow.

I have that impression, too (that indexed lists are faster than indexed strips). There are tools out there that reorder your vertices for more efficient use of post-transform vertex cache. IIRC NvTriStrip might also do that for triangle lists.

##### Share on other sites
Ah, thanks snk_kid. The first one was the article I read, too. Fortunately you remembered the link.

##### Share on other sites
Wow! Thank you very much guys, as usual your knowledge and support is positively priceless!

A special thank you to Lord_Evil for the heads up on the 'Tri Stripper' lib, and snk_kid for the article references!

##### Share on other sites
Quote:
Original post by Kalidor
You can try out ATI Tootle as well. I haven't personally used it yet but it's based on another paper, Triangle Order Optimization for Graphics Hardware Computation Culling (pdf), that I've heard good things about. Anyway, it's just another option. I recommend trying them all and see which works best for you.

Interesting, I assume this isn't ATI only? Here is another paper, from 2007, which seems to extend the work: Fast Triangle Reordering for Vertex Locality and Reduced Overdraw.

##### Share on other sites
Don't do triangle strips!

If you measure it, you will find that VCACHE-optimized triangle lists are far faster than any triangle strip will ever be.

This is the truth. To get VCACHE-optimized index lists with D3DX, use ID3DXMesh::OptimizeInplace with the D3DXMESHOPT_VERTEXCACHE flag.

##### Share on other sites
Thank you for all the article references! Excellent stuff, and I look forward to reading them all!

One thing is now utterly confusing me though. I can understand why indexed triangle lists are very fast, but not why they would be faster than a non-indexed triangle strip, since with indexed triangle lists more data must be pushed down the AGP bus than with non-indexed triangle strips. Is this perhaps because of an algorithm I don't know about that can exploit the extra data encoded in the indices of a triangle list? Surely a straight triangle strip without indices is the fastest technique of all simply because it requires the least amount of data to be pushed? (Fewer vertices & no indices).

Or is it the case that an unoptimised indexed triangle list is slower than an optimised triangle strip (without indices), but that we can exploit the extra data encoded in the indices to manually optimise the mesh data (e.g. reorganising indices to minimise overdraw and improve vertex caching)?

##### Share on other sites
Quote:
Original post by snk_kid
Interesting, I assume this isn't ATI only? Here is another paper, from 2007, which seems to extend the work: Fast Triangle Reordering for Vertex Locality and Reduced Overdraw.
I don't see why it would be ATI only, and I don't know why they would do that, but I haven't used it so I guess I can't say with 100% confidence.

It does indeed appear that paper is an extension/update to the one I linked to, even written by the same 3 authors. Just from reading the abstract it seems like they were able to speed it up so significantly that it's feasible to run it at load-time or even whenever the mesh is altered. In Figure 1 it says with the old technique it took 40sec to run on the 40k triangle dragon mesh but with the new technique it took only 76ms with similar results. That's pretty damn impressive! I'll have to give a read through that paper and see what they're doing, such a drastic change could even mean it's a totally different technique that just happened to be developed by the same people.

Thanks for the link, you just ruined my Friday night! [grin]

##### Share on other sites
Quote:
Original post by Kalidor
Thanks for the link, you just ruined my Friday night! [grin]

I totally agree.


##### Share on other sites
Quote:
Original post by TheGilb
Surely a straight triangle strip without indices is the fastest technique of all simply because it requires the least amount of data to be pushed? (Fewer vertices & no indices).
Downstream bandwidth on AGP 8x is 2.1 GB/s, and 4 GB/s on PCI Express x16. The amount of data involved is not a big deal.

Indexed triangle lists are the most efficient when it comes to actually rendering, because they provide an opportunity to maximize the number of cache hits while processing vertices.
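It is also worth checking the "least amount of data" assumption with some arithmetic. The sketch below is a rough illustration under stated assumptions, not a measurement: it takes a regular n x n grid of quads, draws it either as one non-indexed strip per row or as a single indexed triangle list, and counts the bytes. `gridBytes`, the 32-byte vertex and the 16-bit indices are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>

// Byte counts for an n x n quad grid under the two layouts compared above.
struct MeshBytes {
    std::size_t strip;        // non-indexed strips, one per grid row
    std::size_t indexedList;  // shared vertices plus 3 indices per triangle
};

// Hypothetical helper for illustration only.
MeshBytes gridBytes(std::size_t n, std::size_t vertexStride, std::size_t indexSize) {
    assert(n > 0);
    // Each row strip walks 2 * (n + 1) vertices, and interior rows of
    // vertices are duplicated because neighbouring strips cannot share them.
    const std::size_t stripVerts = n * 2 * (n + 1);
    // The indexed list stores each grid vertex once.
    const std::size_t listVerts = (n + 1) * (n + 1);
    // 2 triangles per quad, 3 indices per triangle.
    const std::size_t listIndices = n * n * 2 * 3;
    return {stripVerts * vertexStride,
            listVerts * vertexStride + listIndices * indexSize};
}
```

For a 100 x 100 grid with 32-byte vertices and 16-bit indices this gives 646,400 bytes for the strips against 446,432 bytes for the indexed list, because the non-indexed strips have to repeat nearly every vertex. So on a mesh like this the indexed list can be the smaller one, on top of the cache benefit.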

##### Share on other sites
Quote:
Original post by Kalidor
Quote:
Original post by snk_kid
Interesting, I assume this isn't ATI only? Here is another paper, from 2007, which seems to extend the work: Fast Triangle Reordering for Vertex Locality and Reduced Overdraw.
I don't see why it would be ATI only, and I don't know why they would do that, but I haven't used it so I guess I can't say with 100% confidence.

It does indeed appear that paper is an extension/update to the one I linked to, even written by the same 3 authors. Just from reading the abstract it seems like they were able to speed it up so significantly that it's feasible to run it at load-time or even whenever the mesh is altered. In Figure 1 it says with the old technique it took 40sec to run on the 40k triangle dragon mesh but with the new technique it took only 76ms with similar results. That's pretty damn impressive! I'll have to give a read through that paper and see what they're doing, such a drastic change could even mean it's a totally different technique that just happened to be developed by the same people.

Thanks for the link, you just ruined my Friday night! [grin]

Just thought I'd chime in here. I'm the primary developer on Tootle, and one of the 3 paper authors.

1. Tootle is GPU neutral. It uses D3D for overdraw measurement, but that should run on any GPU with occlusion query support. And if you run it as a preprocess (which is what we recommend, it can take a while) you can render the resulting meshes on any platform you want. At the moment we're just using D3DX directly for the vertex cache optimization, but there are plans to eventually include the methods from our SIGGRAPH 2007 paper.

2. Kalidor is right, the two papers are related, but very different. The SIGGRAPH 2007 paper presents a fast vertex cache algorithm which works just as well as any existing ones. It also includes a quicker, approximate algorithm for overdraw that's competitive with Tootle in some cases. In other cases, the methods used in Tootle will do a better job on overdraw, because Tootle bases its results on actual measurements whereas the SIG'07 method is a heuristic. So, if you don't care about running time, you'll probably want to stick with Tootle. The advantage of the SIGGRAPH method is that it's fast enough to run at load time (or even on the fly at runtime if you like).

3. Ditto to what others say about strips, they're more trouble than they're worth. The only time it really pays to use strips is if you're somehow running on hardware that lacks a vertex cache (or has a really small one, e.g. 2 vertices).

##### Share on other sites
Why are VCACHE-optimized lists faster than triangle strips? An explanation:

Each GPU has its own vertex cache (like the cache on Intel/AMD CPUs), and vertices that are in the cache are processed faster than vertices outside it. The idea is to load 16 or 24 vertices into that cache and draw all the faces that use vertices in that group first, then load another 16 or 24 vertices into the cache and draw the next set of faces.

You have to be careful. If a face uses 2 vertices in the cache and 1 outside the cache, you break the optimization. That's why you shouldn't use triangle strips: they might use more than 16 vertices in one call and would break the cache.

The cache access is far faster than any reordering would ever be, because it happens directly on the GPU without face/vertex index remapping.

The problem is that you cannot query how big the vertex cache is. Different graphics cards may have different vertex cache sizes. But they say it is better to underestimate the size than to overestimate it, because if you put 24 vertices in 16 cache slots you lose the speed boost.

So D3DX assumes that you have 16 VCACHE slots even if you might have 24 or 32 (NVidia series 6??? and above).
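If you want to see what a given index order does to the cache, a small simulation helps. The sketch below models the post-transform cache as a simple FIFO and reports the average cache miss ratio (ACMR), i.e. vertex transforms per triangle. The FIFO model and the name `acmrFifo` are assumptions for illustration; real hardware may behave differently, and this is not what D3DX or Tootle run internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical helper: simulates a FIFO vertex cache over an indexed triangle
// list and returns misses per triangle (lower is better, 3.0 is the worst case).
double acmrFifo(const std::vector<std::uint32_t>& indices, std::size_t cacheSize) {
    assert(cacheSize > 0);
    assert(indices.size() % 3 == 0);  // must be a whole number of triangles
    if (indices.empty()) return 0.0;

    std::deque<std::uint32_t> cache;  // front = oldest entry
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        const bool hit = std::find(cache.begin(), cache.end(), idx) != cache.end();
        if (!hit) {
            ++misses;                 // vertex must be transformed again
            cache.push_back(idx);
            if (cache.size() > cacheSize) {
                cache.pop_front();    // evict the oldest vertex
            }
        }
        // A hit does not refresh the entry: that is what makes it FIFO, not LRU.
    }
    return static_cast<double>(misses) / static_cast<double>(indices.size() / 3);
}
```

Running the same index buffer with `cacheSize` set to 16, 24 and 32 shows how sensitive an ordering is to the cache size you guessed, which is why optimizing for the smaller size is the safer bet.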

##### Share on other sites
Quote:
Original post by jbarcz1
Just thought I'd chime in here. I'm the primary developer on Tootle

Hello there, I just wanted to know if your work is related to or based on Linear-Speed Vertex Cache Optimisation (ignoring the part which reduces overdraw), or is it something different altogether? If so, how does Tootle's method compare?

##### Share on other sites
Quote:
Original post by snk_kid
Quote:
Original post by jbarcz1
Just thought I'd chime in here. I'm the primary developer on Tootle

Hello there, I just wanted to know if your work is related to or based on Linear-Speed Vertex Cache Optimisation (ignoring the part which reduces overdraw), or is it something different altogether? If so, how does Tootle's method compare?

At the moment, Tootle uses D3DX directly for VCache optimization. The SIGGRAPH 2007 paper (which we're planning to switch to) is a different algorithm than this one. Ours is designed to target a specific (FIFO) cache size, and it's a bit more straightforward (fewer magic numbers). I don't know if we ever sat down and did an apples-to-apples comparison against this one. Our ACMR numbers are similar to the ones presented in that link, but without an exact comparison on a variety of meshes it's hard to say whose is better. I don't know how their method would perform on larger meshes (where ACMR is more important). I also have no idea how our running time compares, since they didn't list running times.

I suspect that our method will do a little better in terms of ACMR for a known cache size, since we model a FIFO cache directly.

[Edited by - jbarcz1 on July 21, 2007 5:21:03 PM]