Rendering vertices for my Terrain

Hey all. I'm trying to build a 2D terrain that will be viewed in plan view. I started on it and all was looking good: I've got my vertices in position. Here is how my 2D terrain vertices are laid out (it's only 4x4 tiles, so 5x5 vertices):

- - - - -
- - - - -
- - - - -
- - - - -
- - - - -

Now, I know I could use a triangle strip 4 times to render my terrain, but I was hoping for a better approach, as I wouldn't want to call a bunch of DrawPrimitive functions for a 3D terrain. Any ideas?

Concatenating triangle strips is exactly what you want.
Gamedev has an article:
http://www.gamedev.net/reference/articles/article1871.asp

I would also note this forum posting, which discusses some possible alterations to the article, but the basic idea is the same.

http://www.gamedev.net/community/forums/topic.asp?key=featart&uid=1871&forum_id=35&Topic_Title=Concatenating+Triangle+Strips

^sf.

Actually, a triangle strip is a loss on hardware with a transformed vertex cache.

With a triangle strip, the best you can get is asymptotically close to one triangle per transformed vertex. With a triangle list (not strip), sorted for the vertex cache, you can get about 1.5 triangles per transformed vertex on a regular mesh like this.

Thus, the list will outperform the strip.

Note that you don't need one index list per terrain block, only one index list per terrain block size -- all, say, 32x32 terrain blocks will use the same index list (unless you do LOD through border decimation, in which case you'll need 15 lists total, for the different LOD combinations).
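
A rough way to check the strip half of that claim is to run the index stream through a simulated cache. The sketch below is Python rather than Direct3D code, and it assumes a strict FIFO post-transform cache of 24 entries and a 65x65-vertex block, so each row is wider than the cache. `fifo_misses` and `strip_indices` are made-up helper names, not part of any API.

```python
from collections import deque

def fifo_misses(indices, cache_size):
    """Count vertex transforms (cache misses) under a strict FIFO cache."""
    cache = deque(maxlen=cache_size)  # a full deque drops its oldest entry on append
    misses = 0
    for i in indices:
        if i not in cache:
            cache.append(i)
            misses += 1
    return misses

def strip_indices(cols, rows):
    """One long triangle strip over a cols x rows vertex grid, rows joined by repeats."""
    out = []
    for r in range(rows - 1):
        row = []
        for c in range(cols):
            row += [(r + 1) * cols + c, r * cols + c]  # lower vertex, then the one above it
        if out:
            out += [out[-1], row[0]]  # repeated indices give zero-area joining triangles
        out += row
    return out

cols = rows = 65
triangles = 2 * (cols - 1) * (rows - 1)  # visible triangles only
transforms = fifo_misses(strip_indices(cols, rows), 24)
ratio = triangles / transforms
print(f"{triangles} triangles / {transforms} transforms = {ratio:.3f}")
```

Under these assumptions the ratio lands just under one triangle per transformed vertex, because every interior row of vertices is transformed once as the bottom of one strip row and again as the top of the next. It says nothing about how a cache-sorted list behaves on real hardware; that would need measuring.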

I read the first link and I'm not sure whether I understand it. Please correct me if I'm wrong.

Say these are the vertices for my terrain:

01 02 03 04 05
06 07 08 09 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25

In my vertex buffer, would I do the following:

06,01,07,02,08,03,09,04,10,05,11,06,11,06,12,07,13,08,14,09,15,10,16,
11,16,11,17,12,18,13,19,14,20,15,21,16,21,16,22,17,23,18,24,19,25,20

Then a call to DrawPrimitive as follows:

DrawPrimitive(D3DPT_TRIANGLESTRIP, 0, however many polygons);

The post is actually more correct than the article.

The order, as modified by the post, would be:

06,01,07,02,08,03,09,04,10,05, 05,11, 11,06,12,07 ... and so on

I confirmed this by writing a quick program that draws in that order through an index buffer.
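
For anyone who wants to generate that order rather than type it out, here is a small sketch. It is Python, not the quick program mentioned above, and `concatenated_strip` and its `first` parameter (the number of the first vertex, 1 to match the numbering in this thread) are names made up for illustration.

```python
def concatenated_strip(cols, rows, first=1):
    """Indices for one triangle strip covering a cols x rows vertex grid."""
    indices = []
    for r in range(rows - 1):
        row = []
        for c in range(cols):
            row.append((r + 1) * cols + c + first)  # vertex in the lower row
            row.append(r * cols + c + first)        # vertex directly above it
        if indices:
            # Repeat the last index of the previous row and the first index of
            # this row; the resulting zero-area triangles join the two strips.
            indices += [indices[-1], row[0]]
        indices += row
    return indices

strip = concatenated_strip(5, 5)
print(",".join(f"{i:02d}" for i in strip[:16]))
print(len(strip), "indices")
```

The first line printed starts 06,01,07,02 and continues through 10,05,05,11,11,06,12,07, which is the corrected order above. A real index buffer would be zero-based, so `first=0` is the more likely setting in practice. For a strip, the primitive count is the index count minus two: 44 here, of which 12 are the zero-area joining triangles.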

[Edited by - skillfreak on January 21, 2006 10:43:31 PM]

Someone came up with a trick called "Priming the vertex cache" which can help get larger terrain tiles while keeping more data cached.

Now let's say we're doing a 16x16 tile. The vertices (hex numbering) are laid out like so:

00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
...
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF

Typically the indices would be (in list form)
00,01,10, 10,01,11, 01,02,11, 11,02,12, 02,03,12, 12,03,13, etc...

Note that by the time we start the second row of triangles, we've looked at 32 vertices (16 on top and 16 on bottom), and the 10,11,12,etc, vertices which we need to begin the next row are long since out of the cache.

Instead, we begin with a bunch of degenerates
00,01,01, 02,03,03, 04,05,05, 06,07,07,...,0E,0F,0F

Now row 0 is in the cache, in order, with nothing drawn. Now draw row after row as normal, and the vertices removed from the cache FIFO will always be ones you'll never be needing again.
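
A minimal sketch of building such an index list, assuming a row-major vertex grid and the triangle order shown above. It is Python rather than D3D code, and `primed_list` is a made-up name. For an odd number of vertices per row it simply repeats the last vertex, which is one of several possible ways to pad.

```python
def primed_list(cols, rows):
    """Triangle-list indices for a cols x rows vertex grid, with cache priming."""
    indices = []
    # Degenerate triangles that touch row 0 in order but draw nothing.
    for c in range(0, cols, 2):
        nxt = min(c + 1, cols - 1)
        indices += [c, nxt, nxt]
    # Regular rows: two triangles per quad, in the order used in this post.
    for r in range(rows - 1):
        for c in range(cols - 1):
            t0, t1 = r * cols + c, r * cols + c + 1  # top edge of the quad
            b0, b1 = t0 + cols, t1 + cols            # bottom edge of the quad
            indices += [t0, t1, b0, b0, t1, b1]
    return indices

idx = primed_list(16, 16)
print(" ".join(f"{i:02X}" for i in idx[:9]))     # start of the degenerates
print(" ".join(f"{i:02X}" for i in idx[24:30]))  # first real quad
```

For the 16x16 tile this gives 8 degenerate triangles followed by the usual 450 real ones.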

Quote:
Original post by Namethatnobodyelsetook
Instead, we begin with a bunch of degenerates
00,01,01, 02,03,03, 04,05,05, 06,07,07,...,0E,0F,0F

Now row 0 is in the cache, in order, with nothing drawn. Now draw row after row as normal, and the vertices removed from the cache FIFO will always be ones you'll never be needing again.

I am trying to optimize my index buffer as well, but how would this help? It seems that as soon as you start drawing the row 0 primitives, you would once again be out of cache.

Quote from another thread:
Quote:
Original post by JohnBolton
As far as being vertex cache friendly... One way is to snake your way through the grid. Like this (here each number is a quad, not a vertex):

00 -> 01 -> 02 -> 03
                   )
04 <- 05 <- 06 <- 07
(
08 -> 09 -> 10 -> 11
                   )
12 <- 13 <- 14 <- 15


Is your way better, or can you explain it in a little more detail? Do you draw degenerate triangles for every row?
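
For reference, the snake traversal in the quote can be written down in a few lines. This is only a sketch of the visiting order of the quads (Python, with the made-up name `snake_order`); it does not emit the triangles for each quad or measure cache behaviour.

```python
def snake_order(cols, rows):
    """Row-major quad numbers in the order a snake traversal visits them."""
    order = []
    for r in range(rows):
        # Even rows run left to right, odd rows run right to left.
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order += [r * cols + c for c in columns]
    return order

print(snake_order(4, 4))
```

For the 4x4 grid in the diagram this visits quad 3, then drops down to quad 7 and works back to quad 4, and so on.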

Guest Anonymous Poster
Using triangle lists instead of strips (to allow more convolutions to increase vertex caching) is something I will have to go back and retest (it might also behave differently now, with the changes in newer cards).

So the index data stream (3 indices fetched per triangle instead of 1) is apparently much less of a load than the gain from having a few more vertices remain in the cache (even with the larger vertex caches that newer cards have). Eliminating the degenerate triangles on each row can't hurt either.

I'm not sure about the 'cache priming' that was mentioned. It seems to me you would only gain on the first row (and not as much on cards with short vertex caches). If you have to wait for the priming row to get done anyway, is there that much of a gain? Well, it's one more thing to test to see what effect it has.

Does anyone have statistics on the vertex cache sizes across the range of cards available today? Any idea how much larger they are expected to get? And could multiple vertex pipes screw it up (if each has its own cache)?

Right, I've tried triangle lists instead of triangle strips and it all works great. I have a grid of 4 tiles bang on in the center. Thanks for everything, guys.

Quote:
Original post by Valor Knight
I am trying to optimize my index buffer as well, but how would this help? It seems that as soon as you start drawing the row 0 primitives, you would once again be out of cache.


Nope. The problem with the first row of triangles is that it mixes vertices from the first and second rows of vertices. If the cache is large enough to hold one row of vertices, but not both, you can use this trick. It's also optimal for pre-T&L caching, not just the post-T&L cache.


Imagine the cache holds 18 entries (it's somewhere around there). After we do the degenerates, your cache will have:

00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F,empty,empty

Now let's draw row 1 of triangles. We'll need new indices along the way.

00,01,10, 10,01,11, 01,02,11, 11,02,12, 02,03,12, 12,03,13

10, needed
00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F,10,empty
then 11
00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F,10,11
then 12
01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F,10,11,12
then 13
02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F,10,11,12,13

Notice how we reuse the vertices before they exit the FIFO, so we're just tacking new sequential numbers onto the end. When we go to draw row 2 of triangles, one row of vertices is sitting in the cache. This is exactly the same pattern as we used to prime the cache in the first place.

The GPU never transforms a vertex more than once, and it reads them in order. It's the most optimal thing you could ever hope for.
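
The walk-through above can be replayed with a few lines of Python. This is a sketch under the same assumptions as the post (a strict FIFO with 18 entries, a 16-wide row, row 0 already primed); `row_triangles` is a made-up helper, and a real GPU may not behave exactly like this model.

```python
from collections import deque

COLS = 16          # vertices per row in the example tile
CACHE_SIZE = 18    # the cache size assumed in the post

def row_triangles(row, cols):
    """Triangle-list indices for one row of quads, in the order used above."""
    out = []
    for c in range(cols - 1):
        t0 = row * cols + c        # top-left vertex of the quad
        b0 = t0 + cols             # vertex directly below it
        out += [t0, t0 + 1, b0, b0, t0 + 1, b0 + 1]
    return out

cache = deque(range(COLS), maxlen=CACHE_SIZE)  # row 0 already loaded by the degenerates
transforms = 0
snapshots = []
for i in row_triangles(0, COLS):
    if i not in cache:
        cache.append(i)            # a full deque silently drops its oldest entry
        transforms += 1
        snapshots.append(list(cache))

for snap in snapshots[:4]:
    print(",".join(f"{v:02X}" for v in snap))
print(transforms, "new vertices transformed in this row")
```

The first four snapshots match the cache contents listed above (without the "empty" slots), and the row finishes with 0E,0F followed by 10 through 1F in the cache, ready for the next row.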

Amazing. I can't wait to try this in my application.

Thanks for the great tip! It's so easy to do, too.

Quote:
Original post by Namethatnobodyelsetook
Imagine the cache holds 18 entries (it's somewhere around there). After we do the degenerates, your cache will have:


So if I were using "patches" of more than 18 vertices per row, this method would be useless? Currently I am using 65x65 and 129x129 patches, so by the time I am done with one row the cached vertices would be gone. So your method only works with ~18x18 patches?

Quote:
Original post by Valor Knight
Quote:
Original post by Namethatnobodyelsetook
Imagine the cache holds 18 entries (it's somewhere around there). After we do the degenerates, your cache will have:


So if I were using "patches" of more than 18 vertices per row, this method would be useless? Currently I am using 65x65 and 129x129 patches, so by the time I am done with one row the cached vertices would be gone. So your method only works with ~18x18 patches?

The 16x16 grid used by Namethatnobodyelsetook was just an example, and for that grid the vertex cache must hold at least 18 vertices for the technique to work. In general, the vertex cache must hold at least 2 more than the number of vertices per row in order for this technique to work.

The size of the vertex cache is typically very small, so a 65x65 or larger patch will not benefit from this technique. Here is some info from NVIDIA:
Quote:
From QuerySample User Guide
All current NVIDIA GPUs ... have a PostTnL cache that is a strict FIFO. The cache size is 16 vertices for GeForce 1, GeForce 2, and GeForce 4 MX architectures and 24 vertices in all GeForce 3, GeForce 4, GeForceFX, and GeForce 6 architectures.


Quote:
Original post by hplus0603
With a triangle strip, the best you can get is asymptotically close to one triangle per transformed vertex. With a triangle list (not strip), sorted for the vertex cache, you can get about 1.5 triangles per transformed vertex on a regular mesh like this.

hplus0603, can you give an example?

[Edited by - JohnBolton on January 22, 2006 3:03:55 PM]

Quote:
Original post by JohnBolton
The size of the vertex cache is typically very small, so a 65x65 or larger patch will not benefit from this technique.
Of course it will, assuming you make a very simple tweak to your index generation. That's 64 polys on a side, so normally you'd do one strip of the terrain at a time, 64 polys long. Instead, slice the patch into 4 strips, each only 16 polys wide, and draw them one after another. Prime each strip at the beginning, and you're gold.
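
Here is a sketch of that tweak, again run through a simulated cache rather than real hardware. It assumes a strict FIFO of 24 entries, a 65x65-vertex patch, and slices 16 quads wide; the helper names (`fifo_misses`, `quad_columns`, `sliced_primed`) are made up for illustration and this is not the code referred to above.

```python
from collections import deque

def fifo_misses(indices, cache_size):
    """Count vertex transforms (cache misses) under a strict FIFO cache."""
    cache = deque(maxlen=cache_size)  # a full deque drops its oldest entry on append
    misses = 0
    for i in indices:
        if i not in cache:
            cache.append(i)
            misses += 1
    return misses

def quad_columns(cols, rows, c_start, c_end):
    """Triangle list for quad columns c_start..c_end-1, drawn row after row."""
    out = []
    for r in range(rows - 1):
        for c in range(c_start, c_end):
            t0 = r * cols + c
            b0 = t0 + cols
            out += [t0, t0 + 1, b0, b0, t0 + 1, b0 + 1]
    return out

def sliced_primed(cols, rows, slice_quads):
    """Cut the patch into vertical slices and prime the cache before each one."""
    out = []
    for c_start in range(0, cols - 1, slice_quads):
        c_end = min(c_start + slice_quads, cols - 1)
        for c in range(c_start, c_end + 1, 2):  # degenerates over the slice's top row
            nxt = min(c + 1, c_end)
            out += [c, nxt, nxt]
        out += quad_columns(cols, rows, c_start, c_end)
    return out

COLS = ROWS = 65
CACHE = 24
naive = fifo_misses(quad_columns(COLS, ROWS, 0, COLS - 1), CACHE)
sliced = fifo_misses(sliced_primed(COLS, ROWS, 16), CACHE)
print(f"unique vertices:       {COLS * ROWS}")
print(f"naive full-width rows: {naive} transforms")
print(f"four primed slices:    {sliced} transforms")
```

In this model the sliced version transforms each vertex once per slice it belongs to, so the only repeats are the three columns shared between neighbouring slices. The full-width version transforms nearly every vertex twice.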

So, to take advantage of using 16 vertices per row, you need a 19-vertex cache (and for 17 you need 20)? If you had a 16-vertex cache, the new vertex you need for triangle one would overwrite vertex 0, so vertex 0 would have to be written again for the second triangle in the quad. Do you need to render the degenerates as well? (I know they are not visible, but should they be counted in the offsets, so that for a 17x17 patch you need to allow for 521 triangles instead of 512?)

Quote:
Original post by Valor Knight
So, to take advantage of using 16 vertices per row, you need a 19-vertex cache (and for 17 you need 20)? If you had a 16-vertex cache, the new vertex you need for triangle one would overwrite vertex 0, so vertex 0 would have to be written again for the second triangle in the quad.
Pretty much. This will do no better than the naive algorithm on cards which have a 16-vertex cache, which per the NVIDIA guide quoted above means GeForce 1, GeForce 2, and GeForce 4 MX. On GeForce 3 and newer, and on ATI cards (dunno any specific version info), the caches are 24 entries or so, so it should be fine.
Quote:
Do you need to render the degenerates as well? (I know they are not visible, but should they be counted in the offsets, so that for a 17x17 patch you need to allow for 521 triangles instead of 512?)
Yes. I just finished writing this in, including LoD. What a freaking pain. The way I wrote it only takes 518 triangles (6 extra) in the 17x17 case. I render all 17 edge verts in a row, then do the last one again so it's divisible by 3 (i.e. no effect on the rest of the rendering).
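
The two triangle counts in this exchange can be reproduced with a little arithmetic. The sketch below is Python with made-up helper names; it assumes 521 comes from priming with pairs (00,01,01 style, padded for the odd 17th vertex) and 518 from packing three edge vertices per triangle and repeating the last one, which is how the reply above reads. It only counts indices and does not check that the packed triangles are truly zero-area for a given heightmap.

```python
def pairs_priming(n):
    """Priming indices in the 00,01,01 style: two new vertices per triangle."""
    out = []
    for c in range(0, n, 2):
        nxt = min(c + 1, n - 1)
        out += [c, nxt, nxt]
    return out

def packed_priming(n):
    """All n edge vertices in a row, padded with the last one to a multiple of 3."""
    out = list(range(n))
    while len(out) % 3:
        out.append(n - 1)
    return out

n = 17
real = 2 * (n - 1) ** 2  # 512 visible triangles in a 17x17-vertex patch
totals = {name: real + len(idx) // 3
          for name, idx in (("pairs", pairs_priming(n)), ("packed", packed_priming(n)))}
print(totals)
print(packed_priming(n)[-3:])  # the padded final triangle
```

So the pair scheme adds 9 triangles and the packed scheme adds 6, which is where 521 and 518 would come from.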

Quote:
Original post by Promit
Quote:
Do you need to render the degenerates as well? (I know they are not visible, but should they be counted in the offsets, so that for a 17x17 patch you need to allow for 521 triangles instead of 512?)
Yes. I just finished writing this in, including LoD. What a freaking pain. The way I wrote it only takes 518 triangles (6 extra) in the 17x17 case. I render all 17 edge verts in a row, then do the last one again so it's divisible by 3 (i.e. no effect on the rest of the rendering).

So you render 0,1,2 | 3,4,5 | 6,7,8 | 9,10,11 | 12,13,14 | 15,16,16 as the degenerates? I am currently writing this into mine (for LOD 0 only, for now). I think I will post my LOD 0 code when I'm done to make sure I am on track, now that I get it.

Just curious, but how do you handle stitching the edges of your patches to adjacent patches of different LODs? Currently mine is bulky and it looks like there should be a better way. (I check whether I am reading row or column 0 and then whether it needs to adapt, which takes a lot of if statements.)

Surely you only need a cache 1 bigger than your row length? It's pushing it, but I think it should work.

Let's use a small row size of 8, and say we have a cache size of 9. So we do the degenerates and get this:

0 1 2 3 4 5 6 7 nothing

Then if the first triangle is 0, 1, 8 we get:

0 1 2 3 4 5 6 7 8

Then the next is 1, 9, 8 and 0 is pushed out of the cache, but we just finished with that so it's ok. I'm not sure if it's a good idea in practice, but theoretically it's ok right?

In fact, if the GPU were really predictable (hah) and processed the vertices in the order you gave, you might be able to get away with a cache the same size as your row. When you draw the first triangle, 0, 1, 8, it pushes 0 out when it gets to 8. Then the second triangle pushes 1 out when it gets to 9, and so on. But I doubt that would work.
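
These guesses are easy to test against an idealized model. The sketch below (Python, made-up helper names) assumes a strict FIFO, an 8x8-vertex tile, priming with degenerates, and the triangle order 0,1,8 then 8,1,9 for each quad. Real hardware may differ, so treat the numbers as a statement about the model only.

```python
from collections import deque

def fifo_misses(indices, cache_size):
    """Count vertex transforms (cache misses) under a strict FIFO cache."""
    cache = deque(maxlen=cache_size)  # a full deque drops its oldest entry on append
    misses = 0
    for i in indices:
        if i not in cache:
            cache.append(i)
            misses += 1
    return misses

def primed_list(cols, rows):
    """Priming degenerates for row 0, then two triangles per quad, row by row."""
    indices = []
    for c in range(0, cols, 2):
        nxt = min(c + 1, cols - 1)
        indices += [c, nxt, nxt]
    for r in range(rows - 1):
        for c in range(cols - 1):
            t0 = r * cols + c
            b0 = t0 + cols
            indices += [t0, t0 + 1, b0, b0, t0 + 1, b0 + 1]
    return indices

COLS = ROWS = 8
results = {size: fifo_misses(primed_list(COLS, ROWS), size)
           for size in (COLS + 2, COLS + 1, COLS)}
for size, misses in results.items():
    print(f"cache of {size}: {misses} transforms for {COLS * ROWS} vertices")
```

In this model a cache one entry larger than the row is enough for every vertex to be transformed exactly once. A cache equal to the row length is not: the second quad already needs vertex 1 again after it has been pushed out, and the misses cascade from there.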
