Question for Terrain specialists...

Started by
17 comments, last by filousnt 15 years, 9 months ago
Hi,

after much sweat and tears I finally got a completely unoptimized version of my clipmap terrain engine running. It is based on hoppe's clipmaps although I did some improvements/changes, especially with the texturing, splatting and multisampling.

What I have to do (and have a few questions about) for optimizations is:

-frustum culling
Is there a good way to do this ? I have no idea how to i.e. match a box against the frustum to see if it is visible...

-vertex block optimization
I am using only single triangle lists and I think I can greatly reduce the number of vertice transformation if I change it to indexed strip lists. But I heard the gfx card buffers vertices anyway and my info is outdated too, so is this still advisable ?

-Render optimizing
I am currently occasionally doing a little work in the background with "StretchRect". Is this a big performance hit and should I use the renderpipeline for that as well (although I am certain "StretchRect" is HW-accelerated too).

-Distance
What should I put in the distance of the terrain ? Fog ? Or would it be better to just go all the way out with a really low LoD ?

-Skybox
I am currently not using a skybox, but is this still usable technology ? My days of directx programming are many years in the past and I think some things should have improved there. What is state of the art ? Maybe a sphere instead to make it smoother ?

-FPS
I am having a rather smaller terrain (1k x 1k) with a moderate level- and vertex-count at 800x800 while nothing else is going on with about 1700fps (8800gts,6ghz athlon xp). What should I aim for with the optimizations in place ?
Quote:Fog ?

I can't help much with most of your questions. For fog, however, if it's acceptable for the scene, you may want to use it.

You can do some quick culling of objects or terrain, avoiding drawing anything beyond "fogEnd" (whatever parameter you use to specify that).

In addition, with a little fog in the distance, you can "cover up" some low LOD terrain, if you decide to go that way.
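
To make the fog suggestion concrete, here is a minimal sketch of linear fog plus the "skip anything beyond fogEnd" check. It assumes the usual Direct3D linear fog convention (factor 1 means no fog, 0 means fully fogged). The names linearFogFactor, isHiddenByFog, fogStart and fogEnd are hypothetical, not taken from anyone's engine in this thread.

```cpp
#include <algorithm>
#include <cassert>

// Linear fog factor: 1.0 = no fog, 0.0 = fully fogged.
// fogStart and fogEnd are hypothetical parameter names (view-space distances).
inline float linearFogFactor(float distance, float fogStart, float fogEnd)
{
    assert(fogEnd > fogStart);
    const float f = (fogEnd - distance) / (fogEnd - fogStart);
    return std::max(0.0f, std::min(1.0f, f));
}

// An object whose nearest point is at or beyond fogEnd is fully fogged,
// so it can be skipped before any draw call is issued.
// The object is approximated by a bounding sphere (assumption).
inline bool isHiddenByFog(float distanceToCenter, float boundingRadius, float fogEnd)
{
    return distanceToCenter - boundingRadius >= fogEnd;
}
```

Skipping fully fogged objects only looks right if the background at the horizon has the same colour as the fog; otherwise the missing silhouettes can become visible.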

Please don't PM me with questions. Post them in the forums for everyone's benefit, and I can embarrass myself publicly.

You don't forget how to play when you grow old; you grow old when you forget how to play.

Quote:Original post by Zaph-0
after much sweat and tears I finally got a completely unoptimized version of my clipmap terrain engine running. It is based on hoppe's clipmaps although I did some improvements/changes, especially with the texturing, splatting and multisampling.


Congrats [smile] I never made it as far as to actually implement it, so this is all from my limited understanding of the technique.

Quote:
-frustum culling
Is there a good way to do this ? I have no idea how to i.e. match a box against the frustum to see if it is visible...


IIRC, the improved GPU clipmap approach is to just render the entire terrain in one batch. If you're CPU bound, you probably shouldn't worry about frustum culling (at least not yet) and just set the GPU to work. This approach looks like a good AABB/frustum test, but Googling "AABB frustum" turns up a few more hits that may be useful.
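
For the "match a box against the frustum" part, a common way to do it is the plane/"positive vertex" test sketched below. This is not necessarily the approach from the link above; it assumes the six frustum planes have already been extracted and point inward, and all type and function names (Vec3, Plane, AABB, aabbIntersectsFrustum, exampleBoxFrustum) are made up for illustration.

```cpp
#include <array>
#include <cassert>

struct Vec3  { float x, y, z; };
struct Plane { float a, b, c, d; };  // a point is inside when a*x + b*y + c*z + d >= 0
struct AABB  { Vec3 lo, hi; };       // minimum and maximum corners

// Returns false only if the box lies completely behind one of the six planes.
// The test is conservative: a box near a frustum corner can be reported as
// visible even though it is just outside.
inline bool aabbIntersectsFrustum(const AABB& box, const std::array<Plane, 6>& frustum)
{
    assert(box.lo.x <= box.hi.x && box.lo.y <= box.hi.y && box.lo.z <= box.hi.z);
    for (const Plane& p : frustum) {
        // Pick the box corner that lies farthest along the plane normal.
        const Vec3 v = {
            p.a >= 0.0f ? box.hi.x : box.lo.x,
            p.b >= 0.0f ? box.hi.y : box.lo.y,
            p.c >= 0.0f ? box.hi.z : box.lo.z
        };
        if (p.a * v.x + p.b * v.y + p.c * v.z + p.d < 0.0f)
            return false;  // even the farthest corner is behind this plane
    }
    return true;
}

// Stand-in "frustum" for trying the test out: the cube [-1, 1]^3 written as
// six inward-facing planes. A real frustum would come from the camera matrices.
inline std::array<Plane, 6> exampleBoxFrustum()
{
    return {{
        {  1.0f,  0.0f,  0.0f, 1.0f }, { -1.0f,  0.0f,  0.0f, 1.0f },
        {  0.0f,  1.0f,  0.0f, 1.0f }, {  0.0f, -1.0f,  0.0f, 1.0f },
        {  0.0f,  0.0f,  1.0f, 1.0f }, {  0.0f,  0.0f, -1.0f, 1.0f }
    }};
}
```

For a clipmap, one box per ring block (with the height range of that block as the vertical extent) would be the natural thing to feed into this.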

Quote:
-vertex block optimization
I am using only single triangle lists and I think I can greatly reduce the number of vertice transformation if I change it to indexed strip lists. But I heard the gfx card buffers vertices anyway and my info is outdated too, so is this still advisable ?


Over here you can find some more information about vertex buffering/caching. That deals with indexed lists vs non-indexed strips though, so I'm not sure where multiple indexed strips would fit in. Intuitively I'd prefer a single list over multiple strips, since it would be more straightforward and uses only one draw call.

Quote:
-Render optimizing
I am currently occasionally doing a little work in the background with "StretchRect". Is this a big performance hit and should I use the renderpipeline for that as well (although I am certain "StretchRect" is HW-accelerated too).


It depends on what you're using it for. StretchRect is commonly used to read back data from a render target, which can make for a big performance hit. Essentially it stalls the pipeline, since everything has to be rendered before it can be read back, forcing the CPU and GPU to sync up. If you're using it on surfaces other than render targets, I wouldn't worry about it too much.

Got this wrong, see ET3D's reply below.

Quote:
-Distance
What should I put in the distance of the terrain ? Fog ? Or would it be better to just go all the way out with a really low LoD ?


I'd draw it as far out as useful and definitely use fog. In addition to masking the terrain cutoff, fog also goes a long way in faking aerial perspective, which will add a lot to the look of your scene. Make sure to use the same fog distances on your objects too, otherwise it can look quite weird.

Quote:
-Skybox
I am currently not using a skybox, but is this still usable technology ? My days of directx programming are many years in the past and I think some things should have improved there. What is state of the art ? Maybe a sphere instead to make it smoother ?


I think skyboxes are still commonly used, probably because they work well as cubemaps for reflection in the bargain. For my simple projects I just used a 'skydome' though, which can be a bit easier to set up. Another (more complex & power hungry) approach which seems to be getting more and more popular is to do actual atmospheric scattering and also do clouds in realtime.

Quote:
-FPS
I am having a rather smaller terrain (1k x 1k) with a moderate level- and vertex-count at 800x800 while nothing else is going on with about 1700fps (8800gts,6ghz athlon xp). What should I aim for with the optimizations in place ?


That's very hard to say. I'd determine which level and vertex count works for your specific game/application and stick with that. 1700 fps sounds like a *very* good start, so just see how things work out when you start adding more and more elements. If the performance should ever drop below an acceptable level, you could always reduce the level and/or vertex count.

Hope this helps :)


[Edited by - remigius on July 17, 2008 8:27:15 AM]
Rim van Wersch [ MDXInfo ] [ XNAInfo ] [ YouTube ] - Do yourself a favor and bookmark this excellent free online D3D/shader book!
Quote:Original post by remigius
Quote:
-vertex block optimization
I am using only single triangle lists and I think I can greatly reduce the number of vertice transformation if I change it to indexed strip lists. But I heard the gfx card buffers vertices anyway and my info is outdated too, so is this still advisable ?


Over here you can find some more information about vertex buffering/caching. That deals with indexed lists vs non-indexed strips though, so I'm not sure where multiple indexed strips would fit in. Intuitively I'd prefer a single list over multiple strips, since it would be more straightforward and uses only one draw call.
Don't use triangle strips, stick with indexed lists - they're just better. Simple.

Shameless plug: An Example of Optimal Terrain Rendering (part two). I wrote up in my journal about utilizing the pre- and post-transform vertex caches using indexed lists and the results speak for themselves really - around 37 million triangles per second on an old GeForce 6800. I don't know how that compares to yours (crude multiples of your numbers would give 3.4 billion triangles per second, but I'm sceptical of that [lol])
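
As a rough illustration of what "indexed list" means for a terrain block, the sketch below builds the index buffer for a regular vertex grid drawn as a single triangle list. It is a plain row-by-row ordering, not the cache-optimised ordering from the journal entry, and the function name buildGridIndices and the row-major vertex layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds indices for a (cols x rows) vertex grid drawn as one indexed
// triangle list. Vertices are assumed to be stored row by row, so the vertex
// at column x, row z has index z * cols + x.
inline std::vector<std::uint32_t> buildGridIndices(std::uint32_t cols, std::uint32_t rows)
{
    assert(cols >= 2 && rows >= 2);
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(cols - 1) * (rows - 1) * 6);
    for (std::uint32_t z = 0; z + 1 < rows; ++z) {
        for (std::uint32_t x = 0; x + 1 < cols; ++x) {
            const std::uint32_t i0 = z * cols + x;  // this row, this column
            const std::uint32_t i1 = i0 + 1;        // this row, next column
            const std::uint32_t i2 = i0 + cols;     // next row, this column
            const std::uint32_t i3 = i2 + 1;        // next row, next column
            // Two triangles per grid cell, both with the same winding.
            indices.push_back(i0); indices.push_back(i2); indices.push_back(i1);
            indices.push_back(i1); indices.push_back(i2); indices.push_back(i3);
        }
    }
    return indices;
}
```

A 65 x 65 block then needs 4,225 unique vertices instead of the 24,576 a non-indexed list would submit, and the post-transform cache can reuse the shared ones.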

Quote:Original post by remigius
Quote:
-Skybox
I am currently not using a skybox, but is this still usable technology ? My days of directx programming are many years in the past and I think some things should have improved there. What is state of the art ? Maybe a sphere instead to make it smoother ?


I think skyboxes are still commonly used, probably because they work well as cubemaps for reflection in the bargain. For my simple projects I just used a 'skydome' though, which can be a bit easier to set up. Another (more complex & power hungry) approach which seems to be getting more and more popular is to do actual atmospheric scattering and also do clouds in realtime.
I'd vote for atmospheric scattering as well.

I also saw a usage of SH/PRT for terrain that was jaw dropping when combined with day/night cycles. I was never a fan of SH/PRT due to its restriction to static geometry, but for terrain that's not a problem unless you want deformations.

Quote:Original post by remigius
Quote:
-FPS
I am having a rather smaller terrain (1k x 1k) with a moderate level- and vertex-count at 800x800 while nothing else is going on with about 1700fps (8800gts,6ghz athlon xp). What should I aim for with the optimizations in place ?


That's very hard to say. I'd determine which level and vertex count works for your specific game/application and stick with that. 1700 fps sounds like a *very* good start, so just see how things work out when you start adding more and more elements. If the performance should ever drop below an acceptable level, you could always reduce the level and/or vertex count.
I'd hazard to say that your current performance metric is largely useless unfortunately. Frame rate is non-linear and over 1000fps starts to really degrade the quality of information.

Start measuring frame-time and then overload your system. Try and get it down to under 100fps and then fire it through NVPerfHUD to try and analyse where your true bottlenecks are.
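
To show why frame time is the more useful number, here is a tiny sketch of the conversion. The helper names frameTimeMs and addedCostMs are made up for this example.

```cpp
#include <cassert>

// Converts a frame rate into the time one frame takes, in milliseconds.
inline double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra time per frame (in ms) that a change costs, given the frame rate
// measured before and after it.
inline double addedCostMs(double fpsBefore, double fpsAfter)
{
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

Dropping from 1000 to 500 fps is a cost of 1 ms per frame, while dropping from 100 to 50 fps is a cost of 10 ms, even though the second looks like a much smaller loss in fps. At 1700 fps a frame takes roughly 0.59 ms, so tiny fluctuations already show up as swings of hundreds of fps.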


Oh, and my final recommendation - if you haven't already read Ysaneya's journal, do so now. An evening (or ten) reading through the archives of his planetary and terrain engines will be time well spent. There are some truly inspiring and clever tricks in there. I could only hope of being able to create something so impressive [grin]


hth
Jack

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

Quote:Original post by remigius
Quote:Original post by Zaph-0
-Render optimizing
I am currently occasionally doing a little work in the background with "StretchRect". Is this a big performance hit and should I use the renderpipeline for that as well (although I am certain "StretchRect" is HW-accelerated too).


It depends on what you're using it for. StretchRect is commonly used to read back data from a render target, which can make for a big performance hit. Essentially it stalls the pipeline, since everything has to be rendered before it can be read back, forcing the CPU and GPU to sync up. If you're using it on surfaces other than render targets, I wouldn't worry about it too much.


StretchRect by itself can't be used to copy from the GPU to the CPU (source and dest must be in the default pool -- see the docs) and therefore shouldn't cause a CPU to GPU sync.

Theoretically StretchRect should be able to work faster than rendering (since it bypasses the pipeline), but I don't know how it's slotted into the command stream, so it's possible that it's slower. I'd suggest testing the options.
Quote:Original post by ET3D
StretchRect by itself can't be used to copy from the GPU to the CPU (source and dest must be in the default pool -- see the docs) and therefore shouldn't cause a CPU to GPU sync.

Theoretically StretchRect should be able to work faster than rendering (since it bypasses the pipeline), but I don't know how it's slotted into the command stream, so it's possible that it's slower. I'd suggest testing the options.


Sorry, my bad. I was so into Tom's FAQ about CPU/GPU syncing I mis-extrapolated [smile]
Rim van Wersch [ MDXInfo ] [ XNAInfo ] [ YouTube ] - Do yourself a favor and bookmark this excellent free online D3D/shader book!

@Buckeye
Yes, you're right. With a good fog I can mask the LoD somewhat so I think I'll go that way.
@remigius
Thanks for the frustum links. I'll check them out and do some more research that way. What I found so far really blew up in my face (I am not so good on 3D math, my schools never covered that) but I hope I'll manage...

Do you have some information on that improved GPU clipmap approach where you render the entire terrain in one batch ?

So far I noticed that the different adjustments to vertex shader variables (I already consolidated all float and float2 into float4 for performance) and so on really cause a major performance hit. Sending it all out in 1 big chunk, or maybe separated per level (for some dynamic level LoD and better blending between levels), should definitely ease the state changes, but I am not so certain if I would lose too much performance because of the display overhead.

I did a quick test and my FPS went to close to 4000 (!) with calls for single levels, but I think I have to see how each version (brute force render vs frustum culling) scales against the other...

Hoppe mentioned something along a factor of 2-3 performance gain by splitting and frustum culling but then he had old hardware...

The aerial perspective link is fantastic. Now I am definitely going with a fog and I think I will implement atmospheric scattering as well.

Thanks a lot for your advice and if you have any information on that improved GPU clipmap approach please let me know...

^_^

@jollyjeffers

Your links look very interesting, especially the one to Ysaneya's Journal. Fantastic pictures. I will have to read all of it...

^_^

So far I am not even using indexed lists but only lists but I think that is one of the first things I will have to change.

I already tried looking around for SH/PRT and I think I will give it a go. Could you please give me a link to the usage of SH/PRT for terrain that you mentioned ?



Quote:-frustum culling
Is there a good way to do this ? I have no idea how to i.e. match a box against the frustum to see if it is visible...


Octrees are the fastest way I know of for rendering large outdoor scenes.
An octree works in cubes, eight cubes to be exact. Initially, the octree starts with a
root node that has an axis-aligned cube surrounding the entire world, level, or
scene. Imagine an invisible cube around your whole world. A
node in an octree is an area defined by a cube, which references the polygons that
are inside of that cube. This is how we keep track of partitions. When we refer to a
cube’s minimum and maximum boundaries, we are indirectly talking about the
region of 3-D space that the polygons reside in.
This root node now stores all the polygons in the world. Currently, this wouldn’t do
us much good because it will draw the whole thing. We want to subdivide this node
into eight parts (hence the word octree). Once we subdivide, there should be eight
cubes inside the original root node’s cube. That means four cubes on top and four
on the bottom.
We have now divided the world into eight parts with just one subdivision.

If a node is partially or fully inside this
space (in our viewing frustum), all of its associated polygons are drawn. We check
to see if a node intersects the frustum by its invisible cube that surrounds it. Instead
of checking whether each polygon is in the frustum, we just need to check the cube
that surrounds the polygons. This is where the speed is. The math for testing a box
against the frustum is easy and fast. Once we know that a node is in our view, we can
render it. One thing that hasn’t been mentioned is that the node must be an end
node. That means the node does not have any children nodes assigned to it. Only
end nodes hold polygonal data.
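
The subdivision step described above can be sketched as follows: given a node's cube, compute the cube of one of its eight children. This is only an illustration under assumed names (Vec3, Box, childBox) and an assumed octant numbering; a full octree would also store child pointers and the polygons of each end node.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
struct Box  { Vec3 lo, hi; };  // minimum and maximum corners of a node's cube

// Returns the bounding box of one of the eight children of 'parent'.
// Octant numbering (an assumption for this sketch): bit 0 selects the upper
// half along x, bit 1 along y, bit 2 along z.
inline Box childBox(const Box& parent, int octant)
{
    assert(octant >= 0 && octant < 8);
    const Vec3 c = { (parent.lo.x + parent.hi.x) * 0.5f,
                     (parent.lo.y + parent.hi.y) * 0.5f,
                     (parent.lo.z + parent.hi.z) * 0.5f };
    Box b;
    b.lo.x = (octant & 1) ? c.x : parent.lo.x;
    b.hi.x = (octant & 1) ? parent.hi.x : c.x;
    b.lo.y = (octant & 2) ? c.y : parent.lo.y;
    b.hi.y = (octant & 2) ? parent.hi.y : c.y;
    b.lo.z = (octant & 4) ? c.z : parent.lo.z;
    b.hi.z = (octant & 4) ? parent.hi.z : c.z;
    return b;
}
```

When drawing, a node whose box fails the frustum test is skipped together with all of its children; only the end nodes that survive have their polygons rendered.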

There's a small intro.

This topic is closed to new replies.
