Thoughts about rendering terrain with shaders...

Started by
5 comments, last by thona 21 years, 4 months ago
Working on my terrain renderer I got the idea that I am doing something VERY inefficient here - and have a much better idea. No clue how this works in practice, though. Anyone have any comments? Especially on the not-too-easy shader operations going on with THIS: I am working on a terrain renderer in DX9 with C# as a tech demo :-) I try to follow Charles Bloom's ideas for splatting terrain, and have come up with some additions - maybe you want to comment on this. My focus is current/next generation cards, so I try to find the best thing running with DirectX 9. I came up with the following ideas:

(a) I want to minimize memory use on the card, and transfers to the card. What about using the SAME vertex buffer over and over (and the same index buffer, if you go indexed)? Basically, when drawing multiple "patches" of terrain (chunks, as he calls them), different matrices can be used to position each patch in world space. This means you run with only and exactly ONE vertex/index buffer, reused over and over. Without vertex shaders this would be one vertex buffer (or one set of data on a shared vertex buffer) for every chunk (different height information) - still less memory.

(b) Maybe the heightmap could be stored as a texture. As a result, the heightmap could be decoded by a vertex shader: given X and Y values, the proper texel could be fetched to get the Z value - meaning only ONE vertex set, reused over and over. This would NOT work with displacement mapping - the vertices would already exist.
(c) The texture could also be taken from the heightmap :-) If you assume that every vertex has ONE texture type it belongs to (its "native texture"), then the heightmap could be in a format that stores Z values and terrain type together, and a vertex shader could decide (possible with the latest versions), based on a constant, whether alpha should be one or zero :-) With two steps (additional texture stages) this could be made a little more granular. Anyhow, the idea is that the vertex shader determines the alpha value. As a result, you would have:

(a) only one vertex buffer block, with one index block - they never change;
(b) use of a texture as a combined heightmap/texture decision map;
(c) as a result, way fewer operations to perform to get the alpha maps etc., AND way less memory use on the graphics card (let's say down to 4 bytes per vertex - two for the heightmap, two for two one-byte texture indices). This would be ONLY 16.5k for a 65x65 patch - HARD to beat;
(d) pretty massive overdraw - basically I render the patch once for every possible texture, and I am talking about the WHOLE patch, with no optimisations (possible, since the routine calling DrawPrimitive has NO idea what data is in the heightmap :-));
(e) the requirement of a current card, which I would be willing to accept as a condition.

The texture use could be optimised in that basically you use one large texture for the heightmap and put data in there as you go - instead of manipulating the vertex and index buffers. As a result, you also have no texture swap at this stage. Furthermore, using an additional one-dimensional lookup texture, every possible texture could be assigned a "color" which could be used to (a) render the "base texture" that he mentions, and (b) render the more distant patches without applying the detail textures.

I mean, this would basically offload MOST of the work to the graphics card AND save a ton of memory there - just for some (OK, quite some) vertex shader operations.
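A quick sanity check on the memory figure above (a sketch; the 4-bytes-per-vertex layout is the one described in point (c)):

```python
# Per-vertex data for the compact heightmap format described above:
# 2 bytes of height plus two one-byte texture indices = 4 bytes.
BYTES_PER_VERTEX = 4
VERTS_PER_SIDE = 65   # a 64x64-quad patch has 65x65 vertices

patch_bytes = VERTS_PER_SIDE * VERTS_PER_SIDE * BYTES_PER_VERTEX
print(patch_bytes)          # 16900 bytes
print(patch_bytes / 1024)   # ~16.5 KB, matching the figure in the post
```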
Comments on this? I don't want to start trying this out if it is a dead end.

Regards
Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)
I was thinking of something very similar. Currently I have a terrain engine that works with 1024x1024 terrains using 32x32 patches that are frustum culled and rendered as tri-strips, brute force. It is quite fast: I can achieve 200-300fps on a GeForce 3/P4 1.7 when rendering at ground level (meaning that 15-40 32x32 patches are drawn; worst case, frame rate drops to about 60-70fps with 500+ patches). It is all multi-textured too, but it uses a DirectX 7 approach for almost everything.

So... I'm a newbie with shaders, but was thinking along the same lines, doing something like this.

1) Create a patch class that contains both a World and a Texture matrix and a bounding box for that terrain patch (this may require some precomputation to get the bbox). Store these in a quadtree based on their bboxes.

2) Set up each patch with a scale (x,z) and transform matrix to move it into proper position.

3) Set up a Texture matrix to translate each patch to the proper place on a texture map. (This I'm not so sure of, because you might just need to provide a lookup with an offset in the shader - remember, newbie here, and not sure if that can be done.)

4) Create a big 1024x1024 "terrain" texture using either a 16-bit or 32-bit heightmap (allows for more variance).

5) Create a tri-strip index buffer and a 33x33 vertex buffer for rendering a 33x33 patch. (Note the overlap to 33x33 to tie the terrain together.)

6) Create a shader that can use a lookup table or texture (the heightmap texture) to find the y value to stuff into the world position per vertex, and do all the other things needed, like the multi-texturing etc.

Now for rendering...

1) Set the stream sources for index and vertex buffers.

2) Walk the quadtree and find visible patches based on view frustum.

3) For each visible patch, swap in World and Texture (if needed) matrix for patch and render. (here you could also update values in the shader to draw detail textures based on distance or other things.)

Seems this could work quite well. When I get some time I think I might just have to code it up and see if it would work. Still, there is no LOD optimization happening so this is good for ground level viewed terrains, but not for something like a flight sim where you need longer distances.

Some other last minute thoughts are that you could probably get away with not even using a World matrix, but use x and z scaling and x and z translation and update those values within the shader directly.
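Render steps 2-3 above can be sketched language-neutrally (Python here for illustration; the engine itself is C++/C#, and all names and the plane representation are assumptions, not from the actual code):

```python
# Sketch of render step 2: walk a quadtree of patch bounding boxes and
# collect those that intersect the view frustum.

def aabb_outside_plane(bmin, bmax, plane):
    # plane = (nx, ny, nz, d); a point p is inside when dot(n, p) + d >= 0.
    # Test the box corner farthest along the normal: if even that corner
    # is behind the plane, the whole box is outside.
    nx, ny, nz, d = plane
    px = bmax[0] if nx >= 0 else bmin[0]
    py = bmax[1] if ny >= 0 else bmin[1]
    pz = bmax[2] if nz >= 0 else bmin[2]
    return nx * px + ny * py + nz * pz + d < 0

class QuadNode:
    def __init__(self, bmin, bmax, children=(), patch=None):
        self.bmin, self.bmax = bmin, bmax   # node bounding box corners
        self.children = children            # up to four child nodes
        self.patch = patch                  # terrain patch at leaf nodes

def collect_visible(node, planes, out):
    # Prune the whole subtree as soon as its box fails one frustum plane.
    if any(aabb_outside_plane(node.bmin, node.bmax, p) for p in planes):
        return
    if node.patch is not None:
        out.append(node.patch)              # leaf: this patch gets drawn
    for child in node.children:
        collect_visible(child, planes, out)
```

Step 3 is then just: for each collected patch, set the per-patch World (and Texture) matrix and issue the draw call against the shared buffers.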




My comments with *** - quote tags are too much for me today :-)


Currently I have a terrain engine that works with 1024x1024 terrains using 32x32 patches that are frustum culled and rendered as tri-strips brute force.

*** I thought that 32x32 was too small for today's hardware :-)

It is quite fast as I can achieve 200-300fps on a GeForce 3/P4 1.7 when rendering at ground level (meaning that 15-40 32x32 patches are drawn, worst case frame rate drops to about 60-70fps with 500+ patches).

*** That's something I will never test. I organize the renderer a little differently: I currently render a maximum of 25 patches (5x5) and think of going to 7x7 - anything above this will be handled by a separate renderer that is a little simpler :-) Surely, out of the 25 patches, as you stand in the middle, you normally only see a fraction :-)

It is all multi-textured too, but it is using a DirectX 7 approach for almost everything.

*** Well, I use DX9 in C# - so that's a difference. I have a buffer class that I currently use to buffer the vertex information of the last 35 patches (I render at most 25) in LRU fashion. I apply ONE texture right now - and am starting to think about how to handle MORE textures.

So... I'm a newbie with shaders, but was thinking along the same lines, doing something like this.

1) Create a patch class that contains both a World and a Texture matrix and a bounding box for that terrain patch (this may require some precomputation to get the bbox). Store these in a quadtree based on their bboxes.

*** Well, what for :-) Why do you actually need a bounding box at all? I go the other way: I have all patches in a grid (pointers to the patches), and transform the camera position into grid coordinates. After all, this is STATIC stuff. I select patches based on the camera grid coordinates, and the patches have their world coordinates stored.
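The grid selection described above can be sketched like this (a language-neutral Python sketch; the patch size and names are illustrative assumptions, not from the actual project):

```python
# Map the camera position to grid coordinates, then take the surrounding
# (2*radius + 1)^2 patches. No tree walk needed for a static grid.

PATCH_SIZE = 64.0  # world units per patch side (assumed)

def select_patches(grid, cam_x, cam_z, radius=2):
    rows, cols = len(grid), len(grid[0])
    gx = int(cam_x // PATCH_SIZE)   # camera position -> grid column
    gz = int(cam_z // PATCH_SIZE)   # camera position -> grid row
    selected = []
    for z in range(max(0, gz - radius), min(rows, gz + radius + 1)):
        for x in range(max(0, gx - radius), min(cols, gx + radius + 1)):
            selected.append(grid[z][x])
    return selected
```

With radius=2 this yields the 5x5 block of patches around the camera, clamped at the map edges.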

2) Set up each patch with a scale (x,z) and transform matrix to move it into proper position.

*** No scale - why scale? Scaling (axis length) is done when creating the vertex buffer data, which happens rarely.

3) Set up a Texture matrix to translate each patch to the proper place on a texture map. (This I'm not so sure of, because you might just need to provide a lookup with an offset in the shader - remember, newbie here, and not sure if that can be done.)

*** Well, I don't put ONE texture over the patch - I basically want to splat multiple textures.

4) Create a big 1024x1024 "terrain" texture using either a 16-bit or 32-bit heightmap (allows for more variance).

*** NEVER. NEVER EVER. That's exactly what I do not want to do.

5) Create a tri-strip index buffer and a 33x33 vertex buffer for rendering a 33x33 patch. (Note the overlap to 33x33 to tie the terrain together.)

*** Same here - 64x64 quads, so 65x65 vertices.

6) Create a shader that can use a lookup table or texture (the heightmap texture) to find the y value to stuff into the world position per vertex, and do all the other things needed, like the multi-texturing etc.

*** BAD news: VERTEX shaders can NOT access texture data, only texture coordinates. The stuff would have to go up in - well - the constant registers.

Now for rendering...

1) Set the stream sources for index and vertex buffers.

2) Walk the quadtree and find visible patches based on view frustum.

*** "Walk the quadtree" - that sounds very complicated to me :-) Too complicated for something as "limited" as this.

3) For each visible patch, swap in World and Texture (if needed) matrix for patch and render. (here you could also update values in the shader to draw detail textures based on distance or other things.)

*** Well, the problem I have is "no patch texture", and this turns out to be problematic.

Seems this could work quite well. When I get some time I think I might just have to code it up and see if it would work. Still, there is no LOD optimization happening so this is good for ground level viewed terrains, but not for something like a flight sim where you need longer distances.

*** Here my "higher renderers" come in. They have a different vertex buffer, and will not use a texture anymore, but have a color for every vertex - after all, this is further away, and you don't see the details then. THIS renderer will later on actually be called FIRST, and THEN the details drawn in a second pass :-) The higher level renderer will be one or two algorithms - I think one working on slightly reduced vertex counts (33x33), another one batching multiple patches together, with 9x9 vertices per chunk (though they are batched again).

Some other last minute thoughts are that you could probably get away with not even using a World matrix, but use use x and z scaling and x and z translation and update those values within the shader directly.

*** Yes, but there is no loss in setting the translation matrix. You have to load the data anyway. ACTUALLY, unless you manage to do everything in the same transformation, you actually LOSE - vertex shaders get slower than the fixed pipeline fast.


Regards

Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)
My comments with *** - quote tags are too much for me today :-)
Mine in +++

Currently I have a terrain engine that works with 1024x1024 terrains using 32x32 patches that are frustum culled and rendered as tri-strips brute force.

*** I thought that 32x32 was too small for today's hardware :-)
+++ Reason: frustum culling. 32x32 was slightly faster than 64x64 and 16x16.

It is quite fast as I can achieve 200-300fps on a GeForce 3/P4 1.7 when rendering at ground level (meaning that 15-40 32x32 patches are drawn, worst case frame rate drops to about 60-70fps with 500+ patches).

*** That's something I will never test. I organize the renderer a little differently: I currently render a maximum of 25 patches (5x5) and think of going to 7x7 - anything above this will be handled by a separate renderer that is a little simpler :-) Surely, out of the 25 patches, as you stand in the middle, you normally only see a fraction :-)

It is all multi-textured too, but it is using a DirectX 7 approach for almost everything.

*** Well, I use DX9 in C# - so that's a difference. I have a buffer class that I currently use to buffer the vertex information of the last 35 patches (I render at most 25) in LRU fashion. I apply ONE texture right now - and am starting to think about how to handle MORE textures.

+++ Reason: I said DX7 because I am not using shaders - SetRenderState calls instead.

So... I'm a newbie with shaders, but was thinking along the same lines, doing something like this.

1) Create a patch class that contains both a World and a Texture matrix and a bounding box for that terrain patch (this may require some precomputation to get the bbox). Store these in a quadtree based on their bboxes.

*** Well, what for :-) Why do you actually need a bounding box at all? I go the other way: I have all patches in a grid (pointers to the patches), and transform the camera position into grid coordinates. After all, this is STATIC stuff. I select patches based on the camera grid coordinates, and the patches have their world coordinates stored.

+++ How do you render stuff that is far from the camera? Distant mountains would be a problem. I thought of using a grid pointer approach, with maybe a 2d frustum, but the frustum cull seems to work pretty well and allows me to have some decent distances. The reason I did 3d bbox was that I was thinking of doing some occlusion culling of patches at some point.


2) Set up each patch with a scale (x,z) and transform matrix to move it into proper position.

*** No scale - why scale? Scaling (axis length) is done when creating the vertex buffer data, which happens rarely.
+++ yes, could be done at vb creation. Either way same amount of calcs if you use a world matrix instead of manually translating later.

3) Set up a Texture matrix to translate each patch to the proper place on a texture map. (This I'm not so sure of, because you might just need to provide a lookup with an offset in the shader - remember, newbie here, and not sure if that can be done.)

*** Well, I don't put ONE texture over the patch - I basically want to splat multiple textures.

+++ Texture swaps can slow things down. I found it easy to just do a large texture in Bryce and add a detail texture as needed. That way you can draw features on the big texture map, like roads and stuff. Depends on what you are going for, of course: if you want all the landscape to look similar, then splatting looks nice; a big texture wins if you want more variation and less texture swapping (and you do not have to create a tool to help you figure out the texture splatting).

4) Create a big 1024x1024 "terrain" texture using either a 16-bit or 32-bit heightmap (allows for more variance).

*** NEVER. NEVER EVER. That's exactly what I do not want to do.
+++ Hmmm, then where do your heightmap values come from? You could store the heightmap in system memory and copy only the parts you want, but why worry - a 1k x 1k x 16-bit heightmap is not all that big in video memory when DDS compressed.

5) Create a tri-strip index buffer and a 33x33 vertex buffer for rendering a 33x33 patch. (Note the overlap to 33x33 to tie the terrain together.)

*** Same here - 64x64 quads, so 65x65 vertices.

6) Create a shader that can use a lookup table or texture (the heightmap texture) to find the y value to stuff into the world position per vertex, and do all the other things needed, like the multi-texturing etc.

*** BAD news: VERTEX shaders can NOT access texture data, only texture coordinates. The stuff would have to go up in - well - the constant registers.

+++ Have to look into that. If you can't access the data directly on the texture map, you're pretty much screwed. :< Somehow I think there has to be a way. Seems rather stupid that the video card companies didn't consider that you might want to access other data sources for lookup in a vertex shader.

Now for rendering...

1) Set the stream sources for index and vertex buffers.

2) Walk the quadtree and find visible patches based on view frustum.

*** "Walk the quadtree" - that sounds very complicated to me :-) Too complicated for something as "limited" as this.

+++ So simple to do you would be amazed. Also good for object culling.

3) For each visible patch, swap in World and Texture (if needed) matrix for patch and render. (here you could also update values in the shader to draw detail textures based on distance or other things.)

*** Well, the problem I have is "no patch texture", and this turns out to be problematic.

Seems this could work quite well. When I get some time I think I might just have to code it up and see if it would work. Still, there is no LOD optimization happening so this is good for ground level viewed terrains, but not for something like a flight sim where you need longer distances.

*** Here my "higher renderers" come in. They have a different vertex buffer, and will not use a texture anymore, but have a color for every vertex - after all, this is further away, and you don't see the details then. THIS renderer will later on actually be called FIRST, and THEN the details drawn in a second pass :-) The higher level renderer will be one or two algorithms - I think one working on slightly reduced vertex counts (33x33), another one batching multiple patches together, with 9x9 vertices per chunk (though they are batched again).

+++ Hard to do without terrain popping and T-junction cracks. Saw a tiled rendering system once that could do it by stitching, but it seemed slower than just blasting out a full 32x32 patch. I was considering a system using progressive meshes and the patch-based system, but that would be way different from what we are discussing here.

Some other last minute thoughts are that you could probably get away with not even using a World matrix, but use use x and z scaling and x and z translation and update those values within the shader directly.

*** Yes, but there is no loss in setting the translation matrix. You have to load the data anyway. ACTUALLY, unless you manage to do everything in the same transformation, you actually LOSE - vertex shaders get slower than the fixed pipeline fast.

+++ Well I ordered a copy of ShaderX from nerdbooks.com, so I should know more about shaders at some point (even though it is a few months out of date.)

+++ Send me an email at some point, I'd be curious to find out why C# instead of C++? I use Java mostly at work and have done stuff with C#, but it seems strange to use it for DX stuff as I would expect a speed penalty from managed code and the context switching going on.
Again inline. I start cutting out with ---cut--- and use §§§ now.

---cut---
*** I thought that 32x32 was too small for today's hardware :-)
+++ Reason: frustum culling. 32x32 was slightly faster than 64x64 and 16x16.

§§§ Hm - that's interesting. I think that the more advanced the hardware, the more vertices you want to push with one call - and let frustum culling be frustum culling.

---cut---
*** Well, I use DX9 in C# - so that's a difference. I have a buffer class that I currently use to buffer the vertex information of the last 35 patches (I render at most 25) in LRU fashion. I apply ONE texture right now - and am starting to think about how to handle MORE textures.

+++ Reason: I said DX7 because I am not using shaders - SetRenderState calls instead.

§§§ I am still staying away from shaders. I will probably use them to decompress compressed vertex data, though :-)

---cut---
*** Well, what for :-) Why do you actually need a bounding box at all? I go the other way: I have all patches in a grid (pointers to the patches), and transform the camera position into grid coordinates. After all, this is STATIC stuff. I select patches based on the camera grid coordinates, and the patches have their world coordinates stored.

+++ How do you render stuff that is far from the camera?

§§§ NOT with this renderer. I will render the landscape in "layers" (three planned). This is the layer-0 renderer, which is responsible for the patch you are on, plus two fields of visibility - not more. Then there is the layer-1 renderer, and finally the layer-2 renderer, which has LARGER patches with less data - after all, you are far away then :-) And no texturing, hopefully. This will all be integrated into the terrain system, and allows me to swap algorithms when needed.

Distant mountains would be a problem. I thought of using a grid pointer approach, with maybe a 2d frustum, but the frustum cull seems to work pretty well and allows me to have some decent distances. The reason I did 3d bbox was that I was thinking of doing some occlusion culling of patches at some point.

§§§ Even then, use sphere tests - that's just a radius you need. FAST.
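The suggested sphere test is just one dot product and a compare per frustum plane. A minimal sketch (Python for illustration; the plane convention - normals pointing into the frustum - is an assumption):

```python
# A patch is culled only if its bounding sphere lies entirely behind
# one of the frustum planes.

def sphere_visible(center, radius, planes):
    cx, cy, cz = center
    for nx, ny, nz, d in planes:  # plane: dot(n, p) + d >= 0 means inside
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False          # completely outside this plane
    return True                   # inside, or intersecting the frustum
```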

2) Set up each patch with a scale (x,z) and transform matrix to move it into proper position.

*** No scale - why scale? Scaling (axis length) is done when creating the vertex buffer data, which happens rarely.
+++ yes, could be done at vb creation. Either way same amount of calcs if you use a world matrix instead of manually translating later.

§§§ WRONG. NEVER EVER scale a matrix if you want lights to work correctly - normals get scaled wrong. I read it in a lot of places - check the Bloom papers (http://www.cbloom.com).
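The warning above can be checked numerically: under a non-uniform scale, a normal transformed like a point stops being perpendicular to the surface; the correct transform is the inverse-transpose of the model matrix. A tiny 2D sketch (illustrative, not from the project):

```python
# Demonstrate why non-uniform scale breaks lighting: compare a normal
# scaled like a point against one transformed by the inverse-transpose.

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

tangent = (1.0, 1.0)    # direction along the surface
normal = (1.0, -1.0)    # perpendicular to it before scaling
assert dot(tangent, normal) == 0.0

sx, sy = 2.0, 1.0                              # non-uniform scale
t2 = (sx * tangent[0], sy * tangent[1])        # scaled tangent: (2, 1)
n_wrong = (sx * normal[0], sy * normal[1])     # normal scaled like a point
n_right = (normal[0] / sx, normal[1] / sy)     # inverse-transpose of a diagonal scale

print(dot(t2, n_wrong))   # 3.0 -> no longer perpendicular: lighting breaks
print(dot(t2, n_right))   # 0.0 -> still perpendicular (renormalize before lighting)
```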

---cut---
+++ Texture swaps can slow things down. I found it easy to just do a large texture in Bryce and add a detail texture as needed.

§§§ I won't do a lot of texture swaps - I think a total of 32 for all 32 layers :-) This includes the combined alpha texture. And texture swaps are getting cheaper with DX8 and DX9.

+++ That way you can draw features on the big texture map like roads and stuff.

§§§ But the game logic can not see them :-)

Depends on what you are going for, of course: if you want all the landscape to look similar, then splatting looks nice; a big texture wins if you want more variation and less texture swapping (and you do not have to create a tool to help you figure out the texture splatting).

§§§ Well, depends. OTOH the data is much smaller, and I have close-up details you can only dream of :-)

4) Create a big 1024x1024 "terrain" texture using either a 16-bit or 32-bit heightmap (allows for more variance).

*** NEVER. NEVER EVER. That's exactly what I do not want to do.
+++ Hmmm, then where do your heightmap values come from? You could store the heightmap in system memory and copy only the parts you want, but why worry - a 1k x 1k x 16-bit heightmap is not all that big in video memory when DDS compressed.

§§§ Heightmap data is stored in memory, actually in "super-patches" which can be loaded from disc. You are right about this. But why should I hurt the graphics card with the heightmap, when I still need the vertex info? I just create the vertices as needed (using an LRU cache, as I said - not creating them every frame), and keep the heightmap data in a format that I can easily swap out to disc. I want to go to 10,000x10,000 and larger POSSIBLE heightmaps, so that's the only option.
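The LRU patch cache described above might look roughly like this (a Python sketch standing in for the C# version; the capacity of 35 matches the number quoted earlier, and the `build` callback is a hypothetical stand-in for the vertex-creation step):

```python
# Heightmap data stays in system memory; per-patch vertex data is built
# on demand, and the least recently used entry is evicted when full.

from collections import OrderedDict

class PatchCache:
    def __init__(self, capacity=35, build=None):
        self.capacity = capacity
        self.build = build            # callback: patch_id -> vertex data
        self.cache = OrderedDict()

    def get(self, patch_id):
        if patch_id in self.cache:
            self.cache.move_to_end(patch_id)   # mark as recently used
            return self.cache[patch_id]
        data = self.build(patch_id)            # create vertices from the heightmap
        self.cache[patch_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return data
```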

--- cut ---
6) Create a shader that can use a lookup table or texture (the heightmap texture) to find the y value to stuff into the world position per vertex, and do all the other things needed, like the multi-texturing etc.

*** BAD news: VERTEX shaders can NOT access texture data, only texture coordinates. The stuff would have to go up in - well - the constant registers.

+++ Have to look into that. If you can't access the data directly on the texture map, you're pretty much screwed. :<

§§§ Well, compressing vertex information to "survive" :-) I am looking into working with multiple streams, though - but it looks like I am missing stream offsets to be as flexible as I need to be. So - this is basically back to "wasting the memory here". Once the GeForce FX comes out (or I buy myself a Radeon for the interim), I might try replacing EVERYTHING with displacement mapping and THEN use the heightmap as a displacement texture :-)

Somehow I think there has to be a way. Seems rather stupid that the video card companies didn't consider that you might want to access other data sources for lookup in a vertex shader.

§§§ Well, there are limits in the way shaders work - they can be so fast because of these limits.

---cut---
+++ So simple to do you would be amazed. Also good for object culling.

3) For each visible patch, swap in World and Texture (if needed) matrix for patch and render. (here you could also update values in the shader to draw detail textures based on distance or other things.)

---cut---


§§§ One moment - you swap in the texture PER PATCH? GOD, that's a TON of things. And you talk of splatting being slow? Gosh.

---cut---
+++ Hard to do without terrain popping and T-junction cracks. Saw a tiled rendering system once that could do it by stitching, but it seemed slower than just blasting out a full 32x32 patch.

§§§ That's a matter of how far you can see. If you see FAR, you simply have too many small patches to be useful. Also, I probably will use pixel shaders to make the layer-0 renderer "fog into invisibility", which should handle most problems here. Let's assume you have a patch of 65 vertices over 64 meters. If you stand on a hill and look 1000 meters into the distance, there is NO WAY you can render this with all details on - not even with just one texture, not with full details. IMHO, SLOWLY alpha-blending the nearer "detail" layer over to the less detailed renderer will handle most things, and can be done - with a vertex shader, as it looks.
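The fade described above boils down to a clamped linear ramp per vertex, which maps directly onto a handful of vertex-shader instructions. A sketch (the fade distances are illustrative assumptions):

```python
# Per-vertex detail alpha: 1 inside the detail range, fading linearly
# to 0 at the far edge of the layer-0 renderer's range.

def detail_alpha(dist, fade_start=96.0, fade_end=160.0):
    if dist <= fade_start:
        return 1.0                 # full detail close to the camera
    if dist >= fade_end:
        return 0.0                 # fully faded out: only the coarse layer shows
    return (fade_end - dist) / (fade_end - fade_start)
```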

---cut---
+++ Well I ordered a copy of ShaderX from nerdbooks.com, so I should know more about shaders at some point (even though it is a few months out of date.)

§§§ Be careful with this book. It is GREAT - I just get a headache trying to understand it. Shaders - especially the stuff they do - are COMPLEX. I would prefer to have an easier book to start with :-)

+++ Send me an email at some point, I'd be curious to find out why C# instead of C++? I use Java mostly at work and have done stuff with C#, but it seems strange to use it for DX stuff as I would expect a speed penalty from managed code and the context switching going on.

§§§ Well, let's see. Yes, there is a speed penalty - actually two: managed/unmanaged transitions (which you CAN avoid - just reorganize your code to cross that boundary seldom) and the SLOW floating-point code (badly optimised so far, but they are working on it). Garbage collection is not an issue - I do not exactly create a lot of garbage; I reuse most objects :-)

Well, besides some pretty hefty Managed C++ stuff I do (video conferencing, DirectShow filters, CAPI integration etc.), I breathe and live C#. I am supposed to be pretty good at this - I mean, I have MS calling in occasionally :-) This is a private research project for me (how far CAN I take C#). As you can see in my sig, I am a Microsoft MVP (for C#, actually), and this is because I have taken C# to the limits over the last two years (yep, I have worked with C# commercially for two years now). And I just want to see how far I can take C# BEFORE running into serious problems. So far I have used the "freedom" C# gave me to work with more efficient buffer algorithms (at least I think so) and have been VERY satisfied, and support from the people I am in contact with (including Microsoft) has been very nice :-)


Regards

Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)
As a follow-up message, I have started reading the ShaderX book. A lot is still over my head, but it does seem you might be able to use a stream of height values and derive the x,z values parametrically within a shader - similar to the displacement map and compression concepts the author speaks of. In terms of data transfer you can't get any simpler, but as for the overhead of the shader instructions, I just don't know... just a thought.
Well, given the fact that I am actually writing this for STATE OF THE ART hardware (yes, I just got a Radeon 9700 Pro - could not resist; I actually just wanted to buy a new case for my computer, but they had just one left), vertex shader performance should not be too much of an issue. I will look into it :-)

Regards

Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)

