Resources/approach for 3D tile/height map rendering


I am looking at moving from a pure 2D map to making it 3D with a height map, currently using D3D11. My initial goal is something like Stronghold or Railroad Tycoon 2, keeping my existing 2D assets, but I want the scope to replace those with 3D ones and add full camera movement later (shallow camera angles being the main issue, due to the resulting long view distance).

The problem I am having is determining the best/correct approach to render this terrain. A quick test showed me that just drawing a 1000x1000 tile quad list (two triangles per tile) doesn't perform well: 4 million vertices, 6 million indices, and about 50 fps on my AMD 7870 with just that one DrawIndexed D3D11 call.

I have various ideas for improving this, of varying complexity, but don't really know which is a suitable one, or whether I am missing something simple on this matter.

  1. Convert the list to a strip. I expect this to be a fairly good gain, with around half as many vertices for the vertex shader, and will likely do it. I need to work around some complexities (e.g. Stronghold-style cliffs, and vertex texture fields), but it doesn't really seem enough by itself; I would only expect at most that 2x performance boost.
  2. At longer distances from the camera, only render a single quad for multiple tiles (average height, most common tile texture, etc.). This however seems to force me to re-create the mesh each frame, and I need a way to deal with the edges between detail levels (to support a shallow camera angle, since some tiles are very close and others distant). I have done/seen this in plenty of games/engines with independent meshes, but not encountered this single-mesh issue before (in all of the ones I did technical work with, they used pre-made low-detail versions of the models to just swap between).
  3. For larger maps, making sure to only ask D3D to draw ranges I know are in the view area, either with a dynamic mesh, or just making multiple Draw calls for the needed subsets (rough sketch after this list). Essentially this is what I have always done in 2D with a dynamic vertex buffer, but there I only had at most maybe 100x100 on-screen tiles, so only a small per-frame buffer, and it's trivial to determine the viewable area (an x/y tile rect for a simple for loop).
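Rough sketch of what I mean for idea 3, assuming the index buffer is laid out row-major with 6 indices per tile so each visible row of tiles is one contiguous index range. computeVisibleTileRect and the other names are placeholders, and device/context setup is omitted:

// Idea 3 sketch: draw only the index ranges covering the visible tile rect.
struct TileRect { int x0, y0, x1, y1; };            // inclusive tile bounds
TileRect visible = computeVisibleTileRect(camera);  // hypothetical helper

const UINT indicesPerTile = 6;
const UINT tilesPerRow = 1000;

for (int y = visible.y0; y <= visible.y1; ++y)
{
    UINT firstTile = UINT(y) * tilesPerRow + UINT(visible.x0);
    UINT tileCount = UINT(visible.x1 - visible.x0 + 1);
    context->DrawIndexed(tileCount * indicesPerTile,  // index count for this row's visible run
                         firstTile * indicesPerTile,  // start index location
                         0);                          // base vertex
}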

Not sure what is going on in your vertex shader, but if you are doing transform math (and other math) to move each tile into place, and your tiles are largely going to be static, you could use transform feedback (Direct3D calls it "stream output") to run all of your vertex attributes through the vertex shader and output them into buffer objects, then use those for all future draw calls with a vertex shader that simply passes the attributes on.

Essentially this would pre-compute all of your vertex attributes - any time you need to move a tile or change something you would run it through your original vertex shader again in order to update all of the attributes in the output buffers.

This may give you a performance boost if the computations in the vertex shader are the bottleneck. If, however, the fragment shader is bogging things down, then you will likely get little to no performance gain.
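A rough D3D11 sketch of that pre-pass, just to illustrate - the geometry shader here is assumed to be a simple pass-through compiled separately, ProcessedVertex and the other names are placeholders, and binding the source vertex buffer/topology is omitted:

// Stream-output target buffer big enough to hold the processed vertices.
D3D11_BUFFER_DESC soDesc = {};
soDesc.ByteWidth = vertexCount * sizeof(ProcessedVertex);  // whatever the heavy VS outputs
soDesc.Usage     = D3D11_USAGE_DEFAULT;
soDesc.BindFlags = D3D11_BIND_STREAM_OUTPUT | D3D11_BIND_VERTEX_BUFFER;
device->CreateBuffer(&soDesc, nullptr, &soBuffer);

// Declaration describing which outputs get written to the stream-output buffer.
D3D11_SO_DECLARATION_ENTRY soDecl[] = {
    { 0, "POSITION", 0, 0, 3, 0 },
    { 0, "NORMAL",   0, 0, 3, 0 },
    { 0, "COLOR",    0, 0, 4, 0 },
};
UINT stride = sizeof(ProcessedVertex);
device->CreateGeometryShaderWithStreamOutput(
    gsBytecode, gsBytecodeSize, soDecl, _countof(soDecl),
    &stride, 1, D3D11_SO_NO_RASTERIZED_STREAM, nullptr, &soGeometryShader);

// Pre-pass: run the heavy vertex shader once and capture the results.
UINT offset = 0;
context->SOSetTargets(1, &soBuffer, &offset);
context->VSSetShader(heavyVertexShader, nullptr, 0);
context->GSSetShader(soGeometryShader, nullptr, 0);
context->Draw(vertexCount, 0);
ID3D11Buffer* nullSO = nullptr;
context->SOSetTargets(1, &nullSO, &offset);
// From here on, bind soBuffer as the vertex buffer and use a trivial pass-through VS.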

Another option is to decrease view distance by using fog. You would have to use it only in the horizontal plane, however, as it would look strange to have fog come in as you zoomed the camera out vertically - but you are likely going to want to restrict the camera movement for this type of game anyway.

If you are using instancing at all (if you have lots of tiles that share the same geometry and texture), then you could arrange your tiles so that you can calculate the tile instance index from the current camera position (i.e. using a grid). You could keep one buffer of instance ids and another of instance transforms, and add/remove entries from those buffers in order to cull a tile away - mapping the buffers rather than re-uploading everything in them on every camera change.
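In case it helps, the CPU side of that would look roughly like this - a dynamic instance buffer you re-map when the camera moves, plus one DrawIndexedInstanced for the shared tile mesh (InstanceData, maxInstances and friends are placeholders):

// Dynamic buffer holding one transform (or tile id) per visible instance.
D3D11_BUFFER_DESC instDesc = {};
instDesc.ByteWidth      = maxInstances * sizeof(InstanceData);
instDesc.Usage          = D3D11_USAGE_DYNAMIC;
instDesc.BindFlags      = D3D11_BIND_VERTEX_BUFFER;
instDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
device->CreateBuffer(&instDesc, nullptr, &instanceBuffer);

// When the camera moves, rewrite only the instance data, not the tile mesh.
D3D11_MAPPED_SUBRESOURCE mapped;
context->Map(instanceBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
memcpy(mapped.pData, visibleInstances.data(),
       visibleInstances.size() * sizeof(InstanceData));
context->Unmap(instanceBuffer, 0);

// One shared tile mesh, drawn once per visible instance; the input layout marks
// the instance buffer's elements as D3D11_INPUT_PER_INSTANCE_DATA.
context->DrawIndexedInstanced(indicesPerTile, (UINT)visibleInstances.size(), 0, 0, 0);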

Just some ideas off the top of my head.

You could also use tessellation to create the resulting height mesh. With this your original mesh can be less detailed. In the tessellation stage you can determine the amount of tessellation you need based on the distance of the camera to the terrain patch. For each patch you sample your heightmap at the newly tessellated points.
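The D3D11-side setup for that is roughly the following (the hull/domain shaders, which pick the tessellation factors and sample the heightmap, are not shown; names are placeholders):

// Terrain is submitted as control-point patches; the hull shader chooses a
// tessellation factor per patch (e.g. from camera distance) and the domain
// shader samples the heightmap at each generated vertex.
context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_4_CONTROL_POINT_PATCHLIST);
context->HSSetShader(terrainHullShader, nullptr, 0);
context->DSSetShader(terrainDomainShader, nullptr, 0);
context->DSSetShaderResources(0, 1, &heightmapSRV);   // heightmap read in the domain shader
context->DSSetSamplers(0, 1, &heightmapSampler);
context->DrawIndexed(patchCount * 4, 0, 0);            // 4 control points per patch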

Another possibility would be to divide your whole terrain into smaller parts. For each part you could store one corner position and the maximum extents. With this bounding box you can perform frustum culling on your terrain. To make the culling faster you could create a hierarchy of bounding volumes, like a quadtree, and just traverse it.
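A small sketch of such a quadtree using the DirectXMath collision types (DrawChunk and the node building are left out):

#include <DirectXCollision.h>

void DrawChunk(int chunkIndex);        // placeholder: issues that chunk's DrawIndexed

// One node per square region of the terrain; leaves reference a drawable chunk.
struct TerrainNode
{
    DirectX::BoundingBox bounds;       // centre + extents of this region
    TerrainNode*         children[4];  // null for leaf nodes
    int                  chunkIndex;   // valid only on leaves
};

void DrawVisible(const TerrainNode& node, const DirectX::BoundingFrustum& frustum)
{
    if (frustum.Contains(node.bounds) == DirectX::DISJOINT)
        return;                        // whole region off screen, skip the subtree
    if (!node.children[0])
    {
        DrawChunk(node.chunkIndex);
        return;
    }
    for (TerrainNode* child : node.children)
        DrawVisible(*child, frustum);
}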

Seems to be the vertex shader limiting it. If I intentionally transform it out of view, so nothing reaches the pixel shader, the frame rate is the same.

The 1000x1000 test mesh I created is already in world space. The vertex shader just multiplies the vertex position with the camera matrix and copies the rest of the fields over (normal vector + colour for now). It also occurred to me that making a triangle strip is a lot harder than on first consideration, since different tile types have different textures/UVs, and the edge tiles use multiple textures.

Smaller chunks with culling is something I intend to do, along with fog, for larger map areas, but I was hoping that with modern GPUs I could have something near this in view at once.

Could maybe do a level-of-detail thing by having, say, 100x100 chunks but keeping the borders full detail? Not sure how hard that is to create along the edges. Seems fairly complex, along with optimising flattish, single-texture regions.

Will have to see what tessellation can do. Not something I have used.

Not sure how instancing is meant to work here? Seems unlikely and hard to determine tiles with the same textures and geometry, even if I restrict the height map to 8 bits?

Still feeling this is getting overly complex, and that I am missing a simple solution, or at least a well-known one to implement that has already gone through the process of figuring out the ideal approach and which optimisations are worthwhile. I can think of many older games that did this, predating modern GPUs and features.

Edit:

The shaders are just this right now.


//vertex shader
Output main(Input input)
{
    Output output;
    output.pos = mul(input.pos, viewProj);
    output.normal = input.normal;
    output.colour = input.colour;
    return output;
}
//pixel shader
float4 main(Input input) : SV_TARGET
{
    float lightIntensity = saturate(dot(input.normal, lightDirection));
    float4 colour = saturate(input.colour * diffuseColour * lightIntensity);

    return colour;// * texture2d.Sample(samplerState, input.uv);
}

Just to make sure - you are not re-filling/re-allocating your vertex buffers every frame, are you?

The fact that you are still getting the slowdown when everything is off screen (no fragment shader action), at 4 million verts, with that graphics card and only the vertex operations you mentioned, makes me suspicious.

I made a hex tile renderer and it can render 64 by 64 by 20 high hex tile chunks, where each tile is about 150 verts, on my laptop's GeForce 680M with no problems - and that's without any culling, where most of the work is done in the fragment shader. I am using a deferred rendering system for lighting and that is included.

It makes me think that you must have something else going on - like trying to upload all your verts to the buffer every frame, or some CPU-side processing slowdown that is occurring - I would think that your GPU could handle 4 million verts.

Not right now, although it's not an immutable buffer, so maybe I need to play with the creation flags. At some point I'd want to update sections where the world gets changed, but all I did in my test was Map/Unmap the constant buffer to take an updated camera matrix, clear the render target, a few Set* calls and the single DrawIndexed. I haven't even got textures in play there yet.

CPU usage was around 0.5% in task manager (i7-4790K), although I didn't run the profiler on it.

EDIT: Actually, changing the usage flag made it a lot faster; I guess the driver put it somewhere the GPU can't access efficiently before, even though I never made it CPU-readable. I also only just realised that such a vertex buffer is still hundreds of megabytes, depending on how much data I need in a vertex, so I guess that is not a particularly good thing either.

Hundreds of MB is fine in a vertex buffer as long as the target GPU has the available memory - the GPU is great at taking in huge streamlined buffers of data - you really wouldn't want it any other way.

After all, if you have 4 million verts, just the 3 floats for position already come to about 48 MB, and with normals and colours on top the buffer is quickly well past 100 MB.

I noticed that the parameters for buffer creation and manipulation are very important and can cause significant performance differences. For example, at first I was mapping my buffers as read/write; when I changed this to write-only my performance was almost 10 times better.

Since your vertex buffers really never need to change (I can't think of why individual vert positions or normal vecs would need to change on a textured height map), I would definitely use an immutable buffer.
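Creation would look something like this (assuming the vertex data is already built on the CPU; names are placeholders):

// Immutable vertex buffer: contents are supplied once at creation and the
// driver is free to keep it in the fastest GPU-only memory.
D3D11_BUFFER_DESC desc = {};
desc.ByteWidth = (UINT)(vertices.size() * sizeof(Vertex));
desc.Usage     = D3D11_USAGE_IMMUTABLE;
desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;

D3D11_SUBRESOURCE_DATA initData = {};
initData.pSysMem = vertices.data();

device->CreateBuffer(&desc, &initData, &vertexBuffer);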

Then to move the entire height map around you can do the single world matrix thing and your camera can have its view matrix - this should be plenty fast for your situation.

Even if you set shader uniforms using individual calls rather than mapping uniform buffers it still should be plenty fast.

Anyways hope you got it figured out.

Well yes, I hadn't really thought about the size maths since each vertex is small, lol. I guess the low-end and laptop GPUs wouldn't be able to render so much in detail anyway.

Since your vertex buffers really never need to change (I can't think of why individual vert positions or normal vecs would need to change on a textured height map), I would definitely use an immutable buffer.

The heights can be changed slightly at runtime though, e.g. to flatten a group of tiles for a building or road. But I guess maybe for such things it's OK to just create a brand new buffer and throw out the old one - more or less what I expected a write-only Map/Unmap to do behind the scenes.
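Or possibly keep the buffer at DEFAULT usage and patch just the affected vertex range with UpdateSubresource, something like this (offsets and names are illustrative):

// Patch only the vertices of the tiles that were flattened, leaving the rest
// of the (D3D11_USAGE_DEFAULT) buffer untouched. For buffers the box holds byte offsets.
D3D11_BOX range = {};
range.left   = firstChangedVertex * sizeof(Vertex);       // byte offset of first changed vertex
range.right  = (lastChangedVertex + 1) * sizeof(Vertex);  // one past the last changed byte
range.top = 0;  range.bottom = 1;
range.front = 0; range.back = 1;

context->UpdateSubresource(vertexBuffer, 0, &range,
                           updatedVertices.data(),         // CPU copy of the changed vertices
                           0, 0);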

But still, is such a triangle list really the normal approach all these games took/take? I found lots on the approaches various FPS/third-person games take for indoor areas or comparatively small outdoor areas (only rendering the rooms the player can see, and simply using lower-detail meshes with distance for independent objects/entities), but not so much for large outdoor areas, even though games have been doing those for years while keeping pretty good detail for close terrain.

Is the "100x100 chunks but keeping the borders full detail" a suitable idea, or should I have a better look at algorithms to deal with borders, since I don't really need full detail "lines" joining distant regions? Obviously putting a low-res region directly next to a high-res one while doing nothing else is liable to leave gaps along the edges where the gradient changes.

Well, I tried a few things to give chunks a level of detail based on distance (merging 2x2, 4x4, etc. tiles into a single tile), but avoiding visible seams is proving an issue.

Solved it to an extent by adding vertical edge pieces going down to the lowest height, so at least there are no gaps to see through. Seems like a bit of a hack though.
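For reference, the skirt generation per chunk border is roughly this (just a sketch of what I ended up with; appendTriangle and the vertex layout are illustrative):

// For each chunk, duplicate its border vertices at the lowest height in the
// map ("skirt") so a coarser neighbour can never show a gap through the seam.
for (size_t i = 0; i + 1 < borderVerts.size(); ++i)
{
    Vertex top0 = borderVerts[i];
    Vertex top1 = borderVerts[i + 1];
    Vertex bot0 = top0; bot0.pos.y = lowestHeight;
    Vertex bot1 = top1; bot1.pos.y = lowestHeight;

    // Two triangles forming the vertical quad between this border edge and its dropped copy.
    appendTriangle(top0, top1, bot1);
    appendTriangle(top0, bot1, bot0);
}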

