DrawIndexedPrimitives takes 20 to 40 FPS to draw

27 comments, last by LemonBiscuit 10 years, 1 month ago


I can't get them correct, but anyway, now even the normals at the edges of the chunks are incorrect (I should get vertices from other chunks to get the normal, etc.). It's getting a little too complex for the little optimization I want to do.

Yeah, you need to use vertices outside the chunk to calculate normals. But you can do this offline and make it part of your heightmap data, so it doesn't really complicate your chunk algorithm too much.


If the bounding box containing the last vertices of my terrain is not in view, then the number of primitives to draw is the number of vertices minus the number of vertices contained in the box.

I don't quite understand how this would work. Like, how you would order your vertices in order to make this work.

Another alternative, which I suggested before, is to have a terrain grid just large enough to cover the area viewable by the camera, and move it with the camera. This is what I used in my game engine, and in terms of managing stuff on the CPU, it's extremely straightforward. I just have a single vertex and index buffer, and I set a world matrix to offset the terrain grid by the right amount when drawing. Height calculation is done by sampling the heightmap in the vertex shader. But I think it works best if you have a camera that doesn't allow a great variety in viewing angles (otherwise you'll have to account for different numbers of vertices being seen).


Well here's how I do it:

I generate a number of bounding boxes, each containing a fixed number of vertices.

They are generated in the same order as the vertices:

The first bounding box contains the first 1000 vertices, etc.

Then, for each draw, I check which bounding boxes are the first and the last in view.

Example: If the first one is not in view but the second one is, then I start rendering from the first vertex of the second box (vertex index 1000).

If the last bounding box is not in view but the one before it is, then I skip rendering the last 1000 vertices.

This is not a huge optimization at all, as it won't work at many camera angles, but it was the easiest and fastest way I found, and the deadline for my project is pretty soon :( I may get back to this on another project anyway.

But I didn't really get your other alternative..


This is not a huge optimization at all, as it won't work at many camera angles, but it was the easiest and fastest way I found, and the deadline for my project is pretty soon. I may get back to this on another project anyway.

So each bounding box contains a grid of vertices? And they are all the same size? And they are arranged top to bottom, left to right or something?


But I didn't really get your other alternative..

Suppose the largest patch of terrain you ever see at one time in your world is 200 X 200. Make a 200 X 200 terrain vertex grid, positioned at (-100, -100) to (100, 100), say (assuming each grid square corresponds to one world unit). So now, if your camera is centered looking at (173, 192), draw the grid centered at that location (So (73, 92) to (273, 292)).

This means you just need to add an offset to your vertex positions in your vertex shader (to move the grid so it's centered where the camera is looking). And of course you can't associate a height with your grid vertices, since the same grid is used to render any patch of terrain. So instead you sample the height from your heightmap in your vertex shader.
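A minimal sketch of that vertex shader, assuming a Y-up world with the terrain in the XZ plane and a vs_3_0 target (vertex texture fetch needs tex2Dlod there). All names (GridOffset, HeightMapSize, HeightScale, HeightSampler) are made up for illustration and are not taken from either poster's code:

```hlsl
// Sketch only: every uniform and struct below is hypothetical.
float4x4 ViewProjection;
float2   GridOffset;      // world-space XZ of the grid centre, snapped to whole grid cells
float2   HeightMapSize;   // terrain extent in world units covered by the heightmap
float    HeightScale;     // world-space height of a heightmap value of 1.0
sampler2D HeightSampler;  // heightmap, assumed to be in a vertex-fetchable format

struct VSInput
{
	float4 Position : POSITION0;  // flat grid vertex, e.g. x and z in [-100, 100], y = 0
};

struct VSOutput
{
	float4 Position : POSITION0;
	float2 UV       : TEXCOORD0;
};

VSOutput TerrainVS(VSInput input)
{
	VSOutput output;

	// Slide the shared grid so it sits under the point the camera looks at.
	float3 world = input.Position.xyz;
	world.xz += GridOffset;

	// The grid carries no height, so fetch it from the heightmap instead.
	float2 uv = world.xz / HeightMapSize;
	world.y = tex2Dlod(HeightSampler, float4(uv, 0, 0)).r * HeightScale;

	output.Position = mul(float4(world, 1.0f), ViewProjection);
	output.UV = uv;
	return output;
}
```

Snapping GridOffset to whole grid cells on the CPU is an assumption here; it keeps vertices landing on the same heightmap samples as the camera moves, which should avoid the terrain appearing to "swim".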

I render all my terrain with a single 97x97 grid (so 9409 vertices):

[image: compare.jpg]

If I zoom out, you can see the patch:

[image: zoomout.jpg]

That looks pretty nice, but my game is an FPS, so the viewing angles will vary a lot, and I'd need to modify the grid every time (as you stated before).

But your idea seems great for your type of game.

Btw, nice render!

Hey, I'm back again.

The problem comes from the tex2D calls in my shader.


	// Distance-based divisor: 1 near the camera, growing with depth.
	float div = clamp(0.2*clamp(0.7f*(input.Depth), 1, 400), 1, 400);

	// Near-field (highly tiled) layers. The declarations were not shown in the
	// post; they are assumed here and zeroed so they are defined outside the branch.
	float3 rTex = 0, gTex = 0, bTex = 0, base = 0;
	if(div < 10)
	{
		rTex = tex2D(RTextureSampler, input.UV * TextureTiling).rgb / div;
		gTex = tex2D(GTextureSampler, input.UV * TextureTiling).rgb / div;
		bTex = tex2D(BTextureSampler, input.UV * TextureTiling).rgb / div;
		base = tex2D(BaseTextureSampler, input.UV * TextureTiling).rgb / div;
	}

	// Far-field layers: low tiling, faded in with distance.
	float clamp2 = clamp(0.01f*input.Depth, 0, 1);
	float3 rTex2 = tex2D(RTextureSampler, input.UV * 0.1).rgb * clamp2;
	float3 gTex2 = tex2D(GTextureSampler, input.UV * 0.1).rgb * clamp2;
	float3 bTex2 = tex2D(BTextureSampler, input.UV * 0.1).rgb * clamp2;
	float3 base2 = tex2D(BaseTextureSampler, input.UV * 0.1).rgb * clamp2;

	float3 weightMap = tex2D(WeightMapSampler, input.UV).rgb;

	// The base layer gets whatever weight the R/G/B layers leave over.
	float3 output = clamp(1.0f - weightMap.r - weightMap.g - weightMap.b, 0, 1)
					* base
					+ base2 + weightMap.r * rTex2 + weightMap.g * gTex2 + weightMap.b * bTex2
					+ weightMap.r * rTex + weightMap.g * gTex + weightMap.b * bTex;

As my terrain is pretty big, I only draw highly tiled textures around the player (div < 10). For far textures, I draw only 1 texture all over the map.

If I remove the if block above, I gain almost 30 FPS (~20 FPS to ~50 FPS).

Here are my samplers:

http://pastebin.com/nxJNeyXG

Why does tex2D take so much GPU time???

A few notes:

1) texture sampling is not cheap! Obviously if you take more texture samples, your shader will be slower.

2) the inside of your if statement is most likely being executed for every pixel. That means you are making 9 texture samples for every pixel, instead of 5. tex2D requires information from adjacent pixel shader executions in order to do its job (calculate the right mipmap level), so all pixels must take the same code path, which means the if block must always be executed (even if the results don't contribute to the final value). The fix here is to only put your UV calculations in the if block, then use those to sample from your 4 textures.

3) make sure your textures are mipmapped. Otherwise you'll be thrashing the texture cache when rendering textures "far away". This can have a huge performance impact.

4) on some architectures (e.g. iPad), modifying the texture coordinates you sample from in the pixel shader is considered a "dependent texture read" and can have a significant performance impact. I think this is probably unlikely in your case though, assuming you're running this on a PC. In any case, an easy fix is to move the calculations to the vertex shader.

(2) and (3) are probably the most immediate things you can do that will give you a performance boost.
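As a rough sketch of the fix in note (2): only the UV and scale selection sits inside the if, and the four layer fetches plus the weight map fetch happen unconditionally afterwards, for 5 samples per pixel. The sampler and parameter names mirror the shader posted above; the PSInput struct and the function name are assumptions. Note this picks either the near or the far tiling per pixel instead of summing both as the posted shader does, so the final look will differ:

```hlsl
// Sketch only: PSInput and TerrainPS are hypothetical; samplers match the post.
sampler2D RTextureSampler;
sampler2D GTextureSampler;
sampler2D BTextureSampler;
sampler2D BaseTextureSampler;
sampler2D WeightMapSampler;
float TextureTiling;

struct PSInput
{
	float2 UV    : TEXCOORD0;
	float  Depth : TEXCOORD1;
};

float4 TerrainPS(PSInput input) : COLOR0
{
	float div = clamp(0.2f * clamp(0.7f * input.Depth, 1, 400), 1, 400);

	// Far-field defaults: low tiling, faded in with distance.
	float2 uv = input.UV * 0.1f;
	float scale = clamp(0.01f * input.Depth, 0, 1);

	// Only arithmetic inside the branch, no texture fetches.
	if (div < 10)
	{
		uv = input.UV * TextureTiling;
		scale = 1.0f / div;
	}

	// All gradient-based tex2D calls are in uniform control flow: 4 layers + 1 weight map.
	float3 rTex = tex2D(RTextureSampler, uv).rgb * scale;
	float3 gTex = tex2D(GTextureSampler, uv).rgb * scale;
	float3 bTex = tex2D(BTextureSampler, uv).rgb * scale;
	float3 base = tex2D(BaseTextureSampler, uv).rgb * scale;
	float3 weightMap = tex2D(WeightMapSampler, input.UV).rgb;

	float baseWeight = saturate(1.0f - weightMap.r - weightMap.g - weightMap.b);
	float3 color = baseWeight * base
	             + weightMap.r * rTex + weightMap.g * gTex + weightMap.b * bTex;
	return float4(color, 1.0f);
}
```

One caveat with this layout: the UVs jump where div crosses 10, so the derivatives used for mip selection can spike along that boundary and may show up as a thin seam.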

A bit off topic, but phil_t, you have a very well calibrated to-world mapping on your rather stretched vertical triangles! Kudos. What production technique did you use for that? I still struggle with so many options. Is it a simple walk to the closest world position? You obviously choose which sampler to pick by vertex color weight, but this mapping is awesome!


But phil_t, you have a very well calibrated to-world mapping on your rather stretched vertical triangles! Kudos. What production technique did you use for that? I still struggle with so many options. Is it a simple walk to the closest world position? You obviously choose which sampler to pick by vertex color weight, but this mapping is awesome!

I'm using projective texturing on three axes to avoid stretching on steep areas (the "overhead" texture atlas contains the flat/grassy textures, and the other two samples come from the "cliff" texture atlas). I blend between the three texture samples based on the terrain normal. So that's 3 samples plus another 3 for the corresponding normal maps.
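A minimal sketch of three-axis projective texturing blended by the normal, for readers who have not seen it. The weighting formula (normalized absolute normal components), the sampler names and TextureScale are assumptions; the post does not show the actual blend function, and normal maps and atlas lookups are left out:

```hlsl
// Sketch only: names and the exact weighting are hypothetical.
sampler2D OverheadSampler;  // flat/grassy textures, projected along Y
sampler2D CliffSampler;     // cliff textures, projected along X and Z
float TextureScale;         // world units to texture repeats

float3 TriplanarColor(float3 worldPos, float3 worldNormal)
{
	// Blend weights from the normal: steep faces favour the side projections.
	float3 w = abs(normalize(worldNormal));
	w /= (w.x + w.y + w.z);  // make the three weights sum to 1

	// One projection per axis, each using the two remaining world coordinates as UVs.
	float3 xProj = tex2D(CliffSampler,    worldPos.zy * TextureScale).rgb;
	float3 yProj = tex2D(OverheadSampler, worldPos.xz * TextureScale).rgb;
	float3 zProj = tex2D(CliffSampler,    worldPos.xy * TextureScale).rgb;

	return xProj * w.x + yProj * w.y + zProj * w.z;
}
```

Because the UVs come from world position rather than the stretched grid UVs, the texel density stays roughly constant on steep slopes.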

For transitioning/blending between terrain/cliff types, I use a single index texture that contains the texture index for a specific world position (this technique is described in "Large-Scale Terrain Rendering for Outdoor Games" in GPU Pro 2). The interpolated value for this tells me the two texture indices (within my atlases) I need to sample from. So that doubles my number of samples to 12, plus 1 for the index texture. So a total of 13 texture samples! However, I use dynamic branching to skip texture samples when the blend weight for a particular axis is zero (this means I calculate the mip level separately outside the branch and use tex2Dlod instead of tex2D in the branch), so for large contiguous areas of cliff or grass, I'm generally just doing 5 texture samples (1 index texture, (1 + 1) diffuse/normal for texture index A, (1 + 1) diffuse/normal for texture index B).
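The branching trick can be sketched roughly as follows, assuming a ps_3_0 target. The mip estimate is the usual derivative-based approximation rather than anything confirmed by the post, and all function and variable names are hypothetical:

```hlsl
// Sketch only: names are hypothetical; requires ps_3_0 for tex2Dlod and real branches.
sampler2D CliffSampler;
float2 CliffTextureSize;  // texture size in texels (assumed known on the CPU side)

// Approximate the mip level the hardware would choose, from screen-space UV derivatives.
float ComputeMipLevel(float2 uv, float2 textureSize)
{
	float2 dx = ddx(uv * textureSize);
	float2 dy = ddy(uv * textureSize);
	float maxSq = max(dot(dx, dx), dot(dy, dy));
	return max(0.0f, 0.5f * log2(maxSq));
}

// tex2Dlod takes an explicit mip level, so it needs no derivatives and can sit in a real branch.
float3 SampleWeighted(sampler2D s, float2 uv, float lod, float weight)
{
	float3 color = 0;
	[branch]
	if (weight > 0.0f)
	{
		color = tex2Dlod(s, float4(uv, 0, lod)).rgb * weight;
	}
	return color;
}

float3 CliffLayer(float2 uv, float weight)
{
	// Derivatives are taken here, outside the branch, where every pixel runs the same code.
	float lod = ComputeMipLevel(uv, CliffTextureSize);
	return SampleWeighted(CliffSampler, uv, lod, weight);
}
```

The design point is that ddx/ddy run in uniform control flow for every pixel, while the fetch itself is skipped entirely when the weight is zero.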

I didn't really know about mipmapping, and now that you wrote about it, I found this link: http://blogs.msdn.com/b/shawnhar/archive/2009/09/14/texture-filtering-mipmaps.aspx, and I just gained 30 FPS by adding 3 lines to my code. The code I posted above was trying to simulate mipmapping (in an ugly way) and it wasn't even working properly -_-.
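The post does not say which three lines were added. As one plausible illustration only, an effect-file sampler with mip filtering enabled might look like this (the RTexture name is assumed, and the texture itself must have been built with a mip chain, for example through the content pipeline's mipmap generation option):

```hlsl
// Sketch only: RTexture is a hypothetical effect parameter.
texture RTexture;

sampler2D RTextureSampler = sampler_state
{
	Texture   = <RTexture>;
	MinFilter = Linear;
	MagFilter = Linear;
	MipFilter = Linear;  // blend between mip levels for distant terrain
	AddressU  = Wrap;
	AddressV  = Wrap;
};
```

With a mip chain and MipFilter set, distant pixels read from small mip levels, which is what keeps the texture cache from thrashing as described in note (3).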

Thank you very much phil!

This topic is closed to new replies.
