[dx9] Terrain rendering performance issue

I am working on a terrain system for an RTS-style game. Given that the camera is always pointing down, I'm able to limit the terrain cells being drawn to around 10-35 depending on zoom level. The problem is, even drawing just these cells runs painfully slowly. The bottleneck, according to both frame rates and AMD CodeAnalyst, is definitely CPU-bound, but it's inside a call to a graphics function. Look at this code:

fxManager->commonShaders[ECFX_Default3D]->BeginPass(0);
int numV = 8*8; // quads per cell: each cell is an 8x8 grid
vector<P2>::iterator eit = gridIndexes.end();
for (vector<P2>::iterator it = gridIndexes.begin(); it != eit; ++it)
{
	// one stream-source switch and one draw call per visible cell
	g->d3d->SetStreamSource(0, data->grids[it->x][it->y].vBuffer, 0, sizeof(_3DVERTEX)); // this line is taking up masses of CPU time
	g->d3d->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 6*numV, 0, 2*numV);
}
fxManager->commonShaders[ECFX_Default3D]->EndPass();

Nice and simple. All it does is render the vertex buffer stored in the grids[][].vBuffer var (an LPDIRECT3DVERTEXBUFFER9) for each cell. They all share the same index buffer, which is set before this code. There are no errors being reported, even at max validation; it's just really slow. It also doesn't matter if I render 30 cells or 300. The GPU isn't even breaking a sweat. AMD CodeAnalyst says the call to SetStreamSource is consuming huge amounts of CPU time. So I guess I must be rendering the terrain using the idiot's method. How should I be rendering the cells? Given that the cell vector changes every time the camera moves, would it be feasible to fill an entire vertex buffer with the vertex data of around 10-35 cells, potentially every frame? Or is there a more efficient way of rendering the cells that I haven't come across yet?
------------------------------------------[New Delta Games] | [Sliders]
This shouldn't cause big performance issues.

Can you give us some more information? How many milliseconds do you need per frame? Are you rendering only these terrain tiles, or something else too?
Hmm,

Perhaps it's not actually the call to SetStreamSource that is causing the bottleneck.

Try changing it to this and re-run your test.
LPDIRECT3DVERTEXBUFFER9 pVertBuffer = data->grids[it->x][it->y].vBuffer;
g->d3d->SetStreamSource(0, pVertBuffer, 0, sizeof(_3DVERTEX));


My thought is that perhaps you're incurring cache issues with the data lookup - though calling SetStreamSource as little as possible is probably a good idea, so swapping to a dynamically filled single buffer might be advantageous.

By my quick calculations you'd need a vertex buffer capable of holding 3,840-13,440 vertices, which is fairly small anyway.
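
A minimal sketch of what I mean, reusing your gridIndexes/_3DVERTEX names (CopyCellVertices is a hypothetical helper, not from your code):

// Created once at startup: a dynamic buffer sized for the worst case.
const int MAX_VERTS = 13440;      // upper end of the estimate above - generous,
                                  // since a cell really holds (8+1)^2 = 81 verts
const int VERTS_PER_CELL = 9 * 9;
LPDIRECT3DVERTEXBUFFER9 dynVB = 0;
g->d3d->CreateVertexBuffer(MAX_VERTS * sizeof(_3DVERTEX),
                           D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY,
                           0, D3DPOOL_DEFAULT, &dynVB, 0);

// Each frame: refill with whatever cells are visible, then draw from one stream.
_3DVERTEX* dst = 0;
dynVB->Lock(0, 0, (void**)&dst, D3DLOCK_DISCARD); // DISCARD hands back fresh memory instead of stalling on the GPU
for (vector<P2>::iterator it = gridIndexes.begin(); it != gridIndexes.end(); ++it)
{
	CopyCellVertices(dst, data->grids[it->x][it->y]); // hypothetical: copies one cell's 81 verts
	dst += VERTS_PER_CELL;
}
dynVB->Unlock();

g->d3d->SetStreamSource(0, dynVB, 0, sizeof(_3DVERTEX)); // set once per frame
// then either draw each cell with BaseVertexIndex = cellIndex * VERTS_PER_CELL,
// or build an index buffer spanning the whole batch and draw it in one call

Note that D3DPOOL_DEFAULT buffers have to be recreated on a lost device, unlike your managed ones.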
MrF.--Code..reboot..code..reboot..sigh!
Okay, this is strange. AMD CodeAnalyst is no longer reporting the bottleneck as that DX call. In fact, according to it, there's no CPU bottleneck at all. I ran the profiler five or six times before, and every time it said the call to SetStreamSource was the bottleneck; now it says there's no bottleneck at all, yet there's no difference in framerate and I haven't changed any of the code. How bizarre.

Performance is still not what I was expecting, but now I have no bottleneck to target. Unusually, it doesn't seem to matter how many cells are rendered - I can render 300 or 30 and the fps stays the same - which would suggest the call to SetStreamSource is definitely not the problem. I wonder if there's some sort of bad-data problem causing either DX or the GPU to stall.

I'm wondering if it's something to do with the way I create the vertex buffer. I can render plenty of 'regular' meshes and the framerate stays high, but rendering just 20 of the terrain cells kills it.

I make the cells with a simple nested for loop, and create a vertex for each point, then create an index buffer to draw them. Like so:

void MeshGen::MakeFloorGrid( LPDIRECT3DVERTEXBUFFER9& outVBuf, const int& divisionsCount,
							const float& gridSize, const vec2& positionOffset )
{
	int d2 = divisionsCount+1;
	g->d3d->CreateVertexBuffer(d2*d2*sizeof(_3DVERTEX), D3DUSAGE_WRITEONLY, 0, D3DPOOL_MANAGED, &outVBuf, 0);
	_3DVERTEX* v = 0;
	outVBuf->Lock(0, 0, (void**)&v, 0);
	float step = gridSize / divisionsCount;
	for (int y = 0; y != d2; ++y)
	{
		for (int x = 0; x != d2; ++x)
		{
			v[(y*d2)+x].pos.x = (x * step) + positionOffset.x;
			v[(y*d2)+x].pos.y = 0;
			v[(y*d2)+x].pos.z = (y * step) + positionOffset.y;
			v[(y*d2)+x].u = (1.0f/divisionsCount)*x;
			v[(y*d2)+x].v = (1.0f/divisionsCount)*y;
			v[(y*d2)+x].norm.x = 0;
			v[(y*d2)+x].norm.y = 1;
			v[(y*d2)+x].norm.z = 0;
			v[(y*d2)+x].col = DColor(1,1,1,1);
		}
	}
	outVBuf->Unlock();
}

void MeshGen::CreateGridIndexBuffer(LPDIRECT3DINDEXBUFFER9& outIndexBuf, const int& divisionsCount)
{
	g->d3d->CreateIndexBuffer(divisionsCount*divisionsCount*6*sizeof(WORD), D3DUSAGE_WRITEONLY, D3DFMT_INDEX16, D3DPOOL_MANAGED, &outIndexBuf, 0);
	WORD* i = 0;
	outIndexBuf->Lock(0, 0, (void**)&i, 0);
	int d2 = divisionsCount+1;
	int baseI = 0;
	for (int y = 0; y != divisionsCount; ++y)
	{
		for (int x = 0; x != divisionsCount; ++x)
		{
			// set indices - counterclockwise facing
			i[baseI]   = (y*d2)+x;       // tl
			i[baseI+1] = ((y+1)*d2)+x;   // bl
			i[baseI+2] = (y*d2)+x+1;     // tr
			i[baseI+3] = ((y+1)*d2)+x;   // bl
			i[baseI+4] = ((y+1)*d2)+x+1; // br
			i[baseI+5] = (y*d2)+x+1;     // tr
			baseI += 6;
		}
	}
	outIndexBuf->Unlock();
}


Very straightforward. It seems unlikely this is the problem, but I can't think of anything else that might be the cause.


re: demirug
For testing purposes, I'm only rendering the terrain cells. I don't have an actual target time per frame, but when rendering 30+ meshes with over 1,000 polys each, the framerate stays above 400fps, while rendering just the terrain cells drops it to 140. That's just not right. It should clear at least 250 even with poorly optimized buffers.

re: mrFlibble (king of the potato people!)

Nice idea, but nope. It had no impact on performance. I guess it wasn't a cache issue. I'll have to try rewriting the code to generate the entire floor-in-scene buffer on the fly, and see if that helps.
------------------------------------------[New Delta Games] | [Sliders]
Well, switching over to using one large vertex buffer gained very little. A tiny increase in performance - about 1-2%.

Just for the heck of it, I tried using non-indexed vertex buffers for each cell, and the performance of that system was abysmal. This leads me to think the slowdown may be something to do with the data stored in the buffer(s) being drawn, but I'm not sure why.
------------------------------------------[New Delta Games] | [Sliders]
Have you tried running it against the Debug runtime? That might reveal something. The CPU code looks ok to me.
I'm having an odd problem at the moment where I can't select the debug runtime in the DirectX control panel. Setting the value manually in the registry doesn't turn it on either. Very odd. Until I figure out why, I'm kinda limited in what I can do debug-wise. I tried PIX, but it hasn't revealed anything.
------------------------------------------[New Delta Games] | [Sliders]
Maybe you're using some complex shaders for terrain rendering, or switching textures too often? It would be a good idea to run it with NVPerfHUD enabled to see what the real bottleneck is (it will give you much more detail than PIX, and enabling it is just a few lines of code).
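For reference, the usual instrumentation pattern from NVIDIA's docs looks roughly like this - at device creation you look for the adapter whose description contains "PerfHUD" and create a REF device on it, which NVPerfHUD intercepts (d3d9, hWnd, presentParams and device stand for whatever your engine already has):

UINT adapterToUse = D3DADAPTER_DEFAULT;
D3DDEVTYPE deviceType = D3DDEVTYPE_HAL;
for (UINT adapter = 0; adapter < d3d9->GetAdapterCount(); ++adapter)
{
	D3DADAPTER_IDENTIFIER9 identifier;
	if (SUCCEEDED(d3d9->GetAdapterIdentifier(adapter, 0, &identifier)) &&
	    strstr(identifier.Description, "PerfHUD") != 0)
	{
		adapterToUse = adapter;
		deviceType = D3DDEVTYPE_REF; // NVPerfHUD swaps in the real HAL device behind this
		break;
	}
}
d3d9->CreateDevice(adapterToUse, deviceType, hWnd,
                   D3DCREATE_HARDWARE_VERTEXPROCESSING, &presentParams, &device);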
re: Master of Riddles

The terrain is rendered using the same shader I use for all static 3D meshes, and those render fine. There's only one texture in use, and it's set once, before all the terrain cells are drawn.
NVPerfHUD might be only a few lines of code to activate, but it would also mean a new graphics card :(


I fixed the problem with not being able to use the debug runtimes (it turns out something had updated the DX drivers, which meant getting the latest SDK, which then meant farting around for an hour working around a new fxc compiler bug). Anyway, the debug runtimes aren't showing any warnings or errors for the terrain rendering.
------------------------------------------[New Delta Games] | [Sliders]
Thanks for your help, guys - I've found the source of the problem. It was the pixel shader. I optimized out a dynamic loop in it, and now the terrain renders lightning fast. I never realized until now just how slow dynamic looping in shaders really is!

I didn't realize this was the problem before because it wasn't until I used this shader on the terrain that the entire screen was being filled by it. Drawing lots of meshes was still only filling half the screen, so I never cottoned on to the fact that the shader was as slow as molasses.
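
For anyone hitting the same thing, the change was along these lines (illustrative HLSL, not my actual shader - the light arrays and CalcLight are stand-ins for whatever the loop body does). With the bound coming from a uniform, the compiler has to emit real flow control, which every pixel pays for; with a compile-time constant, fxc unrolls it into straight-line code:

float3 lightDir[4]; // per-light data set from the application
float3 lightCol[4];
int    lightCount;  // uniform loop bound - this is what forces a dynamic loop

float3 CalcLight(int i, float3 n)
{
	return saturate(dot(n, lightDir[i])) * lightCol[i];
}

// Before: the bound is a uniform, so the compiler emits real flow control.
float3 ShadeDynamic(float3 base, float3 n)
{
	float3 result = base;
	for (int i = 0; i < lightCount; ++i)
		result += CalcLight(i, n);
	return result;
}

// After: compile-time bound, so the loop unrolls completely.
#define LIGHT_COUNT 4
float3 ShadeUnrolled(float3 base, float3 n)
{
	float3 result = base;
	[unroll]
	for (int i = 0; i < LIGHT_COUNT; ++i)
		result += CalcLight(i, n);
	return result;
}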

edit:
Just for info purposes for anyone reading this:
With a fast-executing shader, it's actually faster to render the individual cell buffers (calling SetStreamSource many times) than to create one large buffer for rendering (calling SetStreamSource once). It's only about a 3% difference, but worth noting.
------------------------------------------[New Delta Games] | [Sliders]

This topic is closed to new replies.
