TLM : Some performance issues

posted in Not dead...
Published March 17, 2007
During development, one of the things I was somewhat lacking was an idea of performance. Sure, when you got to a 1024*1024 map the graphics card broke down and cried at me, but below that, while it was smooth, I didn't know how fast things were going.

One thing which was bothering me was this bit of code
	float SinkWest  = 0.5 * (east.x + east.y - east.z + east.w);
	float SinkEast  = 0.5 * (west.x + west.y + west.z - west.w);
	float SinkSouth = 0.5 * (-north.x + north.y + north.z + north.w);
	float SinkNorth = 0.5 * (south.x - south.y + south.z + south.w);

It had been making my brain twitch for a while now and after posting it into IRC I got the answer as to why; I should have been using the dot() function instead!

A few mins in an editor later and the code above had become
	float SinkWest  = dot(east,  vec4( 0.5, 0.5,-0.5, 0.5));
	float SinkEast  = dot(west,  vec4( 0.5, 0.5, 0.5,-0.5));
	float SinkSouth = dot(north, vec4(-0.5, 0.5, 0.5, 0.5));
	float SinkNorth = dot(south, vec4( 0.5,-0.5, 0.5, 0.5));

Thanks to Zeux for pointing that out to me [smile]
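
The two versions above are algebraically the same: the rewrite just folds the 0.5 and the signs into a constant vector. As a rough CPU-side sanity check, here is a minimal C++ sketch for the SinkWest case. The Vec4 struct, the dot() helper and the function names are stand-ins invented for this example (they are not part of the shader or of any library).

```cpp
#include <cassert>
#include <cmath>

// Minimal stand-in for GLSL's vec4 (hypothetical, for illustration only).
struct Vec4 {
    float x, y, z, w;
};

// Component-wise dot product, mirroring what GLSL's built-in dot() computes for vec4.
float dot(const Vec4& a, const Vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Original form: scale the signed sum of the components by 0.5.
float sinkWestExpanded(const Vec4& east) {
    return 0.5f * (east.x + east.y - east.z + east.w);
}

// Rewritten form: the 0.5 and the signs live in a constant vector.
float sinkWestDot(const Vec4& east) {
    return dot(east, Vec4{0.5f, 0.5f, -0.5f, 0.5f});
}

// True when both forms agree to within a small absolute tolerance.
bool formsAgree(const Vec4& east, float tolerance = 1e-6f) {
    assert(tolerance >= 0.0f);
    return std::fabs(sinkWestExpanded(east) - sinkWestDot(east)) <= tolerance;
}
```

The other three sinks follow the same pattern with the sign moved to a different component.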

Now, in preparation for my presentation on Tuesday, I decided that I might want to get some performance figures, so off to XP I went, armed with gDEBugger, to do some looking about.

On first run things were... disappointing to say the least; even on a 40*40 matrix we weren't clearing 40fps [sad]. This wasn't good at all, as a CPU version was doing ~24fps at 50*50; clearly we had a problem.

So, some code was commented out and behold, the problem was narrowed down to the energyTransfer pass, which is the main one which does all the work, namely;
- 5 texture samples
- 5 dot products
- 1 subtraction
- 2 colour writes

After an inspired bit of fiddling I found the problem; MRT via FBO with 32bit floating point textures REALLY hurts, it seems. Getting rid of the extra write jumped my fps from 40 to ~800 or so.

Clearly my design wasn't optimal; so I sat down and redesigned it for single outputs, which introduced an extra pass;
Pass 1; Energy transfer
- input  : energy map
- output : new energy map
Pass 2; Height generation
- input  : energy map from pass 1
- output : height map
Pass 3; Drive simulation
- input  : energy map from pass 1
         : driving map
- output : new energy map for input into pass 1
Pass 4; Normal generation
- input  : height map
- output : normal map


Having written the extra shader and made the required changes I fired up the program again with a 40*40 matrix; ~710fps.
Much better [grin]

So, currently things look as follows;
Size         Approx fps
40*40        710
50*50        610
100*100      237
256*256      48
512*512      13
1024*1024    0-3
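
Frames per second is an awkward unit for comparing grid sizes, so it can help to turn the figures above into time per frame and time per cell. This is only a rough sketch in C++; the function names are made up for this example, and the per-cell number is simply whole-frame time divided by cell count, so it includes drawing and any fixed per-frame overhead, not just the TLM passes.

```cpp
#include <cassert>

// Milliseconds spent on one frame at a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Average nanoseconds of frame time per grid cell for a size*size matrix.
// This divides the whole frame time by the cell count, nothing more.
double nsPerCell(double fps, int size) {
    assert(size > 0);
    const double cells = static_cast<double>(size) * size;
    return frameTimeMs(fps) * 1.0e6 / cells;
}
```

Plugging in the table, 40*40 at 710fps is about 1.41ms per frame, or roughly 880ns per cell, while 512*512 at 13fps is about 76.9ms per frame, or roughly 293ns per cell. The per-cell cost dropping as the grid grows would be consistent with a fixed per-frame cost dominating at small sizes, although these numbers alone can't confirm that.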


I suspect part of the problem is that I'm moving EVERYTHING around as 32bit floats. I think I can get away with only the height map and normal map being 32bit floats, the driving map being a single channel 32bit value and the rest being 16bit; I'll be testing that when I get into real benchmarking mode.
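
To put a rough number on what the format change could save, here is a small C++ sketch that works out the storage for one texture. It assumes four-channel maps unless stated otherwise (the energy map is read as a vec4 in the shader; the channel counts of the other maps are a guess), and the function name is made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one size*size texture with the given number of channels
// and bytes per channel (4 for 32-bit float, 2 for 16-bit float).
std::size_t textureBytes(std::size_t size, std::size_t channels,
                         std::size_t bytesPerChannel) {
    assert(channels >= 1 && channels <= 4);
    assert(bytesPerChannel > 0);
    return size * size * channels * bytesPerChannel;
}
```

At 1024*1024 a four-channel 32bit float texture comes to 16 MiB, the 16bit equivalent to 8 MiB, and a single-channel 32bit texture to 4 MiB. Less data read and written per pass should help in principle, but how much of that shows up in the fps is something only the benchmarking will tell.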

I've also got one more texture floating about than I need, killing that might help matters as well [wink]

I'm also wondering if using 32bit index buffers is hurting; it might be worth drawing the final image in chunks to see if that matters much for the fps (although it won't affect the TLM speed, so it might not be worth the hassle).

Tomorrow is a day of PowerPoint slide making and generally working out just how I'm going to waffle about this; apparently I've got 15mins, the problem is I could probably talk for an hour and still not cover everything... ah well...

Comments

mrbastard
Quote:Original post by phantom
CPU version was doing ~24fps at 50*50

~

shader and made the required changes I fired up the program again with a 40*40 matrix; ~710fps.


excellent! congrats [smile]
March 18, 2007 12:22 PM
jjd
Ok, I'm a little confused here about the dot function thing. I know that the two formulations are equivalent, but why create an object and use a function call to evaluate expressions that you already had in a clear, efficient form?
March 18, 2007 03:38 PM