TLM : Some performance issues

_the_phantom_


During development, one of the things I was somewhat lacking was an idea of performance. Sure, when you got to a 1024*1024 map the graphics card broke down and cried at me, but below that, while it was smooth, I didn't know how fast things were going.

One thing which was bothering me was this bit of code:

float SinkWest = 0.5 * (east.x + east.y - east.z + east.w);
float SinkEast = 0.5 * (west.x + west.y + west.z - west.w);
float SinkSouth = 0.5 * (-north.x + north.y + north.z + north.w);
float SinkNorth = 0.5 * (south.x - south.y + south.z + south.w);



It had been making my brain twitch for a while now and after posting it into IRC I got the answer as to why; I should have been using the dot() function instead!

A few minutes in an editor later, the code above had become:

float SinkWest = dot(east,vec4( 0.5, 0.5,-0.5, 0.5));
float SinkEast = dot(west,vec4( 0.5, 0.5, 0.5,-0.5));
float SinkSouth = dot(north,vec4(-0.5, 0.5, 0.5, 0.5));
float SinkNorth = dot(south,vec4( 0.5,-0.5, 0.5, 0.5));



Thanks to Zeux for pointing that out to me [smile]
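
For anyone wondering whether the rewrite changes the result: it shouldn't, since the dot() form just folds the 0.5 and the signs into a constant vector. Where the hardware has a native four-component dot product, it may also compile to fewer instructions, though that depends on the compiler and the card. Below is a minimal CPU-side check of the SinkWest line, written in Python rather than GLSL so it can be run anywhere. The helper names (dot, sink_west_expanded, sink_west_dot) are made up for this sketch.

```python
import random

def dot(a, b):
    """Component-wise multiply and sum, mirroring GLSL's dot() on a vec4."""
    return sum(x * y for x, y in zip(a, b))

def sink_west_expanded(east):
    # Original formulation: 0.5 * (east.x + east.y - east.z + east.w)
    x, y, z, w = east
    return 0.5 * (x + y - z + w)

def sink_west_dot(east):
    # Rewritten formulation: dot(east, vec4(0.5, 0.5, -0.5, 0.5))
    return dot(east, (0.5, 0.5, -0.5, 0.5))

# Compare both forms on random inputs (fixed seed for repeatability).
random.seed(0)
samples = [tuple(random.uniform(-1.0, 1.0) for _ in range(4)) for _ in range(1000)]
max_diff = max(abs(sink_west_expanded(e) - sink_west_dot(e)) for e in samples)
```

The two forms may differ in the last bit or so because the floating point operations happen in a different order, so the comparison uses a small tolerance rather than exact equality.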

Now, in preparation for my presentation on Tuesday, I decided that I might want to get some performance figures, so off to XP I went, armed with gDEBugger, to do some looking about.

On the first run things were... disappointing to say the least. Even on a 40*40 matrix we weren't clearing 40fps [sad]. This wasn't good at all, as a CPU version was doing ~24fps at 50*50; clearly we had a problem.

So, some code was commented out and, behold, the problem was narrowed down to the energyTransfer pass, the main one that does all the work, namely:
- 5 texture samples
- 5 dot products
- 1 subtraction
- 2 colour writes

After an inspired bit of fiddling I found the problem: MRT via FBO with 32bit floating point textures REALLY hurts, it seems. Getting rid of the extra write jumped my fps from 40 to ~800 or so.

Clearly my design wasn't optimal, so I sat down and redesigned it for single outputs, which introduced an extra pass:

Pass 1: Energy transfer
- input : energy map
- output : new energy map
Pass 2: Height generation
- input : energy map from pass 1
- output : height map
Pass 3: Drive simulation
- input : energy map from pass 1
- input : driving map
- output : new energy map for input into pass 1
Pass 4: Normal generation
- input : height map
- output : normal map
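
The pass list above can be written down as data and sanity-checked: every pass writes exactly one texture, and nothing is read before it has been produced. This is only a bookkeeping sketch in Python, not the actual FBO setup. The texture labels are taken from the list, and check_order is a hypothetical helper.

```python
# Each entry: (pass name, textures read, single texture written).
# "energy map" is the state carried between frames; pass 3 writes it back.
passes = [
    ("energy transfer",   ["energy map"],                    "new energy map"),
    ("height generation", ["new energy map"],                "height map"),
    ("drive simulation",  ["new energy map", "driving map"], "energy map"),
    ("normal generation", ["height map"],                    "normal map"),
]

def check_order(passes, available):
    """Run through the passes in order and raise if one reads a texture
    that has not been produced yet. Returns the set of textures that
    exist once all passes have run."""
    available = set(available)
    for name, inputs, output in passes:
        missing = [tex for tex in inputs if tex not in available]
        if missing:
            raise ValueError(f"{name} reads {missing} before it is written")
        available.add(output)
    return available

# At the start of a frame only the previous energy map and the driving map exist.
final = check_order(passes, {"energy map", "driving map"})
```

Passes 2 and 3 both depend only on the output of pass 1, so their relative order shouldn't matter as long as pass 4 comes after pass 2.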


Having written the extra shader and made the required changes I fired up the program again with a 40*40 matrix; ~710fps.
Much better [grin]

So, currently things look as follows:

Size         Approx fps
40*40        710
50*50        610
100*100      237
256*256      48
512*512      13
1024*1024    0-3
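
Frame rates are awkward to compare across grid sizes, so here is a quick conversion of the figures above into milliseconds per frame and cells updated per second. This is a rough sketch: it assumes the fps numbers cover the whole frame (simulation plus drawing), and it skips the 1024*1024 row because that was only measured as a range.

```python
# (grid side length, approximate fps) copied from the table above.
results = [(40, 710), (50, 610), (100, 237), (256, 48), (512, 13)]

def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps

def cells_per_second(side, fps):
    """Grid cells processed per second at the given frame rate."""
    return side * side * fps

for side, fps in results:
    print(f"{side:>4}*{side:<4} {frame_time_ms(fps):7.2f} ms/frame "
          f"{cells_per_second(side, fps) / 1e6:6.2f} M cells/s")
```

Read this way, throughput climbs from roughly 1.1 million cells per second at 40*40 to a little over 3 million at 256*256 and 512*512. That would be consistent with a fixed per-frame cost dominating on the small grids, although these numbers alone can't confirm it.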


I suspect part of the problem is that I'm moving EVERYTHING around as 32bit floats. However, I suspect I can get away with only the height map and normal map being 32bit floats, the driving map being a single channel 32bit value, and the rest being 16bit; I'll be testing that when I get into real benchmarking mode.

I've also got one more texture floating about than I need; killing that might help matters as well [wink]

I'm also wondering if using 32bit index buffers is hurting. It might be worth drawing the final image in chunks to see if that matters much for the fps (although it won't affect the TLM speed, so it might not be worth the hassle).

Tomorrow is a day of PowerPoint slide making and generally working out just how I'm going to waffle about this. Apparently I've got 15 mins; the problem is I could probably talk for an hour and still not cover everything... ah well...
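
To put rough numbers on the format idea, here is a back-of-envelope size calculation for a single texture at 1024*1024. It assumes tightly packed texels with no mipmaps or driver padding, and a four-channel layout for the wide formats (the energy map is read as a vec4 in the shader). texture_bytes is a name made up for this sketch.

```python
# Bytes per channel for the two float widths under discussion.
BYTES_PER_CHANNEL = {"float32": 4, "float16": 2}

def texture_bytes(side, channels, fmt):
    """Size of a square, tightly packed texture in bytes (no mipmaps)."""
    return side * side * channels * BYTES_PER_CHANNEL[fmt]

side = 1024
rgba32f = texture_bytes(side, 4, "float32")  # four-channel 32bit float
rgba16f = texture_bytes(side, 4, "float16")  # four-channel 16bit float
r32f = texture_bytes(side, 1, "float32")     # single-channel 32bit float

for label, size in [("RGBA 32bit", rgba32f), ("RGBA 16bit", rgba16f), ("R 32bit", r32f)]:
    print(f"{label:<11} {size / (1024 * 1024):5.1f} MiB per texture")
```

Halving the channel width halves the data every pass has to read and write, which is where any gain would come from. Whether 16bit precision is enough for the energy values would still need testing.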
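
As a quick check on the chunking idea, the sketch below works out which grid sizes would fit in 16bit indices at all. It assumes one vertex per grid cell and that no index value is reserved (for primitive restart, say); both are assumptions, not details of the actual renderer. The helper names are hypothetical.

```python
MAX_UINT16_INDEX = 0xFFFF  # largest value a 16bit index can hold

def fits_16bit(side):
    """True if a side*side vertex grid can be addressed with 16bit indices.
    Vertices are numbered 0 .. side*side - 1."""
    return side * side - 1 <= MAX_UINT16_INDEX

def max_chunk_side():
    """Largest square chunk (in vertices per side) that 16bit indices can address."""
    side = 1
    while fits_16bit(side + 1):
        side += 1
    return side

for side in (40, 50, 100, 256, 512, 1024):
    print(f"{side}*{side}: {'16bit ok' if fits_16bit(side) else 'needs 32bit or chunks'}")
```

Under those assumptions only the 512*512 and 1024*1024 grids would need chunking, and adjacent chunks would have to repeat their shared edge vertices.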


2 Comments



Quote:
Original post by phantom
CPU version was doing ~24fps at 50*50

~

shader and made the required changes I fired up the program again with a 40*40 matrix; ~710fps.


excellent! congrats [smile]

Ok, I'm a little confused here about the dot function thing. I know that the two formulations are equivalent, but why create an object and use a function call to evaluate expressions that you already had in a clear, efficient form?
