One thing which was bothering me was this bit of code
float SinkWest  = 0.5 * ( east.x  + east.y  - east.z  + east.w);
float SinkEast  = 0.5 * ( west.x  + west.y  + west.z  - west.w);
float SinkSouth = 0.5 * (-north.x + north.y + north.z + north.w);
float SinkNorth = 0.5 * ( south.x - south.y + south.z + south.w);
It had been making my brain twitch for a while now and after posting it into IRC I got the answer as to why; I should have been using the dot() function instead!
A few mins in an editor later and the code above had become
float SinkWest  = dot(east,  vec4( 0.5,  0.5, -0.5,  0.5));
float SinkEast  = dot(west,  vec4( 0.5,  0.5,  0.5, -0.5));
float SinkSouth = dot(north, vec4(-0.5,  0.5,  0.5,  0.5));
float SinkNorth = dot(south, vec4( 0.5, -0.5,  0.5,  0.5));
Thanks to Zeux for pointing that out to me [smile]
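As a quick sanity check that the two forms really are the same thing, here is a minimal CPU-side sketch in Python rather than GLSL. The sample values and the helper names (dot, expanded, as_dot) are made up for illustration; only the sign patterns come from the shader code above.

```python
import math

def dot(a, b):
    """Plain 4-component dot product, standing in for GLSL's dot()."""
    return sum(x * y for x, y in zip(a, b))

def expanded(v, signs):
    """The original form: 0.5 * (+-x +-y +-z +-w)."""
    return 0.5 * sum(s * c for s, c in zip(signs, v))

def as_dot(v, signs):
    """The rewritten form: fold the 0.5 into the weights, then dot()."""
    weights = tuple(0.5 * s for s in signs)
    return dot(v, weights)

# Sign patterns copied from the four lines of shader code.
SIGNS = {
    "SinkWest":  ( 1,  1, -1,  1),
    "SinkEast":  ( 1,  1,  1, -1),
    "SinkSouth": (-1,  1,  1,  1),
    "SinkNorth": ( 1, -1,  1,  1),
}

sample = (1.0, 2.0, 3.0, 4.0)  # arbitrary test texel, not real simulation data

for name, signs in SIGNS.items():
    a = expanded(sample, signs)
    b = as_dot(sample, signs)
    print(name, a, b, math.isclose(a, b))
```

The point of the rewrite is that the scale and the signs collapse into a single constant vector, so each line becomes a single dot() call.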
Now, in preparation for my presentation on Tuesday, I decided that I might want to get some performance figures, so off to XP I went, armed with gDEBugger, to do some looking about.
On first run things were... disappointing to say the least; even on a 40*40 matrix we weren't clearing 40fps [sad] This wasn't good at all, as a CPU version was doing ~24fps at 50*50; clearly we had a problem.
So, some code was commented out and behold, the problem was narrowed down to the energyTransfer pass, which is the main one which does all the work, namely;
- 5 texture samples
- 5 dot products
- 1 subtraction
- 2 colour writes
After an inspired bit of fiddling I found the problem; MRT via FBO with 32bit floating point textures REALLY hurts, it seems. Getting rid of the extra write jumped my fps from 40 to ~800 or so.
Clearly my design wasn't optimal, so I sat down and redesigned it for single outputs, which introduced an extra pass;
Pass 1; Energy transfer
- input : energy map
- output : new energy map
Pass 2; Height generation
- input : energy map from pass 1
- output : height map
Pass 3; Drive simulation
- input : energy map from pass 1
        : driving map
- output : new energy map for input into pass 1
Pass 4; Normal generation
- input : height map
- output : normal map
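To make the data flow a bit more concrete, here is a rough sketch of the four passes as plain data, with a check that every pass only reads maps that already exist and never reads the map it is writing. The map names (energy, new_energy and so on) are hypothetical labels I've picked for illustration; the real program binds textures to an FBO and this only models which map feeds which pass.

```python
# (pass name, maps it reads, the single map it writes)
PASSES = [
    ("energy transfer",   ["energy"],                "new_energy"),
    ("height generation", ["new_energy"],            "height"),
    ("drive simulation",  ["new_energy", "driving"], "energy"),  # feeds pass 1 next frame
    ("normal generation", ["height"],                "normal"),
]

def run_frame(available):
    """Walk the passes in order and return the set of maps that exist afterwards."""
    available = set(available)
    for name, inputs, output in PASSES:
        missing = [m for m in inputs if m not in available]
        if missing:
            raise ValueError(f"{name} reads {missing} before anything wrote it")
        if output in inputs:
            raise ValueError(f"{name} reads and writes {output} in the same pass")
        available.add(output)
    return available

# At the start of a frame only the previous energy map and the driving map exist.
maps_after_frame = run_frame({"energy", "driving"})
print(sorted(maps_after_frame))
```

Each entry has exactly one output, which is the whole point of the redesign, and pass 3 writing back to the map pass 1 reads is what closes the loop from one frame to the next.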
Having written the extra shader and made the required changes I fired up the program again with a 40*40 matrix; ~710fps.
Much better [grin]
So, currently things look as follows;
Size         Approx fps
40*40        710
50*50        610
100*100      237
256*256      48
512*512      13
1024*1024    0-3
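Frame rates are awkward to compare across grid sizes, so here is a small sketch that turns the figures above into milliseconds per frame and grid cells processed per second. It assumes the fps counts whole frames (all passes plus drawing), and it leaves out the 1024*1024 row since that is only a 0-3 range; the function names are mine.

```python
# (grid side length, approximate fps) taken from the table above
RESULTS = [(40, 710), (50, 610), (100, 237), (256, 48), (512, 13)]

def frame_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps

def cells_per_second(size, fps):
    """Grid cells pushed through the whole frame per second."""
    return size * size * fps

for size, fps in RESULTS:
    print(f"{size}*{size}: {frame_ms(fps):.2f} ms/frame, "
          f"{cells_per_second(size, fps):,} cells/s")
```

Going by these numbers the cell rate climbs from roughly 1.1 million per second at 40*40 to roughly 3.4 million at 512*512, which might mean the small grids are dominated by fixed per-frame cost rather than per-cell work, but that is a guess until proper benchmarking is done.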
I suspect part of the problem is that I'm moving EVERYTHING around as 32bit floats. I can probably get away with only the height map and normal map being 32bit floats, the driving map being a single channel 32bit value and the rest being 16bit; I'll be testing that when I get into real benchmarking mode.
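For a rough idea of what the format change would buy, here is a back-of-envelope sketch of texture sizes. It assumes the energy map is four channels (the shader reads .x/.y/.z/.w from it) and ignores any padding or alignment the driver might add; texture_bytes is just a helper name for this example.

```python
def texture_bytes(size, channels, bits_per_channel):
    """Raw storage for a size*size texture, ignoring driver padding."""
    return size * size * channels * (bits_per_channel // 8)

MIB = 1024 * 1024

for size in (256, 512, 1024):
    rgba32 = texture_bytes(size, 4, 32)  # what everything uses now
    rgba16 = texture_bytes(size, 4, 16)  # proposed format for the energy maps
    r32 = texture_bytes(size, 1, 32)     # proposed format for the driving map
    print(f"{size}*{size}: RGBA32F {rgba32 / MIB:.2f} MiB, "
          f"RGBA16F {rgba16 / MIB:.2f} MiB, R32F {r32 / MIB:.2f} MiB")
```

Halving the bits per channel halves the data every pass has to read and write for that map, which is why it seems worth testing; whether it actually shows up in the fps is something only the benchmarks will tell.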
I've also got one more texture floating about than I need, killing that might help matters as well [wink]
I'm also wondering if using 32bit index buffers is hurting; it might be worth drawing the final image in chunks to see if that matters much for the fps (although it won't affect the TLM speed, so it might not be worth the hassle).
Tomorrow is a day of PowerPoint slide making and generally working out just how I'm going to waffle about this; apparently I've got 15mins, the problem is I could probably talk for an hour and still not cover everything.. ah well...
excellent! congrats [smile]