Hi,
I'm trying to improve my terrain shaders, because it's taking ~8 ms per frame to render the terrain and I'm not even rendering it to the shadow map cascades yet (which will probably cost a few more ms per frame).
Shader to render the terrain to the GBuffer:
VS_OUT GBufferVS(VS_IN vIn)
{
VS_OUT vOut;
Just create the sampler with wrap or clamp addressing; you do not need an if statement to enforce that.
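To illustrate why the branch is redundant, here is a small CPU-side sketch of what the two addressing modes do to a texture coordinate before the fetch. The function names are made up for illustration and this is not the hardware's actual code path, just the usual definition of wrap and clamp on a normalized coordinate.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Wrap addressing: keep only the fractional part, so the texture repeats.
// (Hypothetical helper; the sampler does the equivalent for you.)
float wrapCoord(float u) {
    assert(std::isfinite(u));
    return u - std::floor(u);
}

// Clamp addressing: pin the coordinate to the [0, 1] range.
// (Hypothetical helper; the sampler does the equivalent for you.)
float clampCoord(float u) {
    assert(std::isfinite(u));
    return std::min(std::max(u, 0.0f), 1.0f);
}
```

Since the sampler applies one of these on every fetch anyway, an if in the shader that does the same thing only adds instructions.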
What's the purpose of the geometry buffer pass?
I have gone down the same road you are on, and I can see what you are trying to do. You are trying to offload work to the GPU, which can be a good thing, but most of that work is redundant: it can be done once on the CPU and then rendered as static geometry, which is much faster than what you are currently doing.
Build your terrain as grids on the CPU, then send them to the GPU to draw as static geometry; this will cut your draw times by a lot. Also, the geometry shader is poor at doing a lot of work; it is meant for small jobs.
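A minimal sketch of building one such grid on the CPU might look like the following. The struct and function names are hypothetical, the grid is flat (heights would still have to come from your heightmap), and the upload to a vertex/index buffer is left out because it depends on your API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout: only the horizontal position is stored here.
struct GridVertex { float x; float z; };

struct GridMesh {
    std::vector<GridVertex> vertices;
    std::vector<std::uint32_t> indices;  // triangle list
};

// Build a square grid with vertsPerSide x vertsPerSide vertices,
// spaced `spacing` units apart, as an indexed triangle list.
GridMesh buildGridMesh(std::uint32_t vertsPerSide, float spacing) {
    assert(vertsPerSide >= 2);
    GridMesh mesh;
    mesh.vertices.reserve(static_cast<std::size_t>(vertsPerSide) * vertsPerSide);
    for (std::uint32_t z = 0; z < vertsPerSide; ++z) {
        for (std::uint32_t x = 0; x < vertsPerSide; ++x) {
            mesh.vertices.push_back({x * spacing, z * spacing});
        }
    }

    const std::uint32_t quads = vertsPerSide - 1;
    mesh.indices.reserve(static_cast<std::size_t>(quads) * quads * 6);
    for (std::uint32_t z = 0; z < quads; ++z) {
        for (std::uint32_t x = 0; x < quads; ++x) {
            const std::uint32_t i0 = z * vertsPerSide + x;  // near-left
            const std::uint32_t i1 = i0 + 1;                // near-right
            const std::uint32_t i2 = i0 + vertsPerSide;     // far-left
            const std::uint32_t i3 = i2 + 1;                // far-right
            // Two triangles per quad.
            mesh.indices.insert(mesh.indices.end(), {i0, i2, i1, i1, i2, i3});
        }
    }
    return mesh;
}
```

You would call something like buildGridMesh(65, 1.0f) once at load time, upload the result, and reuse the same buffers every frame.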
Wisdom is knowing when to shut up, so try it.
--Game Development http://nolimitsdesigns.com: Reliable UDP library, Threading library, Math Library, UI Library. Take a look, its all free.
I forgot to mention that this is a Geometry Clipmaps implementation, so I can't use static geometry.
I'm not using the geometry shader... I'm rendering the geometry to the GBuffer and then a second time to the Light Pre-Pass second geometry pass buffer.
Also, I'm instancing most of the geometry, so I make just 5 draw calls (10 in total, because I draw the terrain twice).
I'm using NVIDIA PerfHUD to get usage numbers...
@phantom what do you mean by shader bottlenecks? I just know that my game is GPU bound, because the driver is sleeping about 6 ms per frame. Is that what you're asking?
Numbers as to where your shaders are spending their time: ALU or sample bound, for example? Or maybe even raster operation bound (writing the pixels), which is common in a deferred renderer setup.
Also, what hardware are you rendering this on?
I don't know how to check if a shader is ALU or sample bound... Take a look at this image anyway:
I'm running on an Intel Pentium Dual-Core E5300 2.60GHz (bad CPU, I know) and an NVIDIA GeForce 9800 GT.
Geometry clipmaps can be done on the CPU. You can implement this as static geometry; why couldn't you? Precompute each grid as a separate mesh on the CPU, then simply draw them: problem solved. There isn't a good reason to perform redundant calculations on the GPU when you can do them once on the CPU; this is the reason you are getting such poor performance.
Geometry clipmaps is a technique for piecing together geometry in a way that allows long-distance viewing and large terrains. No rule dictates that it must be done solely on the GPU. I realize many implement it this way, but it is not the best way.
If you implement it on the CPU, you will quickly see how much faster the technique is.
I'll admit up front I don't have a lot of experience with PerfHUD; even so, I'll do a quick walkthrough on bottlenecks here.
So, firstly, for this draw call: yes, you are certainly spending most of your time in VS instructions, as the 'instruction count ratios' graph shows. This is mildly worrying because, while your vertex shader is pretty heavy, there are generally more pixels than vertices being processed in any given scene.
When it comes to finding bottlenecks, the graph next to the ICR graph is your friend; it shows the amount of time each functional unit on the GPU is taking for the frame, the state bucket and (importantly) the draw call.
Based on the size of the peach coloured bar it would seem most of your time is being spent in input assembly and geometry setup; shader, texture and raster ops aren't remotely a problem, even frame buffer traffic is a bit low.
The net result of this is that the problem ISN'T with your shaders; they execute quickly and take hardly any of the draw call's time. The problem seems to be input assembly and geometry setup, as they are taking much, much longer; in fact shader, texture, ROP and frame buffer combined are taking less time than input assembly.
This would lead me to think that you are submitting too many vertices, which would also explain the high VS:PS instruction count ratio for the draw call, so you might want to look at your LOD scheme and check how many verts you are submitting.
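As a rough way to sanity-check that count on paper, something like the sketch below could help. It assumes a clipmap of nested n x n vertex grids where every level except the finest has a square hole about half the grid wide (filled by the next finer level); that hole size, the function name and the parameters are my assumptions, not numbers from your implementation.

```cpp
#include <cassert>
#include <cstdint>

// Rough vertex count for ONE pass over a clipmap made of `levels` nested
// grids of n x n vertices each.
// ASSUMPTION: each level except the finest is a ring whose square hole is
// about half the grid wide; real implementations differ in the details.
std::uint64_t estimateClipmapVertices(std::uint32_t n, std::uint32_t levels) {
    assert(n >= 3 && levels >= 1);
    const std::uint64_t full = static_cast<std::uint64_t>(n) * n;  // finest level, no hole
    const std::uint64_t holeSide = (n + 1) / 2;                    // assumed hole width
    const std::uint64_t ring = full - holeSide * holeSide;         // one coarser level
    return full + static_cast<std::uint64_t>(levels - 1) * ring;
}
```

Multiply the result by the number of geometry passes (the terrain is drawn twice in this thread) and compare it against what PerfHUD reports per frame.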
I think, and I need to refresh my memory but I'm 99% certain, that geometry setup includes dividing up work into pixel quads for the hardware to process at the pixel level.
Shader is the amount of time executing ALU ops, texture is going to be the amount of time performing texture operations (aka sampling); based on that most of your time is spent doing ALU ops rather than TEX ops, which is a good thing as most GPUs these days are biased in favour of ALU ops.
As for the vertex count, 1 million might be a little high, but it comes down to density as well: lots of small triangles are not very hardware friendly because of how the hardware dispatches work under the hood.