But the way I currently draw the projected shadows becomes very slow with even a few dozen shadow casters on screen. Picture a small orthographic projection fitted to each object: each shadow caster has its own view matrix.
Shadows are accumulated by passing each object's shadow map and projection, along with its view matrix, to the shader. I clear a render target to Transparent and set every shadowed pixel to white, because I want to use the Additive blend state to add up all the shadowed areas (AlphaBlend gives the same result, but Additive makes more sense). Once all the shadows are added up, this render target texture is inverted and multiplied with the lighting output.
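To make the accumulate-then-invert step concrete, here is a minimal CPU-side sketch in Python. It is not my actual shader code: the buffers are plain nested lists, the function names `accumulate_shadows` and `apply_shadows` are made up for illustration, and I am assuming the render target saturates at 1.0 the way a normalized 8-bit target would.

```python
def accumulate_shadows(masks):
    """Additively blend per-caster shadow masks into one occlusion buffer.

    Each mask is a 2D list where 1.0 means "shadowed by this caster".
    """
    height, width = len(masks[0]), len(masks[0][0])
    # Equivalent of clearing the render target before the additive passes.
    occlusion = [[0.0] * width for _ in range(height)]
    for mask in masks:  # one additive pass per shadow caster
        for y in range(height):
            for x in range(width):
                # Additive blend, clamped at 1.0 (assumed saturating target).
                occlusion[y][x] = min(1.0, occlusion[y][x] + mask[y][x])
    return occlusion


def apply_shadows(lighting, occlusion):
    """Invert the occlusion buffer and multiply it with the lighting output."""
    return [
        [lit * (1.0 - occ) for lit, occ in zip(lit_row, occ_row)]
        for lit_row, occ_row in zip(lighting, occlusion)
    ]
```

Note that every caster touches every pixel of the buffer here, which mirrors the cost of drawing one full-screen quad per shadow.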
In my first approach, the loop over the shadow casters draws a full-screen quad for each shadow to accumulate it in the render target, and this is the primary cause of the slowdown. This approach has a major flaw: it is definitely fill-rate bound. Too many additive passes with full-screen quads make the program chug. Drawing one quad is no problem, but drawing 20 or more brought my framerate below 15 FPS, whereas I get at least 100 FPS without shadows.
I used batching to reduce this problem. It cuts the number of quads that must be drawn, but I still get a noticeable slowdown with many objects. The optimization I would prefer is to draw just one quad that blends all the shadows in one step. Can this be done?
At a high level, my shadow mapping algorithm looks like this:
// drawing shadow map
set render target 1
assign common shader parameters
for each shadow caster
- get the bounding box for this object
- compute bounding frustum to fit this box, oriented from the angle of the directional light
- restrict viewport to render to a part of the texture
- draw the shadow caster
end for each
// drawing occlusion of shadows
set render target 2
assign common shader parameters
use additive blending state
set batch count to 0
for each shadow caster
- get the bounding box for this object
- compute bounding frustum to get view matrix from the light
- add view matrix to a matrix array
- calculate texture mapping offset for this object and add this to another array
- increment batch count by 1
- if batch count == maximum batch size
--- set matrix and texture offset parameters to the shader
--- apply pass and render a full-screen quad
--- reset batch count to 0
end for each
if batch count > 0 (we have a remainder of objects)
- set the parameters and render a full-screen quad, as in the batch count == maximum batch size case
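The batching part of the second loop can be sketched in Python as below. This only models the grouping logic, not the rendering: `batch_casters` is a hypothetical helper name, and each returned batch stands for one "set parameters and render a full-screen quad" step.

```python
def batch_casters(casters, max_batch_size):
    """Group shadow casters into batches; each batch costs one full-screen quad."""
    batches = []
    current = []
    for caster in casters:
        current.append(caster)  # stands in for adding matrix + texture offset
        if len(current) == max_batch_size:
            batches.append(current)  # flush: one quad drawn for this batch
            current = []
    if current:
        # Remainder of objects that did not fill a whole batch.
        batches.append(current)
    return batches
```

For example, with 20 casters and an assumed maximum batch size of 8, this produces batches of 8, 8 and 4, so three quads are drawn instead of 20. The number of quads still grows with the number of casters, which matches the slowdown I see with many objects.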
I would really prefer to render all the shadows with just one quad. I know for certain that the main bottleneck is drawing too many screen-space quads, because as a test I made the quads 1/5 of the full screen size and the framerate jumped back up. Z-fighting is not the issue either, since giving each quad a different Z-depth produced the same performance.
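As a rough back-of-the-envelope check of the fill-rate argument, the number of blended pixel writes per frame can be estimated like this. The 1280x720 resolution in the usage note is only an assumed example, `blended_pixels_per_frame` is a made-up helper, and `coverage` is the fraction of the screen area each quad covers.

```python
def blended_pixels_per_frame(quad_count, width, height, coverage=1.0):
    """Estimate how many pixels are written with blending in one frame.

    coverage is the fraction of the screen area covered by each quad
    (1.0 for a full-screen quad).
    """
    return round(quad_count * width * height * coverage)
```

With 20 full-screen quads at 1280x720 this gives 18,432,000 blended writes per frame, versus 921,600 for a single quad. If each quad covers one fifth of the screen area, the total drops to 3,686,400, which is consistent with the framerate recovering in my test.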