So finally a video showing the multiple lights and shadows in motion. It's a lot smoother here, maintaining 60 fps effortlessly on my machine. My keyboard response code seems to break down a bit when recording with CamStudio for some reason so the video doesn't look quite so smooth.
I'll break down the rendering stages in this entry.
First a quick tale about why one should really start using [font='courier new']nullptr[/font] over a literal zero when assigning null defaults to pointers.
[font='courier new']Pc[/font] is the name of the class I have representing the player character, which is currently the red capsules you see in the screenshots and videos. It is derived from [font='courier new']Entity[/font] and the entities are stored and owned by the [font='courier new']LevelModel[/font], but I also needed to keep a (non-owning) pointer to one in my [font='courier new']GameMode[/font] to update the light position and so on.
Originally I had one [font='courier new']Pc *pc[/font] pointer as a member of [font='courier new']GameMode[/font], which I initialised to 0 in the [font='courier new']GameMode[/font] constructor:
GameMode::GameMode(Gx::Graphics &graphics, Gx::ApplicationEvents &events)
    : graphics(graphics), events(events), pc(0)
{
    receiver.connect(events.keyDown, this, &GameMode::keyDown);
}
When I wanted to start messing with multiple lights, I changed [font='courier new']Pc *pc[/font] in the [font='courier new']GameMode[/font] header to [font='courier new']Gx::PodVector<Pc*> pc[/font].
Everything compiled and apparently worked okay, then I noticed that my game was crashing on shutdown, and very occasionally if I reloaded the level. After much gnashing of teeth, I discovered if I swapped out the [font='courier new']Gx::PodVector[/font] for [font='courier new']std::vector[/font], the issue went away. But I have been using the code upon which [font='courier new']Gx::PodVector[/font] is based for literally years without any issues of this sort.
Turns out that [font='courier new']GameMode[/font]'s constructor still had [font='courier new']pc(0)[/font] in it...
Now, one of the many constructors that [font='courier new']Gx::PodVector[/font] takes is one that defines the initial size/capacity. This value is also then used as the chunk size when growing:
template<class T> class PodVector
{
public:
    // snip

    PodVector(size_type s) : r(s, s) { }

    // snip

private:
    struct rep
    {
        rep(size_type r, size_type n) : chunk(r)
        {
            front = new T[r];
            back = (front + n) - 1;
            cap = (front + r) - 1;
        }
    };
};
Now, of course, because I used 0 instead of [font='courier new']nullptr[/font] in [font='courier new']GameMode[/font]'s constructor, I was creating a [font='courier new']PodVector[/font] that calls [font='courier new']new T[0][/font]. Whoops. And in great C++ fashion, this was silently working, stomping all over unallocated memory, and not crashing until the program actually shut down.
Had I used [font='courier new']pc(nullptr)[/font] in the [font='courier new']GameMode[/font] constructor, as soon as I turned the pointer into a [font='courier new']PodVector[/font], the compiler would have caught the error and saved me several hours of frustration :)
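To make the difference concrete, here is a minimal sketch rather than the real [font='courier new']Gx::PodVector[/font]. [font='courier new']TinyVector[/font], the empty [font='courier new']Pc[/font] and the two helper functions are stand-ins invented for illustration; the only thing borrowed from the snippet above is the non-explicit constructor taking an initial size.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the player class; only its pointer type matters.
struct Pc { };

// Hypothetical stand-in for Gx::PodVector: all it shares with the real class
// is a non-explicit constructor that takes an initial size.
template<class T> class TinyVector
{
public:
    typedef std::size_t size_type;

    TinyVector(size_type s) : count(s) { }

    size_type size() const { return count; }

private:
    size_type count;
};

// The literal 0 is a valid initialiser for a pointer...
inline bool zeroInitialisesPointer()
{
    Pc *p(0);
    return p == nullptr;
}

// ...and it is also a valid size, so pc(0) keeps compiling after the member
// changes from a pointer to a vector.
inline std::size_t zeroSelectsSizeConstructor()
{
    TinyVector<Pc*> v(0);
    return v.size();
}

// nullptr initialises a pointer but cannot be converted to a size_type, so
// pc(nullptr) would stop compiling as soon as pc became a vector.
static_assert(std::is_constructible<Pc*, std::nullptr_t>::value,
    "a pointer accepts nullptr");
static_assert(!std::is_constructible<TinyVector<Pc*>, std::nullptr_t>::value,
    "the size constructor rejects nullptr");
```

The two static_asserts are the whole point: the same initialiser that is fine for the pointer is rejected at compile time for the vector.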
So the rendering process for this is as follows. Nothing original going on here I'd stress, all just standard techniques.
My system is based on a [font='courier new']Scene[/font] class that maintains (but does not own) a list of pointers to [font='courier new']SceneNodes[/font]. The only kind of node at the moment is a [font='courier new']StaticMeshNode[/font].
We have [font='courier new']RenderPass[/font] and [font='courier new']RenderType[/font] defined as follows (ignoring [font='courier new']Particle[/font] for now):
enum class RenderPass { Base, Lit, Depth };
enum class RenderType { General, Particle, Null };
[font='courier new']SceneNode[/font] has the following interface:
class SceneNode
{
public:
    SceneNode();
    virtual ~SceneNode();

    virtual bool pass(RenderPass type) const = 0;
    virtual RenderType type() const = 0;

    virtual void render(RenderPass pass, Gx::Graphics &graphics, const SceneParams &params, float blend) const = 0;

    Gx::Signal destroyed;
};
[font='courier new']SceneParams[/font] is just a small structure for passing the parameters around more easily, which should also make the code easier to extend later. Barely worth sharing at the moment:
class SceneParams
{
public:
    SceneParams(){ }

    Gx::Matrix view;
    Gx::Matrix proj;

    Gx::Vec3 shadowLightPos;
    Gx::Vec4 lightColor;
};
So a node can tell the Scene which passes it should be involved in, which [font='courier new']RenderType[/font] it uses (just [font='courier new']General[/font] used at the moment) and can be told to render itself, which for [font='courier new']StaticMeshNode[/font] is as simple as:
void StaticMeshNode::render(RenderPass pass, Gx::Graphics &graphics, const SceneParams &params, float blend) const
{
    graphics.device.vertexShader().setMatrix(graphics.device, "world", world);
    graphics.device.renderTriangleList(graphics.resources.get(mesh));
}
[font='courier new']Scene[/font] then uses the following loop when you call its [font='courier new']render()[/font] method with a given [font='courier new']RenderPass[/font]:
void Scene::render(RenderPass pass, Gx::Graphics &graphics, const SceneParams &params, float blend)
{
    RenderType curr = RenderType::Null;

    for(auto i: nodes)
    {
        if(i->pass(pass))
        {
            RenderType type = i->type();
            if(curr != type)
            {
                endType(graphics);

                curr = type;
                beginType(pass, type, graphics, params);
            }

            if(curr != RenderType::Null)
            {
                i->render(pass, graphics, params, blend);
            }
        }
    }

    endType(graphics);
}
[font='courier new']beginType()[/font] and [font='courier new']endType()[/font] just set up the global states, shaders, blend states etc for each render type. It is currently up to the external owner of [font='courier new']Scene[/font] to ensure the [font='courier new']SceneNodes[/font] are ordered in a way that makes this efficient.
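As a rough illustration of why the ordering matters, the sketch below counts how often a loop shaped like [font='courier new']Scene::render()[/font] would switch type for a given sequence, before and after grouping. Both helpers are hypothetical, and for brevity they work on a plain list of [font='courier new']RenderType[/font] values rather than on real node pointers; an owner of a [font='courier new']Scene[/font] would presumably sort its nodes by [font='courier new']type()[/font] in the same way.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class RenderType { General, Particle, Null };

// Hypothetical helper: counts how many times a loop like Scene::render()
// would call beginType() for the given sequence of node types.
inline std::size_t countTypeSwitches(const std::vector<RenderType> &types)
{
    std::size_t switches = 0;
    RenderType curr = RenderType::Null;

    for(RenderType type: types)
    {
        if(curr != type)
        {
            curr = type;
            ++switches;
        }
    }

    return switches;
}

// Hypothetical helper: groups equal types together while keeping the
// relative order of the entries within each type.
inline void groupByType(std::vector<RenderType> &types)
{
    std::stable_sort(types.begin(), types.end(),
        [](RenderType a, RenderType b)
        {
            return static_cast<int>(a) < static_cast<int>(b);
        });
}
```

With an interleaved list of two General and two Particle entries, the unsorted order switches state four times and the grouped order only twice.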
So with all this in mind, the process to render multiple lights and shadows is as follows:
1) Disable color writes, do a depth-only render of the scene from the camera's perspective to get the Z buffer set up correctly.
This also has the benefit that subsequent renders to the frame buffer will only run the pixel shader on visible pixels. This uses the simplest possible vertex shader and no pixel shader:
matrix world;
matrix viewproj;

struct VS_INPUT
{
    vector position : POSITION;
};

struct VS_OUTPUT
{
    vector position : POSITION;
};

VS_OUTPUT main(VS_INPUT input)
{
    VS_OUTPUT output = (VS_OUTPUT)0;

    output.position = mul(input.position, mul(world, viewproj));

    return output;
}
2) For each light source:
2a) Get hold of the pre-allocated cube texture (six 512x512 faces, D3DFMT_R32F) and, for each of the six faces, set the face as the render target, set a pre-allocated 512x512 depth/stencil surface, set up the correct 90 degree view matrix and render the scene.
void GameMode::renderShadowCube(const Gx::Vec3 &lightPos, float blend)
{
    Gx::CubeMap &map = graphics.resources.get("shadowcube");
    Gx::DepthStencilSurface &depthStencil = graphics.resources.get("shadowcubedepthstencil");

    Gx::Vec3 look[6] = { Gx::Vec3(1, 0, 0), Gx::Vec3(-1, 0, 0), Gx::Vec3(0, 1, 0), Gx::Vec3(0, -1, 0), Gx::Vec3(0, 0, 1), Gx::Vec3(0, 0, -1) };
    Gx::Vec3 up[6] = { Gx::Vec3(0, 1, 0), Gx::Vec3(0, 1, 0), Gx::Vec3(0, 0, -1), Gx::Vec3(0, 0, 1), Gx::Vec3(0, 1, 0), Gx::Vec3(0, 1, 0) };

    SceneParams params;

    params.proj = Gx::perspectiveMatrix(Gx::Size(map.dimension(), map.dimension()), M_PI / 2, 0.1f, 200);
    params.shadowLightPos = lightPos;

    // Remember the current render target so it can be restored afterwards
    Gx::RenderContext old(graphics.device);

    for(int i = D3DCUBEMAP_FACE_POSITIVE_X; i <= D3DCUBEMAP_FACE_NEGATIVE_Z; ++i)
    {
        params.view = Gx::lookAtMatrix(lightPos, lightPos + look[i], up[i]);

        Gx::RenderContext cube(map, static_cast<D3DCUBEMAP_FACES>(i), depthStencil);
        cube.apply(graphics.device);

        graphics.device.clear(Gx::Color(0, 1, 0, 1), 1);

        scene.render(RenderPass::Depth, graphics, params, blend);
    }

    old.apply(graphics.device);
}
This uses the following vertex and pixel shaders:
matrix world;
matrix viewproj;

struct VS_INPUT
{
    vector position : POSITION;
};

struct VS_OUTPUT
{
    vector position : POSITION;
    vector worldpos : TEXCOORD;
};

VS_OUTPUT main(VS_INPUT input)
{
    VS_OUTPUT output = (VS_OUTPUT)0;

    output.position = mul(input.position, mul(world, viewproj));
    output.worldpos = mul(input.position, world);

    return output;
}
float3 shadowLight;

struct PS_INPUT
{
    vector worldpos : TEXCOORD;
};

struct PS_OUTPUT
{
    vector diffuse : COLOR;
};

PS_OUTPUT main(PS_INPUT input)
{
    PS_OUTPUT output = (PS_OUTPUT)0;

    float dist = length(((float3)input.worldpos) - shadowLight);
    output.diffuse = vector(dist / 200, 0, 0, 1);

    return output;
}
2b) Render the scene from the camera's perspective, with the cube texture from 2a) set as input to the pixel shader. Enable alphablending and set the blend states to [font='courier new']D3DBLEND_SRCCOLOR[/font], [font='courier new']D3DBLEND_ONE[/font]. This uses the following vertex and pixel shaders - the pixel shader is the most complex in this system.
matrix world;
matrix viewproj;

struct VS_INPUT
{
    vector position : POSITION;
    float3 normal : NORMAL;
    vector diffuse : COLOR;
};

struct VS_OUTPUT
{
    vector position : POSITION;
    vector diffuse : COLOR;
    float3 normal : TEXCOORD0;
    vector worldpos : TEXCOORD1;
};

VS_OUTPUT main(VS_INPUT input)
{
    VS_OUTPUT output = (VS_OUTPUT)0;

    output.position = mul(input.position, mul(world, viewproj));
    output.normal = mul(input.normal, (float3x3)world);
    output.diffuse = input.diffuse;
    output.worldpos = mul(input.position, world);

    return output;
}
sampler ds : register(s0);

float3 shadowLight;
vector lightColor;

struct PS_INPUT
{
    vector diffuse : COLOR;
    float3 normal : TEXCOORD0;
    float3 worldpos : TEXCOORD1;
};

struct PS_OUTPUT
{
    vector diffuse : COLOR;
};

float shadowDarkness = 0.5f;

float shadowFac(float3 tolight)
{
    float sampled = texCUBE(ds, tolight).r;
    float actual = (length(tolight) - 0.1f) / 200;

    if(sampled < actual) return shadowDarkness;

    return 0;
}

PS_OUTPUT main(PS_INPUT input)
{
    PS_OUTPUT output = (PS_OUTPUT)0;

    float3 n = normalize(input.normal);
    float3 toShadowLight = ((float3)input.worldpos + (n * 0.08f)) - shadowLight;

    float shadow = (dot(toShadowLight, n) < 0) ? shadowFac(toShadowLight) : shadowDarkness;

    float s = dot(n, normalize(-toShadowLight));
    if(s < 0.3f) s = 0.3f;

    float d = length(toShadowLight);
    s *= 1 - saturate(d / 60);

    output.diffuse = input.diffuse * lightColor * s * (1 - shadow);
    output.diffuse.a = 1;

    return output;
}
Speaks for itself really: entirely standard and not yet optimised. For example, I could easily store and compare the squared distance rather than the distance. But I wanted it working in a non-optimised fashion first, so that I have some real-world frame timings to compare against when I start optimising, as I'm doubtful it will make a lot of difference.
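For what it's worth, here is a CPU-side sketch of what the squared-distance variant might look like. It is one possible approach, not something the shaders above do: the constants 200 and 0.1 come from the shaders, while the function names and the guard against a negative biased distance are my own assumptions. The idea is that the depth pass could write a squared distance (a dot product rather than a [font='courier new']length()[/font] call) and the lit pass could square its biased distance before comparing.

```cpp
// Constants taken from the shaders above: far range 200 and a bias of 0.1.
const float shadowRange = 200.0f;
const float shadowBias = 0.1f;

// What the depth pass currently writes: the normalised distance to the light.
inline float storeDistance(float dist)
{
    return dist / shadowRange;
}

// Hypothetical alternative: write the normalised squared distance instead.
inline float storeDistanceSq(float dist)
{
    return (dist * dist) / (shadowRange * shadowRange);
}

// Mirrors the comparison in shadowFac(): the fragment is shadowed when the
// stored occluder distance is less than the biased fragment distance.
inline bool shadowedByDistance(float sampled, float fragmentDist)
{
    float actual = (fragmentDist - shadowBias) / shadowRange;
    return sampled < actual;
}

// The same test against a stored squared distance. Squaring only preserves
// the ordering of non-negative values, hence the extra check on actual.
inline bool shadowedByDistanceSq(float sampledSq, float fragmentDist)
{
    float actual = (fragmentDist - shadowBias) / shadowRange;
    return actual > 0 && sampledSq < actual * actual;
}
```

Note that the lit pixel shader already needs the real distance for its falloff term, so any saving from this would come mainly from the depth pass.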
The radius of each light (60) is just hardcoded at the moment. Obviously this can be a new input parameter eventually.
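A minimal sketch of the light term with the radius turned into a parameter, written as plain C++ that mirrors the relevant lines of the pixel shader. The function names and the [font='courier new']radius[/font] parameter are hypothetical; in the real thing this would presumably become another shader constant fed from [font='courier new']SceneParams[/font].

```cpp
#include <algorithm>

// Mirrors HLSL saturate(): clamps a value to the range [0, 1].
inline float saturate(float v)
{
    return std::min(1.0f, std::max(0.0f, v));
}

// Linear falloff as in the pixel shader, with the hardcoded 60 replaced by
// a (hypothetical) radius parameter.
inline float falloff(float dist, float radius)
{
    return 1.0f - saturate(dist / radius);
}

// Light scale as in the pixel shader: N.L with a floor of 0.3, multiplied
// by the distance falloff.
inline float lightScale(float nDotL, float dist, float radius)
{
    float s = std::max(nDotL, 0.3f);
    return s * falloff(dist, radius);
}
```

Passing 60 as the radius reproduces the current behaviour: full strength at the light, half strength at 30 units and nothing from 60 units outwards.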
So far, my impression is that the main overhead in this system is the fill rate on the texture cubes. This is evidenced by how much faster three lights run with 512x512 cube faces than with 1024x1024 cube faces.
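Some back-of-envelope arithmetic for that, as a sketch. It only counts the texels in the six square faces of each cube and ignores overdraw, the depth/stencil surface and everything else, so treat it as a lower bound on the work rather than a measurement. The function names are made up for the example.

```cpp
#include <cstdint>

// Texels written when every face of a cube map is filled once.
constexpr std::uint64_t cubeTexels(std::uint64_t faceSize)
{
    return 6 * faceSize * faceSize;
}

// Texels written per frame when each light fills its own shadow cube.
constexpr std::uint64_t shadowTexelsPerFrame(std::uint64_t faceSize, std::uint64_t lights)
{
    return lights * cubeTexels(faceSize);
}

// Doubling the face size quadruples the number of texels to fill.
static_assert(cubeTexels(1024) == 4 * cubeTexels(512),
    "1024 faces cost four times as many texels as 512 faces");
```

For three lights that works out at 4,718,592 texels per frame with 512 faces against 18,874,368 with 1024 faces.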
I'm also applying no PCF or similar to the shadows at the moment. Wanted to get some real frame timing before I added this in to get a realistic idea of the cost.
So the main [font='courier new']GameMode::render()[/font] looks like ([font='courier new']Base[/font] is the [font='courier new']RenderPass[/font] for the depth-only render to the frame buffer):
void GameMode::render(float blend)
{
    SceneParams params = sceneParams(blend);

    model.prepareScene(params, blend);

    graphics.device.clear(Gx::Color(0, 0, 0, 0), 1.0f);

    scene.render(RenderPass::Base, graphics, params, blend);

    for(Gx::Index i = 0; i < pc.size(); ++i)
    {
        params.shadowLightPos = pc[i]->position(blend) + Gx::Vec3(0, 2, 0);
        params.lightColor = Gx::Vec4(1, 1, 1, 1);

        renderShadowCube(params.shadowLightPos, blend);
        scene.render(RenderPass::Lit, graphics, params, blend);
    }
}
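Each [font='courier new']Lit[/font] pass in that loop adds its light on top of whatever is already in the frame buffer, via the blend states from 2b). The sketch below works through the arithmetic on the CPU. It assumes the two states are the source and destination factors in the order listed, that the blend operation is left at its default of add, and that the target clamps to 1; the [font='courier new']Rgb[/font] struct and function names are invented for the example.

```cpp
#include <algorithm>

// Hypothetical colour type for the example.
struct Rgb
{
    float r, g, b;
};

// One channel of the blend: source * source factor + destination * 1.
// With D3DBLEND_SRCCOLOR the source factor is the source colour itself.
inline float blendChannel(float src, float dst)
{
    return std::min(1.0f, (src * src) + dst);
}

// Blends the output of one lit pass onto the current frame buffer colour.
inline Rgb blendLitPass(const Rgb &src, const Rgb &dst)
{
    Rgb out = { blendChannel(src.r, dst.r), blendChannel(src.g, dst.g), blendChannel(src.b, dst.b) };
    return out;
}
```

Starting from the black left by the depth-only pass, two lights that each output 0.5 for a pixel would leave 0.25 after the first pass and 0.5 after the second.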
So there you have it. Thanks for stopping by.
[EDIT] All fixed, thanks guys.