# Effect: Area Light Shadows Part 1: PCSS

Graphics and GPU Programming

Welcome to the first part of a series of articles about soft shadows. Recently I've been working on area light support in my own game engine, which is critical for one of the game concepts I'd eventually like to build (time permitting). For area lights it is crucial to have proper soft shadows with a proper penumbra. As motivation, here is a screenshot with 3 area lights of various sizes:

Fig. 01 - PCSS variant that allows for perfectly smooth, large-area light shadows

Let's start by comparing the following 2 screenshots - one with shadows and one without:

Fig. 02 - Scene from default viewpoint lit with light without any shadows (left) and with shadows (right)

This is the scene we're going to work with, and for the sake of simplicity all of the comparison screenshots will be taken from this exact viewpoint with 2 different scene configurations. Let's start with a definition of how shadows are formed for a given scene and light. The shadow umbra is present at each position that has no direct visibility to any point on the light. The shadow penumbra is present at each position that can see some points on the light, but not all of them. There is no shadow wherever a position has full direct visibility to every point on the light.

Most games simplify: instead of defining a light as an area or volume, it is defined as an infinitely small point. This gives us a few advantages:

• For a single point, visibility can be defined in a binary way - a position is either in shadow or not in shadow
• From a single point, a projection of the scene can easily be constructed such that the definition of shadow becomes trivial (a position either is or isn't occluded by other objects in the scene from the light's point of view)

From here one can follow the idea of shadow mapping - the basic technique on which all the others used here build. It is trivial, yet it should be mentioned here.

inline float ShadowMap(Texture2D<float2> shadowMap, SamplerState shadowSamplerState, float3 coord)
{
	// Sample the 'distance' stored in the shadow map and compare it against
	// the receiver's 'distance' from the light - lit when the receiver is closer
	float shadowDepth = shadowMap.SampleLevel(shadowSamplerState, coord.xy, 0.0f).x;
	return shadowDepth < coord.z ? 0.0f : 1.0f;
}

Fig. 03 - Code snippet for standard shadow mapping: the depth map (storing 'distance' from the light's point of view) is compared against the calculated 'distance' between the point we're shading and the light. 'Distance' may mean either actual distance, or more likely just the value on the z-axis of the light's view basis.
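The comparison in Fig. 03 can be reduced to a tiny CPU-side reference - a minimal sketch of the depth test only, with a function name of my own choosing:

```cpp
// Sketch of the standard shadow-map test: 'shadowDepth' is the value fetched
// from the depth map, 'receiverDepth' is the light-space depth of the point
// being shaded. Returns 1.0f when lit, 0.0f when in shadow.
inline float ShadowTest(float shadowDepth, float receiverDepth)
{
    return (shadowDepth < receiverDepth) ? 0.0f : 1.0f;
}
```

Everything else in shadow mapping (the light-space projection, the depth-map render pass) exists only to feed this one comparison.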

This is well known to everyone here and gives us the basic results we all recognize, like:

Fig. 04 - Standard Shadow Mapping

This can be simply explained with the following image:

Fig. 05 - Each rendered pixel compares its 'depth' from the light's point (represented as the yellow dot) against what is written in the 'depth' map; the white lines represent the computation for each pixel.

# Percentage-Closer Filtering (PCF)

To make shadows more visually appealing, adding a soft edge is a must. This is done simply by performing NxN tests with offsets. For improved visual quality I've used shadow mapping with a bilinear filter (which requires resolving 4 samples), along with 5x5 PCF filtering:

Fig. 06 - Percentage closer filtering (PCF) results in nice soft-edged shadows; sadly the shadow is uniformly soft everywhere

Clearly, neither of the above techniques does any penumbra/umbra calculation, and therefore they're not really useful for area lights. For the sake of completeness, here is basic PCF source code (it is not heavily optimized - feel free to improve it for your own uses):

inline float ShadowMapPCF(Texture2D<float2> tex, SamplerState state, float3 projCoord, float resolution, float pixelSize, int filterSize)
{
	float shadow = 0.0f;
	float2 grad = frac(projCoord.xy * resolution + 0.5f);

	for (int i = -filterSize; i <= filterSize; i++)
	{
		for (int j = -filterSize; j <= filterSize; j++)
		{
			// Gather 4 neighboring depth samples and turn each into a binary shadow test
			float4 tmp = tex.Gather(state, projCoord.xy + float2(i, j) * float2(pixelSize, pixelSize));
			tmp.x = tmp.x < projCoord.z ? 0.0f : 1.0f;
			tmp.y = tmp.y < projCoord.z ? 0.0f : 1.0f;
			tmp.z = tmp.z < projCoord.z ? 0.0f : 1.0f;
			tmp.w = tmp.w < projCoord.z ? 0.0f : 1.0f;

			// Bilinearly blend the 4 tests using the sub-texel position
			shadow += lerp(lerp(tmp.w, tmp.z, grad.x), lerp(tmp.x, tmp.y, grad.x), grad.y);
		}
	}

	return shadow / (float)((2 * filterSize + 1) * (2 * filterSize + 1));
}

Fig. 07 - PCF filtering source code
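The bilinear filtering of the 4 gathered samples mentioned above can be sketched on the CPU. This is a minimal sketch of just the blend step; the function names are my own, and s00..s11 stand for the four binary depth-test results of one Gather:

```cpp
// Plain linear interpolation, mirroring HLSL's lerp intrinsic
inline float lerpf(float a, float b, float t) { return a + (b - a) * t; }

// Blend four binary shadow-test results by the sub-texel position 'grad',
// producing a smooth 0..1 shadow value instead of a hard binary one
inline float BilinearShadow(float s00, float s10, float s01, float s11,
                            float gradX, float gradY)
{
    return lerpf(lerpf(s00, s10, gradX), lerpf(s01, s11, gradX), gradY);
}
```

A lookup falling exactly between a lit and a shadowed texel column, for example, yields 0.5 rather than snapping to either side - which is where the soft edge comes from.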

Representing this with image:

Fig. 08 - Image representing PCF: the pixel marked with a straight line ending in a star also evaluates the shadow at neighboring pixels (i.e. performs additional samples). The resulting shadow is then the weighted sum of all the samples for a given pixel.

While the idea is quite basic, it is clear that using larger kernels results in slow computation. There are ways to perform separable filtering of shadow maps using a different approach to resolving where the shadow is (Variance Shadow Mapping, for example), but they introduce additional problems of their own.
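Variance Shadow Mapping, mentioned here only in passing, replaces the binary depth test with a statistical bound: the map stores the first two depth moments, and Chebyshev's inequality bounds the fraction of the filter region that is lit. A minimal sketch of that bound (my own formulation, not code from this engine):

```cpp
#include <algorithm>

// Chebyshev upper bound on the lit fraction, given the two moments
// (mean = E[z], meanSq = E[z^2]) stored in a variance shadow map.
inline float ChebyshevUpperBound(float mean, float meanSq, float receiverDepth)
{
    if (receiverDepth <= mean)
        return 1.0f;                       // receiver in front of the occluders

    // Variance of the depth distribution, clamped to avoid division issues
    float variance = std::max(meanSq - mean * mean, 1e-6f);
    float d = receiverDepth - mean;
    return variance / (variance + d * d);  // upper bound on P(z >= receiverDepth)
}
```

Because the two moments can be pre-filtered (blurred, mip-mapped) like any texture, the expensive per-pixel kernel disappears - at the cost of light-bleeding artifacts, which is the kind of "additional problem" referred to above.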

To understand the problem with both previous techniques, let's replace the point light with an area light in our sketch image.

Fig. 09 - Using an area light introduces a penumbra and an umbra. The size of the penumbra depends on multiple factors - the distance between receiver and light, the distance between blocker and light, and the light's size (and shape).

To calculate plausible shadows like those in the schematic image, we need the distance between receiver and blocker and the distance between receiver and light. PCSS is a 2-pass algorithm: it first calculates the average blocker distance, uses that value to estimate the penumbra size, and then performs some kind of filtering (often PCF, or jittered PCF for example). In short, the PCSS computation looks similar to this:

float ShadowMapPCSS(...)
{
	float2 blockerInfo = PCSS_BlockerDistance(...);

	// If no blockers were found, the receiver is fully lit
	if (blockerInfo.y < 1.0f)
	{
		return 1.0f;
	}
	else
	{
		// Use the average blocker depth to estimate the penumbra size,
		// then filter with a kernel of that size
		float penumbraSize = EstimatePenumbraSize(blockerInfo.x, ...);
		return FilterShadow(..., penumbraSize);
	}
}

Fig. 10 - Pseudo-code of PCSS shadow mapping

The first problem is determining a correct average blocker depth - and as we want to limit the search area, we simply pass in an additional parameter that determines the search size. The actual average blocker depth is calculated by searching the shadow map for samples with a depth value smaller than the receiver's. In my case I used the following estimation of blocker distance:

// Input parameters are:
// tex - input shadow depth map
// state - sampler state for the shadow depth map
// projCoord - holds projection UV coordinates and the receiver depth (compared against the shadow depth map)
// searchUV - input size for the blocker search
// rotationTrig - input parameter for random rotation of kernel samples
inline float2 PCSS_BlockerDistance(Texture2D<float2> tex, SamplerState state, float3 projCoord, float searchUV, float2 rotationTrig)
{
	// Perform N samples with pre-defined offsets and random rotation, scaled by the input search size
	int blockers = 0;
	float avgBlocker = 0.0f;
	for (int i = 0; i < (int)PCSS_SampleCount; i++)
	{
		// Calculate the sample offset (technically anything can be used here - a standard NxN kernel, random samples with scale, etc.)
		float2 offset = PCSS_Samples[i] * searchUV;
		offset = PCSS_Rotate(offset, rotationTrig);

		// Compare the sample depth with the receiver depth; if it puts the receiver into shadow, this sample is a blocker
		float z = tex.SampleLevel(state, projCoord.xy + offset, 0.0f).x;
		if (z < projCoord.z)
		{
			blockers++;
			avgBlocker += z;
		}
	}

	// Calculate the average blocker depth (guarding against division by zero)
	if (blockers > 0)
	{
		avgBlocker /= (float)blockers;
	}

	// To handle cases where there are no blockers, we output 2 values - the average blocker depth and the number of blockers
	return float2(avgBlocker, (float)blockers);
}

Fig. 11 - Average blocker estimation for PCSS shadow mapping
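The blocker search above boils down to averaging the sampled depths that lie in front of the receiver, while reporting the blocker count so the caller can detect the no-blocker case. A hedged CPU-side sketch of just that averaging step (names and types are my own, not the engine's):

```cpp
#include <vector>

// Result of a blocker search: average depth of blockers, and how many were found
struct BlockerResult { float avgDepth; int blockers; };

// Average the sampled shadow-map depths that are closer to the light than
// the receiver; everything else is ignored.
inline BlockerResult AverageBlockerDepth(const std::vector<float>& samples,
                                         float receiverDepth)
{
    BlockerResult r{0.0f, 0};
    for (float z : samples)
    {
        if (z < receiverDepth)
        {
            r.avgDepth += z;
            r.blockers++;
        }
    }
    if (r.blockers > 0)
        r.avgDepth /= (float)r.blockers;
    return r;
}
```

Returning the count alongside the average is what lets the main PCSS routine early-out with a fully lit result when nothing occludes the receiver.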

For the penumbra size calculation we first assume that blocker and receiver are planar and parallel. The penumbra size then follows from similar triangles, determined as:

penumbraSize = lightSize * (receiverDepth - averageBlockerDepth) / averageBlockerDepth

This size is then used as the input kernel size for a PCF (or similar) filter. In my case I again used rotated kernel samples. Note: depending on the sample positioning, one can achieve different area light shapes. The result gives quite plausible shadows, with the downside of requiring a lot of processing power for noise-free shadows (many samples) and large kernel sizes (which also require a large blocker search size). Generally this is a very good technique for small to mid-sized area lights, yet large area lights will cause problems.
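Plugging illustrative numbers into the similar-triangles formula above makes its behavior easy to check (the values below are made up purely for demonstration):

```cpp
// Similar-triangles penumbra estimate: the closer the blocker sits to the
// light (small avgBlockerDepth) relative to the receiver, the wider the penumbra.
inline float PenumbraSize(float lightSize, float receiverDepth, float avgBlockerDepth)
{
    return lightSize * (receiverDepth - avgBlockerDepth) / avgBlockerDepth;
}
```

For example, a blocker halfway between light and receiver yields a penumbra as wide as the light itself, and the penumbra shrinks to zero as the blocker approaches the receiver - matching the behavior sketched in Fig. 09.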

Fig. 12 - PCSS shadow mapping in practice

As the article is already quite large, I will leave the 2 other techniques available in my current game engine build for another article that may eventually come out (the first is a variant of PCSS that utilizes mip maps and allows for a slightly larger light size without impacting performance that much; the second is a sort of back-projection technique). Anyway, allow me to at least show a short video of the first technique in action:

Note: This article was originally published as a blog entry right here at GameDev.net, and has been reproduced here as a featured article with the kind permission of the author.
