matt77hias

Shadow Mapping


matt77hias    436

I currently use three structured buffers, one for each light type (directional, omni, and spot), and a constant buffer for light-related data (ambient, fog, number of each light type). In order to support shadow mapping, I want to add three additional structured buffers, one per light type, but now for shadow mapping. The light structures for shadow mapping are pretty much the same: one camera-view-to-light-projection (directional and spot) or camera-view-to-light-view (omni) transformation matrix is added. Furthermore, I added two Texture2DArrays (directional and spot) and one TextureCubeArray (omni) for the shadow maps. That way, all lighting can be done in a single pass (excluding the generation of the shadow maps), and there is no limit on the number of lights of each type (beyond the physical limits of the GPU). Furthermore, tiled and clustered shading would be quite trivial to add in the future.
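For concreteness, here is a minimal C++ sketch of what one element of such a shadow-casting-light structured buffer could look like; the field names and layout are hypothetical, not taken from any particular engine:

```cpp
#include <cstddef>

struct Float4x4 { float m[4][4]; };

// Hypothetical element of a structured buffer for shadow-casting spot
// lights: the usual light parameters plus the added camera-view-to-
// light-projection transform. Keeping the stride a multiple of 16 bytes
// keeps the GPU-side layout predictable.
struct ShadowSpotLight {
    float    position[3];     // view-space position
    float    range;
    float    direction[3];    // view-space direction
    float    cos_umbra;
    float    intensity[3];
    float    cos_penumbra;
    Float4x4 camera_to_light_projection;
};
```

The matrix is the only addition relative to the non-shadow variant, so the CPU-side code that fills both buffers can share most of its logic.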

I wonder, however, how many shadow maps are typically used? A single omni light already results in the generation of six shadow maps, each produced by a separate depth pass, which seems huge, even in the presence of some CPU-side culling (for this reason, I don't want to use the GS). If you are not careful with the fall-off distance, can't the depth passes of a single omni light become a bottleneck?

Does one normally support multiple formats (16-bit vs. 32-bit) and multiple resolutions?


MJP    19786

I'm not sure what typical numbers would be for games, since I would imagine that it can vary quite a bit depending on the type of game (indoor vs. outdoor) and the target hardware. I can tell you that for the last 2 games that I worked on (The Order: 1886 and Lone Echo) we capped the engine at 1 shadow-casting directional light (with up to 4 cascades) + 16 shadow-casting spot lights. The directional light used 16-bit depth maps, while the spot lights used 32-bit depth maps. The spot lights were cached across frames if nothing changed or if they were lower-priority, so we didn't necessarily re-render the full set every frame. We also faded out the shadow maps into pre-baked shadows as the light got further away, which gives the illusion of having more than 16 shadow-casting lights. Our artists could pick a resolution per spot light using a scale factor, which is just used as a viewport scale when rendering to the depth map. It's also baked into the light's shadow matrix so that it's handled automatically when the shadow map is sampled during the lighting phase. The base resolution of the shadow maps was driven by engine configuration variables, which were hard-coded for The Order but tied to user-driven graphics settings for Lone Echo.
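That per-light viewport scale can be folded into a UV scale/bias appended to the shadow matrix. A minimal sketch with hypothetical names, assuming the shadow-map position has already been mapped to [0,1] UV space:

```cpp
// Scale/bias mapping full-map UVs into the sub-rectangle actually
// rendered for this light. viewport_scale is the per-light resolution
// factor; (tile_x, tile_y) select which region of the map the light
// occupies (both hypothetical parameters).
struct UvScaleBias { float sx, sy, ox, oy; };

UvScaleBias MakeViewportScaleBias(float viewport_scale, float tile_x, float tile_y) {
    return { viewport_scale, viewport_scale,
             tile_x * viewport_scale, tile_y * viewport_scale };
}

// Applied as the last step of the light's shadow matrix (shown here for a
// single coordinate), so the sampling code never needs to know about it.
float RemapU(const UvScaleBias& sb, float u) { return u * sb.sx + sb.ox; }
```

Because the remap lives inside the shadow matrix, the lighting shader samples every light identically regardless of its resolution.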

We didn't support omni shadows for either game, but if we did I would probably set it up so that they have to take 6 shadow maps from the spot lights so that we could keep a similar performance/memory footprint. But you should be sure to cull each face of the omni shadow map individually, since they may not all be within the primary view frustum simultaneously.
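A minimal sketch of that per-face culling (names and conventions are illustrative, not from either game): bound each 90° face frustum of the omni light by a sphere and skip the face when that sphere lies fully outside any camera frustum plane.

```cpp
#include <cstddef>

struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };   // inward-facing: n.p + d >= 0 is inside
struct Sphere { Vec3 c; float r; };

static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Minimal enclosing sphere of one 90-degree cube-face frustum (apex at the
// light, far plane at `range`): centered 1.5 * range along the face
// direction with radius 1.5 * range, it touches both the apex and the four
// far-plane corners.
Sphere FaceBoundingSphere(Vec3 light_pos, Vec3 face_dir, float range) {
    const float t = 1.5f * range;
    return { { light_pos.x + face_dir.x * t,
               light_pos.y + face_dir.y * t,
               light_pos.z + face_dir.z * t }, t };
}

// Cull the face if its sphere is fully outside any camera frustum plane.
bool FaceVisible(const Sphere& s, const Plane* planes, size_t plane_count) {
    for (size_t i = 0; i < plane_count; ++i)
        if (Dot(planes[i].n, s.c) + planes[i].d < -s.r)
            return false;
    return true;
}
```

A sphere test is conservative (it never culls a visible face), so a tighter frustum-vs-frustum test can always be layered on afterwards for faces that pass.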

matt77hias    436
11 minutes ago, MJP said:

But you should be sure to cull each face of the omni shadow map individually, since they may not all be within the primary view frustum simultaneously.

Nice suggestion: more aggressive than culling the entire light at once.

11 minutes ago, MJP said:

The directional light used 16-bit depth maps

Why use the lowest depth precision for the light with the largest range?

11 minutes ago, MJP said:

1 directional light (with up to 4 cascades) + 16 shadow-casting spot lights

So in the extreme case, you will have 17 depth passes for shadow mapping alone?


MJP    19786

16-bit depth maps are a common optimization for directional lights because they use an orthographic projection, which means that the resulting depth range is linear (as opposed to the exponential distribution that you get from a perspective projection). When you're rendering to 4 2k depth maps in a frame, cutting the bandwidth in half can make a pretty serious difference in your performance. :)
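To see why the orthographic case tolerates 16 bits, compare the post-projection depth curves (D3D-style [0,1] depth; this is an illustrative sketch, not MJP's code):

```cpp
#include <cmath>

// Post-projection depth in [0,1] for near plane n, far plane f, and a
// view-space distance z. Orthographic depth is linear in z; perspective
// depth is hyperbolic, packing most of the representable values near n.
float OrthoDepth(float n, float f, float z)       { return (z - n) / (f - n); }
float PerspectiveDepth(float n, float f, float z) { return (f * (z - n)) / (z * (f - n)); }
```

At the midpoint of a 1-to-100 range, orthographic depth is exactly 0.5, while perspective depth is already above 0.99, so almost the entire 16-bit range of a perspective map would be spent on the region closest to the near plane.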

The worst case is actually more like 20 depth passes, because of the 4 cascades from the directional light. The upside of lots of depth passes is that they give you an opportunity to overlap other work using async compute.

 

matt77hias    436
15 hours ago, L. Spiro said:

Because, as mentioned, scenes can vary widely, the common way to decide how many shadows you will have is derived from a performance goal on a specific target spec.  In other words, determine how many shadows you can have while maintaining X FPS on Y hardware.

The reason you should be using an algorithm like this to determine your own metrics is that not only do different scenes in different games come with different performance compromises, but your own implementation of shadows may also perform very differently from others'.  Your question is useful for considering how optimized shadow maps must be in other games and how much work you have to do to get there, but if a boss asked you right now to estimate how many shadows you can use, you would use the above-mentioned process.

 

To give you actual stats and an idea of the optimizations used, here is what I did on Final Fantasy XV.

We had a basic implementation likely matching what you have, with cube textures for point lights and separate textures for the rest (4 textures for a cascaded directional light and X spot lights).  The first thing I did was improve the culling on the cascaded directional light so that objects already covered by the nearest cascade were not being needlessly drawn into the farther cascades.  If you aren't doing this already, adding it can lead to huge savings, as you avoid redrawing your main detailed characters, complete with re-skinning, etc.

Next I moved the 6 faces of a cube texture to a single 1X-by-6X texture.  So a 512-by-512 cube texture became a single 512-by-3,072 texture.  Although you must write your own look-up function that takes 3D coordinates and translates them to a 2D coordinate on this texture, it comes with a few advantages in caching, filtering, clearing, filling, and most importantly it prepares for the next big optimization: a shadow atlas.
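The look-up function might look roughly like this (a sketch following the D3D cube-face conventions; the exact signs and face order depend on the engine, and this is not FFXV's actual code):

```cpp
#include <cmath>

struct Uv { float u, v; };

// Translate a 3D direction into a 2D coordinate on a 1X-by-6X strip:
// pick the dominant axis to get a face index, project the other two axes
// to [0,1] face UVs, then offset v by the face index. Face order
// (+X,-X,+Y,-Y,+Z,-Z) and sc/tc signs follow the D3D cube convention.
Uv CubeDirToStripUv(float x, float y, float z) {
    float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    int face; float ma, sc, tc;
    if (ax >= ay && ax >= az) { face = x >= 0 ? 0 : 1; ma = ax; sc = x >= 0 ? -z :  z; tc = -y; }
    else if (ay >= az)        { face = y >= 0 ? 2 : 3; ma = ay; sc =  x;               tc = y >= 0 ? z : -z; }
    else                      { face = z >= 0 ? 4 : 5; ma = az; sc = z >= 0 ?  x : -x; tc = -y; }
    float u = 0.5f * (sc / ma + 1.0f);
    float v = 0.5f * (tc / ma + 1.0f);
    return { u, (v + face) / 6.0f };   // stack the 6 faces vertically
}
```

On consoles that expose the cube-addressing intrinsics this reduces to the hardware face-selection result plus one offset, as L. Spiro notes later in the thread.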

Now that all shadows were being drawn to 2D textures, I created a texture atlas for all the shadows except the cascaded ones: a single large texture for all the point and spot lights.  It was 2,048-by-2,048 at first but could grow to 4,096-by-2,048 if necessary.  Putting all the point and spot shadows into a single texture was a huge gain for many reasons, but one of the main gains was that we had access to all the shadows during a single lighting pass, which meant we could apply all the shadows in one pass instead of many.

 

At this point our limit was simply how many shadows could be drawn until the texture atlas got filled, sorted by priority largely based on distance.  As mentioned by MJP, an important aspect of this is to cull all six faces of a point-light shadow.  Any shadow frustums not in view meant less time creating shadow maps and more room for other shadows in the atlas.

Next, I wanted the shadow maps to have LOD, as the smaller shadow sizes would allow faster creation, and smaller shadow maps meant more shadows could fit into the atlas.

Each shadow frustum (up to 6 for point lights and 1 for each spot light, where each shadow frustum at least partially intersects the camera frustum—any shadow frustums fully outside the view frustum would be discarded prior to this step) was projected onto a small in-memory representation of a screen and clipped by the virtual screen edges.

This sounds complicated, but it is really simple.  The camera's view-projection matrix transforms points into a [-1,-1]...[1,1] space on your screen, so we simply used that same matrix to transform the shadow-frustum points, then clipped anything beyond -1 and 1 in both directions.
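A sketch of that projection-and-clip measurement (Sutherland-Hodgman clipping against the [-1,1] square, then the shoelace area; assumes the frustum outline arrives as a convex polygon already divided by w, and is not FFXV's actual code):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct P2 { double x, y; };

// One Sutherland-Hodgman step: clip a convex polygon against the boundary
// coordinate = sign on the given axis (sign = +1 or -1), keeping the side
// where sign * coordinate <= 1.
static std::vector<P2> ClipAxis(const std::vector<P2>& poly, int axis, double sign) {
    std::vector<P2> out;
    const size_t n = poly.size();
    auto coord = [&](const P2& p) { return axis == 0 ? p.x : p.y; };
    for (size_t i = 0; i < n; ++i) {
        P2 a = poly[i], b = poly[(i + 1) % n];
        bool in_a = sign * coord(a) <= 1.0, in_b = sign * coord(b) <= 1.0;
        if (in_a) out.push_back(a);
        if (in_a != in_b) {                    // edge crosses the boundary
            double ca = coord(a), cb = coord(b);
            double t = (sign - ca) / (cb - ca);
            out.push_back({ a.x + t * (b.x - a.x), a.y + t * (b.y - a.y) });
        }
    }
    return out;
}

// Fraction of the screen covered: clip against all four edges, then take
// the shoelace area divided by 4 (the area of the full [-1,1]^2 square).
double ScreenCoverage(std::vector<P2> poly) {
    for (int axis = 0; axis < 2; ++axis)
        for (double sign : { 1.0, -1.0 })
            poly = ClipAxis(poly, axis, sign);
    double area2 = 0.0;                        // twice the signed area
    for (size_t i = 0, n = poly.size(); i < n; ++i) {
        const P2& a = poly[i]; const P2& b = poly[(i + 1) % n];
        area2 += a.x * b.y - b.x * a.y;
    }
    return std::fabs(area2) * 0.5 / 4.0;
}
```

The clipping is cheap because the boundaries are axis-aligned, and the convexity of a projected frustum outline keeps the vertex count small.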

[Image: gl_frustumclip.png, showing a shadow frustum projected onto the screen and clipped against its edges]

Now, with the outline of the clipped shadow frustum in [-1,1] space, taking the area of the resulting shape and dividing by 4 (the area of the full [-1,1]-by-[-1,1] square) gives you the fraction of the screen it covers (0 = 0%, 1 = 100%).

In short, we measured how much each shadow frustum is in view of the camera.  Based on this percentage, I would drop the shadow resolution by half, or half again if even less was in view, etc.  I believe I put a limit at 64-by-64.
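Put together, the coverage fraction drives a simple resolution ladder, as described above (the thresholds and base size here are illustrative, not FFXV's actual values):

```cpp
// Coverage-driven shadow LOD: halve the shadow-map size each time the
// clipped screen coverage drops below the next (quartered) threshold,
// clamped at a 64x64 floor.
int ShadowMapSize(float screen_coverage /* 0..1 */, int full_size = 512, int min_size = 64) {
    int size = full_size;
    float threshold = 0.25f;     // below 25% coverage, start dropping
    while (size > min_size && screen_coverage < threshold) {
        size /= 2;
        threshold *= 0.25f;      // area shrinks 4x per halving of size
    }
    return size;
}
```

Quartering the threshold per step matches the fact that halving a shadow map's resolution quarters its texel count.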

If you play Final Fantasy XV, you can see this in action if you know where to look.  If you slowly move so that a shadow from a point light takes less and less screen space you might be able to see the resolution drop.

 

Now with the shadow-map LOD system, most shadows are drawn at a lower resolution, only going full-size when you get near and are looking directly at the shadowed area.  Because this actually affects so many shadows, the savings are significant.  If you decide to keep the same limit on shadows as you had before you will find a huge gain in performance.  In our case, we continued allowing the shadow atlas to be filled, so we were able to support double or more shadows with the same performance.

 

Another important optimization is to render static objects to offline shadow maps.  A tool generates the shadow maps offline, rendering only static objects (buildings, lamp posts, etc.) into them.  At run-time, you create the final shadow map by copying the static shadow map over it and then rendering your dynamic objects (characters, foliage, etc.) into it.

This is a major performance improvement again.  We already had this for Final Fantasy XV, but since I added the shadow LOD system I had to make the offline static shadows carry mipmaps.  It is important to note that the shadow mipmaps are not a downsampling of mip level 0—you have to re-render the scene into each mipmap, again with some lower limit such as 64-by-64.

 

All of this together allowed us probably around 30 shadow maps with the ability to dynamically scale with the scene and without too many restrictions on the artists.  Shadow maps were sorted by a priority system so that by the time the shadow atlas was filled, the shadows that had to be culled were distant, off-to-the-side, or otherwise unimportant.


L. Spiro

Thanks. Really interesting optimizations.

If you use the same resolution everywhere (ignoring LOD), could you use texture arrays and cube-map arrays as well, while still keeping a single lighting pass?


turanszkij    366
14 hours ago, L. Spiro said:

Next I moved the 6 faces of a cube texture to a single 1X-by-6X texture.  So a 512-by-512 cube texture became a single 512-by-3,072 texture.  Although you must write your own look-up function that takes 3D coordinates and translates them to a 2D coordinate on this texture, it comes with a few advantages in caching, filtering, clearing, filling, and most importantly it prepares for the next big optimization: a shadow atlas.

This is interesting; I've seen other games do this. A few questions come to mind, though. Wouldn't sampling a cubemap texture be more efficient? I heard that AMD GCN already implements cubemap sampling as regular tex2d samples, but is that the same on NVIDIA? Is there some open-source implementation of that sampling?

And what do you mean by advantages in caching, clearing, filling, and filtering? I don't see any such trivial upsides to this (apart from the dynamic shadow resolution you mentioned later), but the complexity of the implementation definitely increases.

JoeJ    2586
26 minutes ago, turanszkij said:

This is interesting; I've seen other games do this. A few questions come to mind, though. Wouldn't sampling a cubemap texture be more efficient? I heard that AMD GCN already implements cubemap sampling as regular tex2d samples, but is that the same on NVIDIA? Is there some open-source implementation of that sampling?

What if you increase the FOV for each face by a small amount so you get one 'overlapping' border of texels? You would always need to sample just one shadow map, and I assume artefacts at cube corners/edges would be negligible for shadows?

 

 

turanszkij    366
28 minutes ago, JoeJ said:

What if you increase the FOV for each face by a small amount so you get one 'overlapping' border of texels? You would always need to sample just one shadow map, and I assume artefacts at cube corners/edges would be negligible for shadows?

Good idea, but what about unconnected faces? I imagine there would still be a bunch of branches to determine where we should sample in that case.

JoeJ    2586
45 minutes ago, turanszkij said:

but what about unconnected faces?

What faces do you mean? Geometry? Or related to the 6 projections?

JoeJ    2586

But that's the point of my suggestion: you never get closer to the texture edge than 0.5 texels, so you don't need to worry about disconnected UV space. Of course, you still need to select the proper UV offset to address the atlas, but only once rather than 3 times, and this should be very cheap anyway and can be made branchless.

The only question is how bad the artefacts are, and how that depends on the shadow technique (PCF, VSM, ...).

L. Spiro    25638
10 hours ago, matt77hias said:

If you use the same resolutions (ignoring LOD), you can use arrays of texture and cube maps as well while still having a single lighting pass?

Yes, but unless you pass extra parameters, that means all of your shadows have to have the same resolution.

 

5 hours ago, turanszkij said:

A few questions come to mind, like wouldn't sampling a cubemap texture be more efficient? I heard that AMD GCN already does the cubemap sampling as regular tex2d samples, but is that the same with NVidia? Is there some open-source implementation of that sampling?

I don’t think NVIDIA is different.  In either case, sampling a cube map actually emits a series of intrinsics that give the face index and 2D coordinates.  Since consoles expose these intrinsics, my routines for Xbox One and PlayStation 4 are instruction-for-instruction exactly the same as a cube sample, except for one extra instruction to increase my Y coordinate based off the face index.  My routine for Windows can’t use the intrinsics but should compile to the same thing.  I don’t know of any open-source implementations as mine are derived from looking at shader assembly.

 

5 hours ago, turanszkij said:

And what do you mean by advantages in caching, clearing, filling and filtering? I don't see any such trivial upsides to this (apart from the dynamic shadow resulotion you mentioned later), but the complexity of the implementation definetly increases.

Clearing can be done with a single call, which is a win on any platform that clears by just setting a flag, where the time is dominated by jumping back and forth between the driver and user code, etc.  Less of a win for platforms that modify each pixel, but still a slight win.

Filling requires no render-target swaps.

Filtering becomes a win because you can easily use any shadow filtering you wish.  As mentioned by JoeJ, you widen the projection for each cube face by a specified number of pixels; for example, if you have a 512×512 texture and you want to widen the projection by exactly 3 pixels, your field of view will be approximately 90.3347° instead of 90°.

Now you have 3 border pixels to sample for any kind of filtering you wish to use with no complicated math to sample across faces etc.  This also allows all of your shadows to have a unified look, as you will no longer have to use one filter for spot lights and a simpler one for point lights.

 

3 hours ago, JoeJ said:

The only question is how bad artefacts are

You can see for yourself by looking at Final Fantasy XV.  There aren’t any artifacts; any seams at the edges of the faces of the point-light shadow map would have been a deal-breaker.  Since all faces have the same amount of over-projection (not considering LOD resizes), they still align with each other seamlessly.

And as for the factors you mentioned for adjusting the UV coordinates to account for this: keep in mind that once you have a shadow atlas, you already have such factors for reading shadows, so you aren’t creating any extra work by adding these borders.  You simply adjust the same offset/scale factors you were already using for reading each shadow from the atlas.

 

Just remember that you have to use the widened frustum for culling the objects for each face of the point light, and to account for the border when generating offline static shadow maps.


L. Spiro


JoeJ    2586

Nice, you already have experience with all the kinds of shadow optimizations I've had in mind :) But there are two more...

21 hours ago, L. Spiro said:

Another important optimization is to render static objects to offline shadow maps.  A tool generates the shadow maps offline, rendering only static objects (buildings, lamp posts, etc.) into them.

Does this also make sense for the scrolling cascades of a directional sun, combined with some kind of streaming?

 

The second question is about updating shadow maps at a lower frequency, say every 4 frames. I assume the easiest way to achieve this would be to transform the sample point back in time for dynamic objects. Has anyone tried something similar already?

 

 

 

 

L. Spiro    25638
17 minutes ago, JoeJ said:

Does this also make sense for the scrolling cascades of a directional sun, combined with some kind of streaming?

You can only pre-bake a shadow map if neither the light nor any objects in the shadow map are moving.  Any kind of moving sun is out.
Even if your sun is static, you could only bake the cascades if they covered a specific area and never moved (with the player, based on where the player looks, etc.).  This largely defeats the purpose of cascades, and I can’t think of any cases where it would be useful to bake them.

17 minutes ago, JoeJ said:

The second question is about updating shadow maps at a lower frequency, say every 4 frames. I assume the easiest way to achieve this would be to transform the sample point back in time for dynamic objects. Has anyone tried something similar already?

I can’t remember the site, company, or game, but around 7 years ago this idea was published and explained with a demo.  You reproject the shadow look-up coordinates in much the same way as you have to reproject the scene for temporal anti-aliasing.  It hasn’t caught on since then due to the difficulty of implementing it and all the edge cases that arise, which either slow down the routine if handled fully or lead to artifacts if not.  As with any reprojection technique, you have problems with things appearing from out of view.

But it might be suitable for the farthest cascade of a shadow map.  We were considering this but did not implement it.  If your world is as large as ours in Final Fantasy XV, then your farthest cascade will be mostly blurry and you can get away with not rendering certain things such as small foliage (another optimization I implemented), so if there is a candidate for this type of reduced updating rates it would be that.


L. Spiro


turanszkij    366
On 2017-09-14 at 5:18 PM, JoeJ said:

But that's the point of my suggestion: you never get closer to the texture edge than 0.5 texels, so you don't need to worry about disconnected UV space. Of course, you still need to select the proper UV offset to address the atlas, but only once rather than 3 times, and this should be very cheap anyway and can be made branchless.

 

22 hours ago, L. Spiro said:

Filtering becomes a win because you can easily use any shadow filtering you wish.  As mentioned by JoeJ, you widen the projection for each cube face by a specified number of pixels; for example, if you have a 512×512 texture and you want to widen the projection by exactly 3 pixels, your field of view will be approximately 90.3347° instead of 90°.

Thank you guys, very interesting, now I have to try it! :)

