Shadow Mapping

Started by matt77hias
15 comments, last by turanszkij 6 years, 7 months ago

I currently use three structured buffers, one per light type (directional, omni and spot), and a constant buffer for light-related data (ambient, fog, number of each light type). In order to support shadow mapping, I want to add three additional structured buffers, one per light type, this time for shadow-casting lights. The light structures for shadow mapping are pretty much the same, except that a camera-view-to-light-projection (directional and spot) or camera-view-to-light-view (omni) transformation matrix is added. Furthermore, I added two Texture2D arrays (directional and spot) and one TextureCubeArray (omni) for the shadow maps. That way, all lighting can be done (excluding the generation of the shadow maps) in a single pass, and there is no limit on the number of lights of each type (apart from the physical limits of the GPU). Furthermore, tiled and clustered shading would be quite trivial to add in the future.

I wonder, however, how many shadow maps are typically used? One omni-light already results in the generation of six shadow maps, each requiring a separate depth pass, which seems huge even in the presence of some CPU-side culling (for this reason, I don't want to use the GS). And if you are not careful with the fall-off distance, can't the depth passes of a single omni-light become a bottleneck?

Does one normally support multiple formats (16-bit vs. 32-bit) and multiple resolutions?

🧙


I'm not sure what typical numbers would be for games, since I would imagine that it can vary quite a bit depending on the type of game (indoor vs. outdoor) and the target hardware. I can tell you that for the last 2 games that I worked on (The Order: 1886 and Lone Echo) we capped the engine at shadow-casting 1 directional light (with up to 4 cascades) + 16 shadow-casting spot lights. The directional light used 16-bit depth maps, while the spot lights used 32-bit depth maps. The spot lights were cached across frames if nothing changed or if they were lower-priority, so we didn't necessarily re-render the full set every frame. We also fade out the shadow maps into pre-baked shadows as the light gets further away, which gives the illusion of having more than 16 shadow-casting lights. Our artists can pick a resolution per-spotlight using a scale factor, which is just used as a viewport scale when rendering to the depth map. It's also baked into the light's shadow matrix so that it's handled when the shadow map is sampled during the lighting phase. The base resolution of the shadow maps was driven by engine configuration variables, which were hard-coded for The Order but for Lone Echo were tied to user-driven graphics settings in the menu.

We didn't support omni shadows for either game, but if we did I would probably set it up so that they would have to take 6 shadow maps from the spot-light pool, so that we could keep a similar performance/memory footprint. But you should be sure to cull each face of the omni shadow map individually, since they may not all be within the primary view frustum simultaneously.
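To sketch what culling one face might look like: each cube face spans a 90-degree pyramid from the light's position, so a conservative test against the bounding sphere of the camera frustum only needs the four 45-degree side planes. This is a hypothetical helper, not engine code; a real engine would also clamp by the light's range and might use a tighter camera-frustum bound.

```python
import math

def _normalize(v):
    l = math.sqrt(sum(c * c for c in v))
    return tuple(c / l for c in v)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def face_visible(light_pos, face_dir, sphere_center, sphere_radius):
    """Conservative sphere-vs-face-frustum test for one cube-map face.

    A cube face covers a 90-degree square pyramid from the light position.
    In a basis (right, up, fwd) around face_dir, the inside region is
    |x| <= z and |y| <= z, i.e. four side planes tilted 45 degrees.
    If the camera frustum's bounding sphere is fully outside any plane,
    that face's depth pass can be skipped.
    """
    fwd = _normalize(face_dir)
    # Build any orthonormal basis around fwd (the choice is arbitrary).
    helper = (0.0, 1.0, 0.0) if abs(fwd[1]) < 0.99 else (1.0, 0.0, 0.0)
    right = _normalize(_cross(helper, fwd))
    up = _cross(fwd, right)

    d = tuple(c - p for c, p in zip(sphere_center, light_pos))
    x, y, z = _dot(d, right), _dot(d, up), _dot(d, fwd)

    inv_sqrt2 = 1.0 / math.sqrt(2.0)  # normalizes the 45-degree plane normals
    signed_distances = [
        (z - x) * inv_sqrt2,   # plane  x <= z
        (z + x) * inv_sqrt2,   # plane -x <= z
        (z - y) * inv_sqrt2,   # plane  y <= z
        (z + y) * inv_sqrt2,   # plane -y <= z
    ]
    # Visible unless the sphere is entirely outside some plane.
    return all(dist > -sphere_radius for dist in signed_distances)
```

A sphere straight ahead of a face passes; one behind the face (or far off to the side) is culled, so its six-pass cost never materializes.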

11 minutes ago, MJP said:

But you should be sure to cull each face of the omni shadow map individually, since they may not all be within the primary view frustum simultaneously.

Nice suggestion; that's more aggressive than culling the complete light.

11 minutes ago, MJP said:

The directional light used 16-bit depth maps

Why use the lowest-precision depth format for the light with the largest range?

11 minutes ago, MJP said:

1 directional light (with up to 4 cascades) + 16 shadow-casting spot lights

So in the extreme case, you will have 17 depth passes for shadow mapping alone?

🧙

16-bit depth maps are a common optimization for directional lights because they use an orthographic projection, which means that the resulting depth range is linear (as opposed to the non-linear, hyperbolic distribution that you get from a perspective projection). When you're rendering to four 2K depth maps in a frame, cutting the bandwidth in half can make a pretty serious difference in your performance. :)
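A back-of-the-envelope sketch of why the linear distribution makes 16 bits workable. The ranges below are illustrative assumptions, not values from the post; the perspective formula is the standard `d = f/(f-n) * (1 - n/z)` depth mapping.

```python
def ortho_step(near, far, bits=16):
    """Worst-case world-space distance between adjacent stored depth values.
    With an orthographic projection, stored depth is linear in z, so the
    quantization step is constant across the whole range."""
    return (far - near) / (2 ** bits - 1)

def persp_step_at(z, near, far, bits=16):
    """Approximate world-space quantization step at distance z for a
    standard perspective depth buffer, d = f/(f-n) * (1 - n/z).
    dd/dz = f*n / ((f-n) * z^2); the step is one quantum divided by that."""
    quantum = 1.0 / (2 ** bits - 1)
    ddz = (far * near) / ((far - near) * z * z)
    return quantum / ddz

# A hypothetical 500 m directional-light ortho range: ~7.6 mm steps everywhere.
print(ortho_step(0.0, 500.0))

# A 16-bit perspective buffer over a similar range: sub-millimetre up close,
# but the step grows with z^2 and reaches metres near the far plane.
print(persp_step_at(1.0, 0.1, 500.0))
print(persp_step_at(450.0, 0.1, 500.0))
```

The constant step is why an orthographic shadow map degrades gracefully at 16 bits, while a perspective one would waste most of its precision right next to the near plane.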

The worst case is actually more like 20 depth passes, because of the 4 cascades from the directional light. The upside of lots of depth passes is that they tend to leave the shader cores underutilized, which gives you room to overlap other work using async compute.

 

On 9/11/2017 at 4:38 AM, matt77hias said:

I wonder, however, how many shadow maps are typically used? One omni-light already results in the generation of six shadow maps, each requiring a separate depth pass, which seems huge even in the presence of some CPU-side culling (for this reason, I don't want to use the GS). And if you are not careful with the fall-off distance, can't the depth passes of a single omni-light become a bottleneck?

Does one normally support multiple formats (16-bit vs. 32-bit) and multiple resolutions?

Because, as mentioned, scenes can vary widely, the common way to decide how many shadows you will have is derived from a performance goal on a specific target spec.  In other words, determine how many shadows you can have while maintaining X FPS on Y hardware.

The reason you should use a process like this to determine your own metrics is that not only do different scenes in different games come with different performance compromises, but your own implementation of shadows may also perform very differently from others'.  Your question is useful for gauging how optimized shadow maps are in other games and how much work it would take to get there, but if a boss asked you right now to estimate how many shadows you can use, you would use the above-mentioned process.

 

To give you actual stats and an idea of the optimizations used, here is what I did on Final Fantasy XV.

We had a basic implementation likely matching what you have, with cube textures for point lights and separate textures for the rest (4 textures for a cascaded directional light and X spot lights).  The first thing I did was improve the culling on the cascaded directional light so that the same objects from the nearest cascade were not being needlessly drawn into the farther cascades.  If you aren't doing this already, adding it can lead to huge savings, as you avoid having your main detailed characters redrawn, complete with re-skinning, etc.

Next I moved the 6 faces of a cube texture to a single 1X-by-6X texture.  So a 512-by-512 cube texture became a single 512-by-3,072 texture.  Although you must write your own look-up function that takes 3D coordinates and translates them to a 2D coordinate on this texture, it comes with a few advantages in caching, filtering, clearing, filling, and most importantly it prepares for the next big optimization: a shadow atlas.
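A sketch of what such a look-up function might look like. The face order (+X, -X, +Y, -Y, +Z, -Z, stacked vertically) is an assumption for the sketch; the post does not say which layout the game used. Face selection and per-face (s, t) orientations follow the standard cube-map conventions.

```python
def cube_dir_to_strip_uv(d):
    """Map a 3D direction to (u, v) in a 1X-by-6X 'unrolled cube' texture.

    Face selection uses the usual cube-map rule: the axis with the largest
    absolute component picks the face, and the other two components are
    divided by it to get per-face coordinates in [-1, 1].
    """
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:                    # +X or -X face
        face, sc, tc, ma = (0, -z, -y, ax) if x > 0 else (1, z, -y, ax)
    elif ay >= az:                               # +Y or -Y face
        face, sc, tc, ma = (2, x, z, ay) if y > 0 else (3, x, -z, ay)
    else:                                        # +Z or -Z face
        face, sc, tc, ma = (4, x, -y, az) if z > 0 else (5, -x, -y, az)
    s = 0.5 * (sc / ma + 1.0)                    # per-face [0, 1] coords
    t = 0.5 * (tc / ma + 1.0)
    # Squash v into this face's 1/6 slice of the tall strip texture.
    return s, (face + t) / 6.0
```

In a real shader you would also inset the per-face coordinates by half a texel near slice boundaries so bilinear filtering never bleeds across faces.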

Now that all shadows were being drawn to 2D textures, I created a texture atlas for all the shadows except the cascaded ones: a single large texture for all the point and spot lights.  It was 2,048-by-2,048 at first but could grow to 4,096-by-2,048 if necessary.  Putting all the point and spot shadows into a single texture was a huge gain for many reasons, but one of the main gains was that we had access to all the shadows during a single lighting pass, which meant we could draw all the shadows in a single pass instead of many.

 

At this point our limit was simply how many shadows could be drawn until the texture atlas got filled, sorted by priority largely based on distance.  As mentioned by MJP, an important aspect of this is to cull all six faces of a point-light shadow.  Any shadow frustums not in view meant less time creating shadow maps and more room for other shadows in the atlas.

Next, I wanted the shadow maps to have LOD, as the smaller shadow sizes would allow faster creation, and smaller shadow maps meant more shadows could fit into the atlas.

Each shadow frustum (up to 6 for point lights and 1 for each spot light, where each shadow frustum at least partially intersects the camera frustum—any shadow frustums fully outside the view frustum would be discarded prior to this step) was projected onto a small in-memory representation of a screen and clipped by the virtual screen edges.

This sounds complicated but it is really simple.  The camera's view-projection matrix transforms points into a [-1,-1]...[1,1] space on your screen, so we simply used that same matrix to transform the shadow-frustum points, then clipped anything beyond -1 and 1 in both directions.

gl_frustumclip.png

Now, with the outline of the clipped shadow frustum in -1...1 space, taking the area of the resulting shape tells you how much of the screen it covers: the full [-1,1]-by-[-1,1] screen has an area of 4, so the clipped area divided by 4 is the covered fraction.

In short, we measured how much each shadow frustum is in view of the camera.  Based on this percentage, I would drop the shadow resolution by half, or half again if even less was in view, etc.  I believe I put a limit at 64-by-64.
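A minimal sketch of this coverage measurement and the resulting resolution drop, assuming the projected frustum corners have already been reduced to a convex outline (hull construction omitted). The clip is standard Sutherland-Hodgman against the four screen edges, and the halving thresholds are illustrative.

```python
def clip_poly(poly, axis, sign):
    """Clip a convex 2D polygon against the half-plane sign * p[axis] <= 1,
    i.e. one edge of the [-1,1] x [-1,1] virtual screen (Sutherland-Hodgman)."""
    out = []
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        da = 1.0 - sign * a[axis]              # >= 0 means 'a' is inside
        db = 1.0 - sign * b[axis]
        if da >= 0.0:
            out.append(a)
        if (da >= 0.0) != (db >= 0.0):         # edge crosses the boundary
            t = da / (da - db)
            out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return out

def screen_coverage(outline):
    """Fraction of the screen covered by a convex outline in NDC.
    The full [-1,1] x [-1,1] screen has area 4, so area/4 is the fraction."""
    poly = outline
    for axis in (0, 1):
        for sign in (1.0, -1.0):
            if not poly:
                return 0.0
            poly = clip_poly(poly, axis, sign)
    area = 0.0                                 # shoelace formula
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        area += x0 * y1 - x1 * y0
    return abs(area) * 0.5 / 4.0

def shadow_resolution(outline, full_res=1024, min_res=64):
    """Halve the shadow-map resolution each time coverage halves,
    with a floor, mirroring the LOD scheme described above."""
    cov = screen_coverage(outline)
    res = full_res
    while res > min_res and cov < 0.5:
        res //= 2
        cov *= 2.0
    return res
```

A frustum filling the screen keeps full resolution; one covering a sixteenth of it drops three LOD steps, freeing both depth-pass time and atlas space.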

If you play Final Fantasy XV, you can see this in action if you know where to look.  If you slowly move so that a shadow from a point light takes less and less screen space you might be able to see the resolution drop.

 

Now with the shadow-map LOD system, most shadows are drawn at a lower resolution, only going full-size when you get near and are looking directly at the shadowed area.  Because this actually affects so many shadows, the savings are significant.  If you decide to keep the same limit on shadows as you had before, you will find a huge gain in performance.  In our case, we continued allowing the shadow atlas to be filled, so we were able to support double or more shadows with the same performance.

 

Another important optimization is to render static objects to offline shadow maps.  A tool generates the shadow maps offline, rendering only static objects (buildings, lamp posts, etc.) into them.  At run-time, you create the final shadow map by copying the static shadow map over it and then rendering your dynamic objects (characters, foliage, etc.) into it.
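In miniature, the run-time composition might look like this, modeling depth buffers as flat lists of depths (smaller = closer). The copy stands in for a depth-surface copy, and the element-wise min stands in for depth-testing the dynamic casters against the static depth.

```python
def compose_shadow_map(static_depths, dynamic_depths):
    """Compose the final shadow map from an offline-baked static depth map
    and the depths produced by rendering only the dynamic casters.
    Instead of clearing to far and re-rendering the whole scene, we start
    from the static map; the depth test keeps whichever caster is closer."""
    return [min(s, d) for s, d in zip(static_depths, dynamic_depths)]
```

The point of the trick is on the left side of the `min`: the static half costs a copy instead of thousands of redrawn triangles every frame.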

This is a major performance improvement again.  We already had this for Final Fantasy XV, but since I added the shadow LOD system I had to make the offline static shadows carry mipmaps.  It is important to note that the shadow mipmaps are not a downsampling of mip level 0—you have to re-render the scene into each mipmap, again with some lower limit such as 64-by-64.

 

All of this together allowed us probably around 30 shadow maps with the ability to dynamically scale with the scene and without too many restrictions on the artists.  Shadow maps were sorted by a priority system so that by the time the shadow atlas was filled, the shadows that had to be culled were distant, off-to-the-side, or otherwise unimportant.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

15 hours ago, L. Spiro said:

To give you actual stats and an idea of the optimizations used, here is what I did on Final Fantasy XV. [...]

Thanks. Really interesting optimizations.

If you use the same resolution everywhere (ignoring LOD), could you use texture arrays and cube-map arrays as well, while still having a single lighting pass?

🧙

14 hours ago, L. Spiro said:

Next I moved the 6 faces of a cube texture to a single 1X-by-6X texture.  So a 512-by-512 cube texture became a single 512-by-3,072 texture.  Although you must write your own look-up function that takes 3D coordinates and translates them to a 2D coordinate on this texture, it comes with a few advantages in caching, filtering, clearing, filling, and most importantly it prepares for the next big optimization: a shadow atlas.

This is interesting, I've seen other games do this. A few questions come to mind, like wouldn't sampling a cubemap texture be more efficient? I heard that AMD GCN already does the cubemap sampling as regular tex2d samples, but is that the same with NVidia? Is there some open-source implementation of that sampling?

And what do you mean by advantages in caching, clearing, filling and filtering? I don't see any such trivial upsides to this (apart from the dynamic shadow resolution you mentioned later), but the complexity of the implementation definitely increases.

26 minutes ago, turanszkij said:

This is interesting, I've seen other games do this. A few questions come to mind, like wouldn't sampling a cubemap texture be more efficient? I heard that AMD GCN already does the cubemap sampling as regular tex2d samples, but is that the same with NVidia? Is there some open-source implementation of that sampling?

What if you increase the FOV for each face by a small amount so you get one 'overlapping' border of texels? You would always need to sample just one shadow map, and I assume artefacts at cube corners/edges would be negligible for shadows?
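The widened field of view needed for an n-texel overlap border works out to a one-liner (a hypothetical helper, not from any engine mentioned here): a 90-degree face spans tan(45°) = 1 in each direction, so covering `border` extra texels beyond each edge widens the half-angle accordingly.

```python
import math

def fov_for_border(resolution, border_texels=1.0):
    """Field of view (degrees) for one cube face so that it gains
    `border_texels` of overlap on each edge.  Stretching the 90-degree
    span from `resolution` texels to `resolution + 2 * border` texels
    widens the half-angle to atan((res + 2*border) / res)."""
    half = math.atan((resolution + 2.0 * border_texels) / resolution)
    return math.degrees(2.0 * half)

# One texel of overlap on a 512-texel face needs only ~90.22 degrees.
print(fov_for_border(512))
```

The overlap is tiny, which supports the intuition that edge artefacts would be negligible for shadows.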

 

 

28 minutes ago, JoeJ said:

What if you increase the FOV for each face by a small amount so you get one 'overlapping' border of texels? You would always need to sample just one shadow map, and I assume artefacts at cube corners/edges would be negligible for shadows?

Good idea, but what about unconnected faces? I imagine there would still be a bunch of branches to determine where we should sample in that case.

45 minutes ago, turanszkij said:

but what about unconnected faces?

What faces do you mean? Geometry? Or related to the 6 projections?

