Funkymunky

Cascaded shadow maps, texture atlas or texture array?

This topic is 2161 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


In researching cascaded shadow maps, I see a lot of advice about putting them into an atlas. If people are passing in 4 MVP matrices anyway, I assume that they're using a geometry shader to emit 4 copies of the vertices. So why not just have the geometry shader specify the layer of a texture array to render the data into? What is the upside of using an atlas?


Huh, that's surprising. I guess I'll implement both and profile them, but from googling around about geometry shaders, it sounds like that's the prevailing opinion, even for current cards. Thanks!


The GS is in general a slow path for the GPU. It doesn't map very well to the hardware, and any kind of amplification scenario results in lots of overhead and traffic to off-chip memory. It can definitely save you CPU overhead since you can potentially reduce draw calls by quite a bit, but very likely this will be at the expense of GPU performance.


The geometry shader can be used to define the destination texture array slice of a triangle, so it has its use.

 

So you can use geometry instancing when rendering the geometry, and specify a different destination slice for each instance. This way you'll save multiple draw calls, at least per object. Apart from that, the geometry shader only passes the triangle data through, so no geometry amplification is done.
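A rough sketch of the bookkeeping behind that idea, written in Python just to show the arithmetic. It assumes four cascades and that the draw call's instance count is multiplied by the cascade count; the names (`NUM_CASCADES`, `expanded_instance_count`, `decode_instance`) are made up for illustration. In Direct3D the pass-through geometry shader would write the decoded slice to `SV_RenderTargetArrayIndex`.

```python
from typing import NamedTuple

NUM_CASCADES = 4  # assumption: four cascades, as in the original question


class DecodedInstance(NamedTuple):
    object_instance: int  # which copy of the object this is
    cascade_slice: int    # texture array slice the triangle should go to


def expanded_instance_count(object_instances: int) -> int:
    """Instance count to pass to the draw call: one copy per cascade."""
    return object_instances * NUM_CASCADES


def decode_instance(instance_id: int) -> DecodedInstance:
    """Split the flat instance ID into (object instance, cascade slice)."""
    object_instance, cascade_slice = divmod(instance_id, NUM_CASCADES)
    return DecodedInstance(object_instance, cascade_slice)
```

With this layout, consecutive instance IDs cycle through the cascades, so the per-object instance data would be fetched with `object_instance` and the shadow matrix with `cascade_slice`.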

 

Cheers!


There aren't really many upsides to using an atlas over an array; it was mainly used on hardware that predates texture arrays. Using an array simplifies things a lot and is just as fast.
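To make the "simplifies things" part concrete, here is a small sketch of the lookup coordinates in each case, written in Python for the arithmetic only. It assumes a 2x2 atlas with cascades laid out row by row; `ATLAS_GRID`, `atlas_uv`, and `array_coord` are hypothetical names, not taken from any engine.

```python
from typing import Tuple

ATLAS_GRID = 2  # assumption: 2x2 grid of cascades in one atlas texture


def atlas_uv(cascade: int, u: float, v: float) -> Tuple[float, float]:
    """Remap a per-cascade UV in [0, 1] into that cascade's atlas quadrant."""
    col = cascade % ATLAS_GRID
    row = cascade // ATLAS_GRID
    scale = 1.0 / ATLAS_GRID
    return ((col + u) * scale, (row + v) * scale)


def array_coord(cascade: int, u: float, v: float) -> Tuple[float, float, int]:
    """With a texture array the UV is untouched; the cascade is the slice."""
    return (u, v, cascade)
```

The atlas path needs the extra scale and offset on every lookup, while the array path just passes the cascade index along as the third coordinate.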

Most literature about cascaded shadow maps predates availability of texture arrays; that's why there's a lot of "advice about putting them into an atlas" and very little about using texture arrays.

However, the choice between these two ways to reduce texture switching is only a low-level detail, which may affect performance but not shadow quality.


I've already done some experiments in my own engine, and here are the results. Everything was tested on a complex scene with a large number of small/medium-scale vegetation objects (most of them instanced), about 3 million vertices casting shadows (30% of them skinned), on an NVIDIA 680M and an i7:

 

1] Draw everything once, use the GS to replicate vertices, and use a 2048x2048 texture with four 1024x1024 quarters - SLOOOW - you need to use 2 custom clip planes to clip to the quarter of the atlas - the GS is the main bottleneck (performance of the whole frame is around FPS = 21.1)

 

2] Draw everything once, use the GS to replicate vertices, and use a 1024x1024x4 texture array - SLOOW - but better than the previous one since no clip planes are needed - the GS is the main bottleneck (FPS = 22.4)

 

3] Draw everything once into a 2048x2048 texture with four 1024x1024 quarters - this time, for every draw call multiply the instance count by 4, and in the VERTEX shader use (InstanceIndex&0x3) to output into a specific quarter of the atlas (again 2 custom clip planes are used) - FAST, FPS = 42.7 !!! (twice as fast as the GS path) - this time the bottleneck is in the vertex shader, for all those skinned vertices.

 

4] Use a texture array, but submit a separate set of draw calls for each cascade. There _IS_ an opportunity to clip them independently, so the win is the total number of vertices processed by the VS (for points 1 and 2 the total was 3M, for 3 it was 12M!, for 4 it was 6M), but the loss is the total number of batches (for 1, 2, 3 it was 972, for 4 it was 1944)

FPS = 42.1 if everything is submitted to the base context, 44.1 if 4 deferred contexts are used, each one created on a different scheduler task, and then all of them are submitted at once into the base context

 

For 3 there is probably a chance to outperform 4 if some neat way of clipping is introduced, but for now I have no time for this, and I'm sticking with it as-is since I need to keep the batch count as low as possible because other parts of the engine demand them.
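For anyone wondering what option 3 looks like per vertex, here is a sketch of the math in Python. It is not the code from the engine above, only one plausible way to do it: it assumes the cascades sit in a 2x2 grid, that quadrant bit 0 selects the column and bit 1 the row, and that which row ends up on top depends on the API's conventions. `atlas_clip_position` is a hypothetical name.

```python
from typing import Tuple


def atlas_clip_position(
    instance_index: int, x: float, y: float, w: float
) -> Tuple[float, float, float, float]:
    """Return (x', y', clip_x, clip_y) for one vertex.

    x, y, w are the clip-space coordinates produced by the cascade's own
    view-projection matrix. x', y' place the vertex in the cascade's
    quadrant; clip_x, clip_y are clip distances (negative = clipped).
    """
    quadrant = instance_index & 0x3  # cascade index, as in the post
    col, row = quadrant & 1, quadrant >> 1

    # Halve the footprint and shift it by half the atlas, scaled by w so the
    # offset survives the perspective divide.
    x_out = 0.5 * x + (col - 0.5) * w
    y_out = 0.5 * y + (row - 0.5) * w

    # Only the two inner edges need custom clip planes; the outer edges
    # coincide with the atlas border and are clipped as usual.
    clip_x = w - x if col == 0 else w + x
    clip_y = w - y if row == 0 else w + y
    return (x_out, y_out, clip_x, clip_y)
```

The two clip distances are what keep geometry outside a cascade's frustum from spilling into the neighbouring quadrant, which is presumably why the atlas variants need 2 custom clip planes and the array variants need none.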



That seems quite low in terms of performance, but hey, context and unoptimized. Thanks for sharing; a doubling in performance seems pretty clear.


Those numbers seem similar to what I've experienced in the past. Like I said, geometry shaders are not fast.
