vlj

Slow shadowmap cascades generation


Hi,

 

In my engine I use a fairly classic cascaded shadow map algorithm with 4 cascades, but generating the cascades can take up to 5 ms per frame on some maps on an R9 290 GPU, which is huge. There is no caching at all for the moment.

The shader code is rather simple: just a matrix-vector product in the vertex shader and a simple FragColor = vec4(1.) on the pixel shader side, with an optional discard if a texture is provided.

The map renders up to 500,000 triangles across the 4 cascades, which is a big number, but I think the GPU should be able to handle that many faster.
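A quick way to put those numbers in perspective is to turn them into an average triangle rate. This is a minimal sketch that assumes the two figures from the post (about 500,000 triangles in total and about 5 ms for the pass); the result can then be compared with the primitive rate the GPU vendor quotes for the card.

```python
def triangles_per_second(triangles: int, milliseconds: float) -> float:
    """Average triangle rate implied by a measured pass time."""
    return triangles / (milliseconds / 1000.0)

# Assumed inputs from the post: ~500,000 triangles over all 4 cascades, ~5 ms.
rate = triangles_per_second(500_000, 5.0)
print(f"{rate / 1e6:.0f} million triangles/s")  # 100 million triangles/s
```

This is only an average over the whole pass; it says nothing about which pipeline stage is the limit.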

I use the AZDO approach for the shadow pass to reduce CPU overhead (I have a large object count, several thousand per map).
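In OpenGL, AZDO-style submission usually means filling a buffer with indirect draw commands and issuing them with a single glMultiDrawElementsIndirect call. The sketch below only shows how such a command buffer could be packed on the CPU side. The five-field layout follows OpenGL's DrawElementsIndirectCommand (GL 4.3+). The MeshRange record is hypothetical, and so is the choice to use baseInstance as an object index for fetching per-object data, although that choice is a common one.

```python
import struct
from dataclasses import dataclass

# OpenGL DrawElementsIndirectCommand:
# uint count, uint instanceCount, uint firstIndex, int baseVertex, uint baseInstance
DRAW_ELEMENTS_INDIRECT = struct.Struct("<IIIiI")

@dataclass
class MeshRange:
    """Hypothetical record of where a mesh lives in shared vertex/index buffers."""
    index_count: int
    first_index: int
    base_vertex: int

def build_indirect_buffer(meshes: list[MeshRange]) -> bytes:
    """Pack one tightly packed indirect command per object."""
    out = bytearray()
    for object_id, mesh in enumerate(meshes):
        out += DRAW_ELEMENTS_INDIRECT.pack(
            mesh.index_count,  # count
            1,                 # instanceCount
            mesh.first_index,  # firstIndex
            mesh.base_vertex,  # baseVertex
            object_id,         # baseInstance: assumed index into per-object data
        )
    return bytes(out)

buf = build_indirect_buffer([MeshRange(36, 0, 0), MeshRange(1200, 36, 24)])
print(len(buf))  # 40: two 20-byte commands
```

Because the commands are tightly packed, a stride of 0 can be passed to glMultiDrawElementsIndirect once the bytes are uploaded to the GL_DRAW_INDIRECT_BUFFER binding.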

 

Using GPU PerfStudio 2, it looks like the bottleneck is tied to the "PAStalledOnRasterizer" metric (PA stands for Primitive Assembly). I think (but I'm not sure) it's the internal name for "ROP bound". However, if I reduce the resolution of my shadow map (which I think should also reduce the ROP work per triangle), the cost of generating the cascades stays the same.

 

This is the first time I've written a shadow map algorithm, and I have no idea where to start to improve its performance.

Do you see anything I'm doing wrong?

 

Regards,

Vincent


There isn't much benefit to be gained from using the full-resolution mesh for shadow generation if, according to your profiling results, you are ROP bound. If you are using the same mesh for shadow map generation and normal rendering, have you tried using a simpler approximation for the shadow map? If so, are you seeing the same performance hit?
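One way to try a simpler approximation is to pick a coarser level of detail for the shadow pass than for the main view. This is a hypothetical sketch: the Lod record, the pick_shadow_lod helper, the assumption that lods[0] is the full-resolution mesh, and the idea of dropping one more level per cascade are all illustrative choices, not something the thread specifies.

```python
from dataclasses import dataclass

@dataclass
class Lod:
    """Hypothetical level-of-detail entry; higher levels have fewer triangles."""
    triangle_count: int

def pick_shadow_lod(lods: list[Lod], cascade_index: int, bias: int = 1) -> Lod:
    """Pick a coarser LOD for shadows than for the main view.

    Assumptions: lods[0] is the full-resolution mesh, each cascade further
    from the camera tolerates one more level of simplification, and `bias`
    shifts every cascade because shadow silhouettes rarely need full detail.
    """
    level = min(cascade_index + bias, len(lods) - 1)
    return lods[level]

lods = [Lod(10_000), Lod(2_500), Lod(600)]
print([pick_shadow_lod(lods, c).triangle_count for c in range(4)])
# [2500, 600, 600, 600]
```

Simplified shadow meshes can change silhouettes and cause self-shadowing artifacts on the full-detail mesh, so the bias is something to tune visually.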



Quote:
Using GPU PerfStudio 2, it looks like the bottleneck is tied to the "PAStalledOnRasterizer" metric (PA stands for Primitive Assembly). I think (but I'm not sure) it's the internal name for "ROP bound". However, if I reduce the resolution of my shadow map (which I think should also reduce the ROP work per triangle), the cost of generating the cascades stays the same.

 

It's telling you that you're bound by triangle throughput, not by the ROPs. Basically, you're generating triangles faster than the rasterizer can process them and turn them into pixels. This is fairly common in shadow map generation.
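A toy bottleneck model shows why lowering the resolution may not help once the pass is triangle-bound. It assumes the pipeline stages overlap perfectly, so the busier stage sets the pace. The rates in the example are made-up numbers chosen for illustration, not measurements of the R9 290.

```python
def pass_time_ms(triangles: int, pixels: int,
                 tris_per_ms: float, pixels_per_ms: float) -> float:
    """Toy pipeline model: the slowest stage sets the pass time.

    Assumes triangle setup/rasterization and per-pixel depth writes run
    fully overlapped, so the pass takes as long as the busier of the two.
    """
    return max(triangles / tris_per_ms, pixels / pixels_per_ms)

# Hypothetical rates: 100,000 triangles/ms and 4,000,000 pixels/ms.
for size in (1024, 512):
    t = pass_time_ms(500_000, 4 * size * size, 100_000, 4_000_000)
    print(size, t)  # both print 5.0: the triangle term dominates
```

Under this model, only reducing the triangle count (or raising the triangle rate) moves the result; shrinking the shadow maps just makes the pixel term even smaller.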


I'm going to make an educated guess based on my past experience and what's written here and suggest this: reduce your shadow map size. And don't underestimate the number of pixels you're asking your GPU to push through. For example, 4 cascades at 1024x1024 each is 4.2 million pixels; for reference, 1920x1080 is just over 2 million.
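The pixel arithmetic above, written out as a small helper; the only inputs are the cascade count and per-cascade size from the example.

```python
def cascade_pixels(cascades: int, size: int) -> int:
    """Total texels in `cascades` shadow maps of size x size each."""
    return cascades * size * size

shadow = cascade_pixels(4, 1024)  # 4,194,304
screen = 1920 * 1080              # 2,073,600
print(f"{shadow:,} vs {screen:,} ({shadow / screen:.2f}x)")
# 4,194,304 vs 2,073,600 (2.02x)
```

Doubling the per-cascade size to 2048 quadruples the total to 16,777,216 texels.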

 

Edit: I didn't notice that you had already tried that. It seems less likely, then (the triangles would have to be almost pixel-sized), but I guess the triangle count may simply be too high. I remember rendering a lot of triangles into a cascaded shadow map myself, and reducing the shadow map size gave the single biggest speed improvement, possibly because I was rendering trees, where the same part of the buffer is drawn over many times.


You don't mention how your shader is set up, so in case you do something different: for rendering shadow maps, the shader should be written to take only vertex positions, and you should send it only vertex positions from the CPU. If you send normals, binormals, texcoords, etc., they're unused, can increase the required transfer by two or three times, and just clog the throughput.
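To see where the "two or three times" comes from, compare the size of a fuller vertex with a position-only one. The layouts below are hypothetical examples using tightly packed 32-bit floats. Storing positions in their own vertex buffer, so the shadow pass can bind just that stream, is one common way to get the smaller footprint.

```python
import struct

# Hypothetical vertex layouts, tightly packed 32-bit floats ("<" = no padding).
FULL_VERTEX = struct.Struct("<3f3f2f")  # position, normal, texcoord
POSITION_ONLY = struct.Struct("<3f")    # all a depth-only vertex shader needs

def bytes_fetched(vertex_count: int, layout: struct.Struct) -> int:
    """Bytes of vertex data for vertex_count vertices in the given layout."""
    return vertex_count * layout.size

n = 1_000_000  # illustrative vertex count, not taken from the thread
full, pos = bytes_fetched(n, FULL_VERTEX), bytes_fetched(n, POSITION_ONLY)
print(f"{full:,} vs {pos:,} bytes ({full / pos:.2f}x)")
# 32,000,000 vs 12,000,000 bytes (2.67x)
```

Adding a binormal or tangent to the full layout would widen the gap further.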

Edited by Buckeye


For what it's worth, I use polygon offset, but turning it off didn't improve performance at all.

I also removed the texcoords (and thus the discard), and the pass took on average 0.5 ms less to execute. But foliage/grass shadows don't look great without texturing/discard...
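Given that dropping texcoords and discard saved about 0.5 ms, one option is to pay that cost only for the objects that need it. The sketch below splits shadow draws into a depth-only list and an alpha-tested list that could each get their own shader and indirect draw call. The ShadowDraw record and split_shadow_draws helper are hypothetical, and the per-object alpha_tested flag is assumed to come from the material.

```python
from dataclasses import dataclass

@dataclass
class ShadowDraw:
    """Hypothetical per-object record for the shadow pass."""
    name: str
    alpha_tested: bool  # True for foliage/grass that needs texcoords + discard

def split_shadow_draws(draws):
    """Split draws into (depth_only, alpha_tested), keeping submission order.

    Assumption: depth_only is rendered with a position-only shader and no
    discard; only alpha_tested pays for texcoords, the texture fetch and discard.
    """
    depth_only = [d for d in draws if not d.alpha_tested]
    alpha_tested = [d for d in draws if d.alpha_tested]
    return depth_only, alpha_tested

draws = [ShadowDraw("rock", False), ShadowDraw("grass", True),
         ShadowDraw("house", False), ShadowDraw("fern", True)]
opaque, foliage = split_shadow_draws(draws)
print([d.name for d in opaque], [d.name for d in foliage])
# ['rock', 'house'] ['grass', 'fern']
```

How much this saves depends on how much of the map is foliage; on a grass-heavy map most draws may still land in the alpha-tested list.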

 

Is there any way to run a compute shader concurrently? I think modern GPUs can run compute and graphics workloads at the same time to increase hardware utilization. Since there seems to be no workaround for being rasterizer-limited besides reducing the triangle count and the number of varyings (currently I only send texcoords), it may be something to investigate.



Quote:
But rendering foliage/grass shadows without texturing/discard doesn't look great.

 

Are you using transparency? If so, that makes sense.

 

Do you sort your objects by depth (front to back) before sending them to the shadow shader? That might let the depth test reject fragments of deeper objects early, particularly since foliage presumably has a lot of overlap.
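For a directional light, "front to back" means ordering casters by their distance along the light direction. This minimal sketch assumes each object has a world-space center point (the ShadowCaster record is hypothetical) and that light_dir points from the light into the scene, so a smaller dot product means closer to the light.

```python
from dataclasses import dataclass

@dataclass
class ShadowCaster:
    """Hypothetical caster record with a world-space bounding-volume center."""
    name: str
    center: tuple[float, float, float]

def sort_front_to_back(casters, light_dir):
    """Sort casters nearest-to-the-light first.

    Assumes light_dir is the direction the light travels (light -> scene),
    so projecting each center onto it gives a depth along the light.
    """
    def depth(c):
        return sum(p * d for p, d in zip(c.center, light_dir))
    return sorted(casters, key=depth)

casters = [ShadowCaster("ground", (0.0, 0.0, 0.0)),
           ShadowCaster("canopy", (0.0, 10.0, 0.0)),
           ShadowCaster("bush", (5.0, 2.0, 0.0))]
order = sort_front_to_back(casters, (0.0, -1.0, 0.0))  # sun straight overhead
print([c.name for c in order])  # ['canopy', 'bush', 'ground']
```

On a mostly flat map with little overlap between casters, the gain from this ordering may be small.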


With a 2048x2048 shadow map per cascade, the shadow pass takes 12 ms to render, i.e. twice as long (although there are four times as many pixels).
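Two measurements at different resolutions allow a crude split into resolution-independent and pixel-dependent cost. This sketch assumes a simple linear model, t = fixed + per-pixel cost, and infers the inputs from the post: 12 ms at 2048x2048, about 6 ms at the previous size (from "twice as long"), and a pixel ratio of 4 (from "four times as many pixels").

```python
def split_fixed_and_pixel_cost(t_small_ms: float, t_large_ms: float,
                               pixel_ratio: float) -> tuple[float, float]:
    """Solve t = fixed + pixel_part for two resolutions.

    Crude linear model (assumption): `fixed` is resolution-independent work
    such as vertex and triangle processing, and the pixel-dependent part
    scales exactly with texel count. Returns (fixed, pixel_part_at_small).
    """
    pixel_small = (t_large_ms - t_small_ms) / (pixel_ratio - 1)
    fixed = t_small_ms - pixel_small
    return fixed, pixel_small

print(split_fixed_and_pixel_cost(6.0, 12.0, 4))  # (4.0, 2.0)
```

Under this model, roughly 4 ms of the smaller configuration would not shrink with resolution, which fits the triangle-bound reading of the profiler counter, but two data points cannot confirm that the cost is really linear.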

 

I'm trying to see whether sorting helps, although this is a fairly "flat" map.
