Infinisearch

DX12 Cost of Switching Shaders



1.  Why is switching shaders expensive?

2.  Is it any faster in DX12?  Is there a different logic for 12 vs 11?

 

Also, for successive command list submissions in D3D12 with the same PSO, is there "redundant PSO filtering"? Or does it cost the same whether the PSOs are the same or different?

Edited by Infinisearch

I believe "expensive" is all relative; it depends on what other options you have for reaching your goal.

Measure and profile :)


Thank you for your answers.

 


On older GPUs I used to use the rule of thumb that a shader change (or other major pipeline state change) was OK as long as every batch covered at least 400 pixels.

Is there a triangle figure as well? 

 


I believe "expensive" is all relative; it depends on what other options you have for reaching your goal.

I am obsessed with front-to-back rendering, but even so, switching shaders to achieve it is most likely too expensive. So I'll likely still sort by shader, then by depth.
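For what it's worth, a minimal sketch of that ordering in plain C++. The `DrawItem` record and its field names are made up for illustration and are not part of any D3D API; the idea is just to compare on the shader first and on depth second, so each shader is bound once and draws within a group still go front to back.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical per-draw record; the field names are illustrative only.
struct DrawItem {
    std::uint32_t shaderId;  // identifies the shader / pipeline state of the draw
    float viewDepth;         // distance from the camera; smaller means closer
    std::uint32_t meshId;    // stand-in for whatever else the draw needs
};

// Orders draws by shader first, then front to back within each shader group.
inline void sortByShaderThenDepth(std::vector<DrawItem>& draws) {
    // NaN depths would break the strict weak ordering that std::sort requires.
    assert(std::none_of(draws.begin(), draws.end(),
                        [](const DrawItem& d) { return std::isnan(d.viewDepth); }));
    std::sort(draws.begin(), draws.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.shaderId, a.viewDepth) <
                         std::tie(b.shaderId, b.viewDepth);
              });
}

// Counts how often the shader changes when the list is drawn in order,
// counting the very first draw as one change.
inline std::size_t countShaderSwitches(const std::vector<DrawItem>& draws) {
    std::size_t switches = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0 || draws[i].shaderId != draws[i - 1].shaderId) ++switches;
    }
    return switches;
}
```

Calling `countShaderSwitches` before and after sorting gives a quick way to see how many switches the sort saves on a given frame. Whether that saving matters is still something to profile.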


On the CPU side, the "root signature" is changed, which means that (on pre-D3D12 APIs) all the resource bindings must be re-sent to the GPU. The driver/runtime also might have to resubmit a bunch of pipeline state, and even validate that the pixel shader (PS) and vertex shader (VS) are compatible, etc. (and possibly patch them if they mismatch, or patch the VS if it mismatches the input assembler (IA) config). The driver might also have to do things like patch the PS if it doesn't match the current render-target format.
D3D12 helps here because you're aware of what's going on now -- you manage your own resource bindings and root signatures, instead of a general purpose runtime trying to guess the best way to manage them for you. It also doesn't do any validation when you bind a resource, so rebinding resources upon root-signature changes is cheaper.
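Since you manage the state yourself, one thing you can do on the application side is skip redundant pipeline-state sets. Below is a minimal sketch of that idea, not a statement about what the D3D12 runtime or any driver does internally. `FakeCommandList`, `PsoHandle` and `PsoFilter` are stand-ins invented for this example; in real code the forwarded call would be `ID3D12GraphicsCommandList::SetPipelineState`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a pipeline state object handle; not a real D3D12 type.
using PsoHandle = const void*;

// Stand-in recorder that only logs which PSO sets were actually recorded.
struct FakeCommandList {
    std::vector<PsoHandle> recordedPsoSets;
    void setPipelineState(PsoHandle pso) { recordedPsoSets.push_back(pso); }
};

// Remembers the last PSO recorded on one command list and skips repeats.
class PsoFilter {
public:
    explicit PsoFilter(FakeCommandList& list) : list_(list) {}

    // Returns true if the set was recorded, false if it was filtered out.
    bool set(PsoHandle pso) {
        assert(pso != nullptr);  // nullptr is reserved for "nothing set yet"
        if (pso == last_) return false;
        list_.setPipelineState(pso);
        last_ = pso;
        return true;
    }

    // Forgets the cached state, e.g. when recording of a fresh list begins.
    void reset() { last_ = nullptr; }

private:
    FakeCommandList& list_;
    PsoHandle last_ = nullptr;
};
```

The filter is deliberately per command list: this sketch assumes nothing about state carrying over from one list to the next, so `reset()` should be called whenever a new list starts recording.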
 
On the GPU side, the front-end has to do a bunch of work to consume all those state/resource changes and get the GPU's cores ready to run with the new configuration. This work is generally pipelined, so it is effectively free as long as there are no pipeline stalls. However, if the GPU's cores don't have enough work per draw-call, then they might finish their work before the front-end has configured the next pipeline state, forcing them to sit idle while that happens -- i.e. a pipeline bubble is formed.
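To make the bubble idea concrete, here is a toy model in C++. It is a sketch under heavy assumptions, not how any real GPU schedules work: the front-end is assumed to look exactly one draw ahead, costs are in arbitrary time units, and `TimedDraw` and `estimateIdleTime` are hypothetical names. While the cores run draw i, the front-end configures draw i+1; if that configuration takes longer than draw i's work, the cores idle for the difference.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical costs for one draw, in arbitrary time units.
struct TimedDraw {
    double setupCost;  // front-end time to configure the pipeline for this draw
    double workCost;   // time the cores spend executing this draw
};

// Sums the time the cores would sit idle waiting on the front-end,
// assuming setup for draw i overlaps with the execution of draw i-1.
// The first draw's setup is treated as start-up cost, not as a bubble.
inline double estimateIdleTime(const std::vector<TimedDraw>& draws) {
    double idle = 0.0;
    for (std::size_t i = 1; i < draws.size(); ++i) {
        assert(draws[i].setupCost >= 0.0 && draws[i - 1].workCost >= 0.0);
        idle += std::max(0.0, draws[i].setupCost - draws[i - 1].workCost);
    }
    return idle;
}
```

In this model a draw with plenty of work hides the next state change completely, while a tiny draw followed by an expensive state change exposes most of the setup time.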
On older GPUs I used to use the rule of thumb that a shader change (or other major pipeline state change) was OK as long as every batch covered at least 400 pixels. On modern GPUs I believe you can get away with more frequent switches than that. Some GPUs can even prepare more than one pipeline config at a time, so you need to have ~8 "small batches" in a row in order to actually cause a pipeline bubble.
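Those two rules of thumb can be turned into a rough offline check over a frame's batch list. This is only a sketch: the 400-pixel and 8-batch defaults are the ballpark figures from above, not measured constants, and the `Batch` record, the function names, and the choice to count only batches that both switch pipeline state and are small are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-batch record; the field names are illustrative only.
struct Batch {
    std::uint32_t pipelineId;     // identifies the pipeline state of the batch
    std::uint32_t coveredPixels;  // estimated number of pixels the batch covers
};

// Length of the longest run of consecutive batches that each switch
// pipeline state and cover fewer than minPixels pixels.
inline std::size_t longestSmallSwitchRun(const std::vector<Batch>& batches,
                                         std::uint32_t minPixels) {
    std::size_t longest = 0;
    std::size_t current = 0;
    for (std::size_t i = 0; i < batches.size(); ++i) {
        const bool switched =
            (i == 0) || batches[i].pipelineId != batches[i - 1].pipelineId;
        const bool small = batches[i].coveredPixels < minPixels;
        current = (switched && small) ? current + 1 : 0;
        longest = std::max(longest, current);
    }
    return longest;
}

// Flags a batch list that contains at least runLength small switching
// batches in a row. The defaults are rough guesses, not hardware facts.
inline bool likelyBubble(const std::vector<Batch>& batches,
                         std::uint32_t minPixels = 400,
                         std::size_t runLength = 8) {
    assert(runLength > 0);
    return longestSmallSwitchRun(batches, minPixels) >= runLength;
}
```

A check like this can only point at suspicious spots in a frame; actual profiling on the target GPU decides whether a bubble really forms.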

I wonder... if you were to bundle all those calls into a D3D11 command list, would (can?) the driver perform most (all?) of those transformations beforehand (when the command list is created), so that a command list gets some of the optimizations that D3D12 gives, but in D3D11? Of course, you'd need to either create the command list on another thread or reuse the command list to get any advantage.


I wonder... if you were to bundle all those calls into a D3D11 command list, would (can?) the driver perform most (all?) of those transformations beforehand (when the command list is created), so that a command list gets some of the optimizations that D3D12 gives, but in D3D11? Of course, you'd need to either create the command list on another thread or reuse the command list to get any advantage.

I don't think so. AFAIK, D3D11's command list recording is all done in the user-mode library, and it doesn't actually call into the driver until you submit the list.


I don't think so. AFAIK, D3D11's command list recording is all done in the user-mode library, and it doesn't actually call into the driver until you submit the list.


The documentation does state: "Pre-record a command list before you need to render it (for example, while a level is loading) and efficiently play it back later in your scene. This optimization works well when you need to render something often." I've also seen it stated (though I don't remember exactly where) that the driver can perform some optimizations on the command list, and that the user-mode library is only used when the driver doesn't support command lists itself.
