Infinisearch

Root Signatures Version 1.1


I recently came across root signature version 1.1 and have been reading up on it. Why exactly do these mechanics result in a performance increase? And roughly how much of a performance increase are we looking at?

 

https://msdn.microsoft.com/en-us/library/windows/desktop/mt709473(v=vs.85).aspx


This is more or less summed up in this quote:

 

 

One optimization is that many drivers can produce more efficient memory accesses by shaders if they know the promises an application can make about the static-ness of descriptors and data. For example, drivers could remove a level of indirection for accessing a descriptor in a heap by converting it into a root descriptor if the particular hardware is not sensitive to root argument size.
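To make the "level of indirection" concrete, here is a toy Python model. It is not how any real driver or GPU works. It assumes a flat "GPU memory" dict in which a descriptor is just a stored buffer address, and it counts memory reads: a descriptor-table access first reads the descriptor out of the heap and then reads the data, while a root-descriptor access already has the address in the root arguments and reads only the data. `ToyGpuMemory`, `read_via_descriptor_table` and `read_via_root_descriptor` are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGpuMemory:
    """Hypothetical flat GPU memory that counts every read."""
    cells: dict = field(default_factory=dict)
    reads: int = 0

    def load(self, address):
        self.reads += 1
        return self.cells[address]


def read_via_descriptor_table(mem, heap_base, table_offset, index):
    # The root argument only holds a heap offset, so first fetch the
    # descriptor (here just a buffer address) from the descriptor heap...
    buffer_address = mem.load(heap_base + table_offset + index)
    # ...then fetch the data the descriptor points to.
    return mem.load(buffer_address)


def read_via_root_descriptor(mem, root_buffer_address):
    # The root argument already holds the buffer address: a single fetch.
    return mem.load(root_buffer_address)


if __name__ == "__main__":
    # Descriptor slot 3 of a heap at 1000 points to a buffer at 5000.
    mem = ToyGpuMemory(cells={1003: 5000, 5000: 42})
    read_via_descriptor_table(mem, heap_base=1000, table_offset=0, index=3)
    table_reads = mem.reads
    mem.reads = 0
    read_via_root_descriptor(mem, root_buffer_address=5000)
    print(table_reads, mem.reads)  # prints "2 1"
```

The saved read is the whole point of the quoted optimization. Whether it matters in practice depends on how often the shader performs that first fetch and whether it is ever on the critical path.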

 

When you record commands, if your root signature shows that a descriptor is static, its contents may be inlined into the command buffer. The shader then no longer reads the descriptor from the heap but accesses a copy stored elsewhere, possibly removing that extra indirection. For example, on GCN hardware, it could be stored in user registers directly available to the shader instead of being loaded from memory.
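The inlining described above can be modelled the same way. This is a hypothetical sketch, not the D3D12 API or any driver's real behaviour. A toy "driver" records a set-table command either as a heap offset (volatile: the shader must read the heap at execution time) or, when the root signature promises the descriptors are static, as a copy of the descriptor taken at record time, standing in for values placed in user registers. The model assumes the driver can read the heap contents on the CPU while recording. The copy is only valid because of the promise: if the heap changed afterwards, the static path would see stale data, which is why the application has to keep that promise. All names here are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SetTableCommand:
    """One recorded 'set descriptor table' command in a toy command buffer."""
    static: bool
    heap_offset: int            # used when descriptors are volatile
    inlined_descriptor: object  # snapshot used when descriptors are static


def record_set_table(heap, heap_offset, descriptors_static):
    # With the static promise, the toy driver copies the descriptor into the
    # command buffer now, at record time.
    snapshot = heap[heap_offset] if descriptors_static else None
    return SetTableCommand(descriptors_static, heap_offset, snapshot)


def shader_fetch_descriptor(heap, command):
    # Static: use the inlined copy, no heap read at execution time.
    # Volatile: go through the heap, which may have changed since recording.
    if command.static:
        return command.inlined_descriptor
    return heap[command.heap_offset]


if __name__ == "__main__":
    heap = ["srv_texture_A"]  # toy shader-visible descriptor heap
    static_cmd = record_set_table(heap, 0, descriptors_static=True)
    volatile_cmd = record_set_table(heap, 0, descriptors_static=False)
    heap[0] = "srv_texture_B"  # heap edited after recording
    print(shader_fetch_descriptor(heap, volatile_cmd))  # prints "srv_texture_B"
    print(shader_fetch_descriptor(heap, static_cmd))    # prints "srv_texture_A" (stale)
```

The stale result on the static path is the behaviour the flag lets the driver assume will never be observed.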

 

Unless you dig deep into optimization and are sure that fetching that descriptor is your bottleneck in some specific case, chances are you will just make your life harder.



In your quote, that optimization is only possible if the "hardware is not sensitive to root argument size", and the only hardware I know of like that is Intel. So I was hoping someone could shed light on other possible optimizations. As for what you said, are you saying that a register could be initialized directly from the command buffer? If so, does that mean the CPU has a copy of the descriptor heaps (since command buffers are filled out by the CPU)?
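The "not sensitive to root argument size" condition can be sketched as a cost model. Per the D3D12 root signature limits, a descriptor table costs 1 DWORD, a root descriptor 2 DWORDs and each 32-bit root constant 1 DWORD, out of a 64-DWORD maximum. Promoting a table to a root descriptor therefore grows the root arguments. The assumption here is that some hardware keeps root arguments in a limited set of fast registers, so growth past that budget would cost more than it saves. The greedy promotion heuristic and the `register_budget` values below are hypothetical, not any vendor's actual driver logic.

```python
# Root parameter costs in DWORDs, as documented for D3D12 root signatures.
DESCRIPTOR_TABLE_COST = 1
ROOT_DESCRIPTOR_COST = 2
ROOT_CONSTANT_COST = 1
API_ROOT_LIMIT = 64


def root_cost(params):
    """params: list of (kind, n). For 'table', n = descriptors in the table;
    for 'constants', n = number of 32-bit constants; ignored otherwise."""
    cost = 0
    for kind, n in params:
        if kind == "table":
            cost += DESCRIPTOR_TABLE_COST
        elif kind == "root_descriptor":
            cost += ROOT_DESCRIPTOR_COST
        elif kind == "constants":
            cost += ROOT_CONSTANT_COST * n
        else:
            raise ValueError(f"unknown root parameter kind: {kind}")
    return cost


def promote_tables(params, register_budget):
    """Hypothetical driver heuristic: turn single-descriptor tables into root
    descriptors, one at a time, while the root arguments still fit in an
    assumed fast-register budget (and the API limit)."""
    budget = min(register_budget, API_ROOT_LIMIT)
    result = list(params)
    for i, (kind, n) in enumerate(params):
        if kind == "table" and n == 1:
            candidate = result.copy()
            candidate[i] = ("root_descriptor", 1)
            if root_cost(candidate) <= budget:
                result = candidate
    return result


if __name__ == "__main__":
    params = [("table", 1), ("table", 1), ("table", 8), ("constants", 4)]
    print(root_cost(params))  # prints "7"
    print(promote_tables(params, register_budget=8))
    # prints "[('root_descriptor', 1), ('table', 1), ('table', 8), ('constants', 4)]"
```

With a tight budget only one table gets promoted; with a budget of 64 (a stand-in for "not sensitive to root argument size") both single-descriptor tables would be.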
