[D3D12] PSO Libraries

This topic is 398 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

https://www.youtube.com/watch?v=dcDDvoauaz0

 

This video mentions a new D3D12 feature called PSO libraries. Has this feature been released yet? Is there documentation for it?

Edited by Infinisearch


Quick question: do PSO libraries have an effect on the speed of changing PSOs? For example, if you have a PSO library with similar PSOs that vary only by blend state, does it get treated like DX11, where the driver patches different code onto the end of the pixel shader, as opposed to monolithic PSOs?


The answer is no. The PSO library is only an attempt to save memory and storage by eliminating redundant parts (without it, the redundancy can account for hundreds of megabytes); it has no effect on the individual PSO shaders.
 

Are you sure about that? If it saves memory by reusing common parts, then pipelines might not be as monolithic as before. Driver patching would explain the savings, at least partially.


So the reality is that pipelines aren't actually monolithic - but they also can't be chunked in a way that's common across the various architectures. Any attempt to parameterize the pipeline (like how D3D11 did) results in overhead, because the mismatch from API to hardware needs to be resolved dynamically.

 

So, we've bundled everything together to give the driver the opportunity to see everything they need upfront, but they still compile into separate pieces that are applied individually.

 

If you use the original serialization APIs, you end up with tons of duplicate data, because multiple pipelines may have common pieces between them, but when you serialize each of them, they have to include it in the serialization. Using a library allows the driver to only write one copy of each piece that gets de-duplicated as more pipelines are added to the library.
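
To make the size argument above concrete, here is a minimal toy model in portable C++. It is only a sketch: `Piece`, `Pipeline`, `ToyPipelineLibrary`, and the piece sizes are hypothetical names and numbers invented for illustration, and the way a real driver splits a pipeline into pieces is not specified in this thread. In the actual API the corresponding object is `ID3D12PipelineLibrary`, which this model does not call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: a compiled pipeline is a list of opaque driver pieces
// (for example shader microcode or blend setup). Real drivers pick their own split.
using Piece = std::string;
using Pipeline = std::vector<Piece>;

// Per-pipeline serialization: every blob carries a full copy of its pieces.
std::size_t SizeSerializedIndividually(const std::vector<Pipeline>& pipelines) {
    std::size_t total = 0;
    for (const Pipeline& p : pipelines)
        for (const Piece& piece : p)
            total += piece.size();
    return total;
}

// Library-style serialization: each distinct piece is stored once, and each
// pipeline only records indices into the shared piece table.
class ToyPipelineLibrary {
public:
    void Store(const Pipeline& p) {
        std::vector<std::size_t> refs;
        for (const Piece& piece : p) {
            auto it = index_.find(piece);
            if (it == index_.end()) {
                it = index_.emplace(piece, pieces_.size()).first;
                pieces_.push_back(piece);
            }
            refs.push_back(it->second);
        }
        pipelines_.push_back(refs);
    }

    std::size_t SerializedSize() const {
        std::size_t total = 0;
        for (const Piece& piece : pieces_) total += piece.size();
        for (const auto& refs : pipelines_) total += refs.size() * sizeof(std::size_t);
        return total;
    }

    std::size_t UniquePieceCount() const { return pieces_.size(); }

private:
    std::map<Piece, std::size_t> index_;               // piece content -> table index
    std::vector<Piece> pieces_;                        // one copy of each distinct piece
    std::vector<std::vector<std::size_t>> pipelines_;  // per-pipeline piece indices
};

// Two pipelines that differ only in their (made-up) blend piece.
void LibraryExample() {
    const Piece vs(1000, 'v'), ps(2000, 'p'), blendA(10, 'a'), blendB(10, 'b');
    const std::vector<Pipeline> pipelines = {{vs, ps, blendA}, {vs, ps, blendB}};

    ToyPipelineLibrary library;
    for (const Pipeline& p : pipelines) library.Store(p);

    assert(SizeSerializedIndividually(pipelines) == 6020);  // both shaders written twice
    assert(library.UniquePieceCount() == 4);                // vs, ps, blendA, blendB
    assert(library.SerializedSize() < SizeSerializedIndividually(pipelines));
}
```

In this toy setup the two shader pieces are written once instead of twice, which is the kind of saving described above; the real on-disk format is up to the driver.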

 

Loading has similar issues - using the original serialization APIs, there's a temptation for the driver to essentially just memcpy the contents of the serialized pipeline, but if they take that naive approach, you end up with tons of duplication in memory afterwards. So instead, the drivers have to de-duplicate everything on load, which just adds overhead. But using a library, they end up with everything de-duplicated already, so there's no overhead or wasted memory.

So the reality is that pipelines aren't actually monolithic - but they also can't be chunked in a way that's common across the various architectures. Any attempt to parameterize the pipeline (like how D3D11 did) results in overhead, because the mismatch from API to hardware needs to be resolved dynamically.   So, we've bundled everything together to give the driver the opportunity to see everything they need upfront, but they still compile into separate pieces that are applied individually.

 

I understood this, as evidenced by the diagrams in early DX12 presentations. But without PSO libraries, each chunk is associated with only one mate per interface, so there is no glue logic present... hence the term monolithic.

 

 

 

If you use the original serialization APIs, you end up with tons of duplicate data, because multiple pipelines may have common pieces between them, but when you serialize each of them, they have to include it in the serialization. Using a library allows the driver to only write one copy of each piece that gets de-duplicated as more pipelines are added to the library.

This is where I think the aforementioned glue logic comes into play. Let's say you're on an architecture where blending occurs as a program appended to the end of the pixel shader. A PSO library would then contain a single "pixel shader chunk" with glue logic to the different blend states included in the library. I'm asking because I'm interested in the cost of switching between PSOs that are similar. Does using a library reduce the overhead of switching between PSOs within the same library?

Edited by Infinisearch


As far as I'm aware, all current D3D12 driver implementations do de-duplicate based on state combinations across the entire device, they don't need to be associated with a library for that to happen. The library association is all about minimizing the cost of saving the compilation result for a new device.

 

So with this de-duplication, switching from one PSO to another only requires changing the pieces that changed. I know that, for example, on Xbox, they've put this diffing logic directly into the command processor on the GPU so you even get benefits from one command list to the next. I think all current PC drivers do this on the CPU though.
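
As a rough illustration of "only changing the pieces that changed", the following portable C++ sketch models a pipeline as a fixed set of slots holding IDs of de-duplicated pieces and counts how many slots must be re-applied on a switch. Everything here is an assumption made for the example: the slot list, `ToyPso`, and `ToyCommandRecorder` are hypothetical and do not describe how any particular driver or command processor is implemented.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical split of a pipeline into independently applied pieces.
enum Slot : std::size_t { kVertexShader, kPixelShader, kBlend, kRasterizer, kSlotCount };

// Each slot stores the ID of a de-duplicated piece; equal IDs mean the same piece.
using ToyPso = std::array<std::uint32_t, kSlotCount>;

class ToyCommandRecorder {
public:
    // Applies only the slots that differ from the currently bound pipeline
    // and returns how many pieces had to be re-applied.
    std::size_t SetPipeline(const ToyPso& next) {
        std::size_t changed = 0;
        for (std::size_t s = 0; s < kSlotCount; ++s) {
            if (!hasBound_ || bound_[s] != next[s]) {
                bound_[s] = next[s];
                ++changed;
            }
        }
        hasBound_ = true;
        return changed;
    }

private:
    ToyPso bound_{};
    bool hasBound_ = false;  // the first bind has nothing to diff against
};

// Two pipelines that share everything except the blend piece.
void SwitchExample() {
    ToyCommandRecorder recorder;
    const ToyPso opaque = {1, 2, 3, 4};
    const ToyPso blended = {1, 2, 7, 4};

    assert(recorder.SetPipeline(opaque) == 4);   // first bind applies every slot
    assert(recorder.SetPipeline(blended) == 1);  // only the blend slot differs
    assert(recorder.SetPipeline(blended) == 0);  // redundant switch applies nothing
}
```

The diff only works because equal pieces share one ID, which is why the device-wide de-duplication mentioned above matters for switching cost regardless of whether a library is used.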


Sorry for the late reply. Do you know how this de-duplication takes place? To me it seems it would add to shader compilation time. Also, is the input layout considered a state combination, or is it part of the vertex shader chunk?

 

Finally, the above video also mentions programmable blending. Do you know when this will be released?


There is no more "input layout" hardware, at least on AMD. The shader manually fetches the data from memory. Because of that, on DX11, they relied on what was called a "fetch shader": a small piece of code that the VS jumps to at its start, which glues the layout to the shader so that the shader does not have to be recompiled for every input layout. Now that the DX12 PSO contains the layout, they can apply an extra optimization by inlining the fetches, which can help a little with read latency and register pressure. (Using the cached blob feature on AMD, you can retrieve the ISA to see what the final microcode is.)

 

The memory de-duplication concerns parts that are not related to the shader code itself, so it should not add a significant amount of time to building the library. Funnily enough, on Xbox One the concept is inverted: you can derive a PSO from another one to share the common parts that the PSO library de-duplicates on PC.

