Slow shader compile time when using large arrays.


I am getting awfully slow shader compilation times when using large arrays, in either cbuffer or tbuffer form. A simple test case where the vertex is transformed by a matrix fetched from the array via the instance ID is enough to trigger this bug.

Compile times with different array sizes are:

n = 128   -> 0.008s
n = 1024  -> 0.621s
n = 65536 -> 62.34s

tbuffer TransformsTextureBuffer  : register(t0) { float4x4 u_transforms[1024];  } 

struct VS_INPUT
{
	float3	position	: SV_Position;
	uint instanceID		: SV_InstanceID;
};

struct PS_INPUT
{
	float4	position	: SV_Position;
};


PS_INPUT main (VS_INPUT input)
{
	PS_INPUT output = (PS_INPUT)0;
	output.position = mul(u_transforms[input.instanceID], float4(input.position, 1.0));
	return output;
}

Can anyone reproduce this problem? Is this known behavior? Are there known solutions?

 

Shader version is vs_5_0.

Edited by kalle_h

Yeah, whenever there's a fixed-size array, something like this happens. I recommend structured buffers anyway. You can also omit the size then: 
StructuredBuffer<matrix> TransformsTextureBuffer;
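Applied to the test case above, only the declaration changes — a sketch, assuming the same t0 register and instance-indexed lookup as the original shader:

```hlsl
// Sketch: the original test shader with the fixed-size tbuffer array
// replaced by a StructuredBuffer (slot t0 assumed, as in the original).
StructuredBuffer<float4x4> u_transforms : register(t0);

struct VS_INPUT
{
    float3 position   : SV_Position;
    uint   instanceID : SV_InstanceID;
};

struct PS_INPUT
{
    float4 position : SV_Position;
};

PS_INPUT main(VS_INPUT input)
{
    PS_INPUT output = (PS_INPUT)0;
    // Indexing is unchanged; no array size needs to be declared.
    output.position = mul(u_transforms[input.instanceID], float4(input.position, 1.0));
    return output;
}
```

On the C++ side, the buffer is created with D3D11_RESOURCE_MISC_BUFFER_STRUCTURED in MiscFlags and StructureByteStride set to the element size (64 bytes for a float4x4), and the SRV uses DXGI_FORMAT_UNKNOWN with the element count in NumElements.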


Yeah, whenever there's a fixed-size array, something like this happens. I recommend structured buffers anyway. You can also omit the size then: 

StructuredBuffer<matrix> TransformsTextureBuffer;

 

Why does this happen, and is there anything else I can do? I would like to keep full support for older Direct3D 10 devices.


I tried StructuredBuffers and they fix all the compile-time issues. Going to test Buffers now. I am just stunned that compile time can increase almost linearly just by changing the array size; it does not make sense to me. Thanks for the answer.


Where can I find info on the usage of those non-structured Buffers? The name makes them pretty hard to search for. Are they just like tbuffer/cbuffer, except that they don't allow user-defined structs?

It's "just" how you set up the shader resource view for that buffer, where you define the type (the DXGI_FORMAT). A [tt]Buffer[/tt] behaves like a tbuffer (it is assigned to a t-register). If you run into trouble, watch the D3D debug output.


Buffer<float4> is a generic buffer object; only loads and [] access are permitted. Here is some code for creating one. Each element is defined as a float4, and you decide how to pack your data into it. 

     HRESULT hResult;

     D3D11_BUFFER_DESC descBuffer;
     descBuffer.ByteWidth = BufferSize;
     descBuffer.Usage = D3D11_USAGE_DYNAMIC;
     descBuffer.BindFlags = D3D11_BIND_SHADER_RESOURCE;
     descBuffer.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
     descBuffer.MiscFlags = 0;
     descBuffer.StructureByteStride = 0;   // not a structured buffer
     hResult = DXUTGetD3D11Device()->CreateBuffer( &descBuffer, nullptr, &pBuffer );
     assert( SUCCEEDED( hResult ) );

     D3D11_SHADER_RESOURCE_VIEW_DESC descBufferSRV;
     descBufferSRV.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
     descBufferSRV.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
     descBufferSRV.Buffer.ElementOffset = 0;
     descBufferSRV.Buffer.ElementWidth = BufferSize / 16;  // number of float4 elements
     hResult = DXUTGetD3D11Device()->CreateShaderResourceView( pBuffer, &descBufferSRV, &pBufferSRV );
     assert( SUCCEEDED( hResult ) );

... bind it like any other texture:

DXUTGetD3D11DeviceContext()->PSSetShaderResources( Slot, 1, &pBufferSRV ); 

... and define it in the HLSL code as a buffer bound to a texture register:

Buffer<float4> InstanceData : register(t0);
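Since each element is a float4, a full float4x4 has to be assembled from four loads on the shader side — a sketch, assuming each matrix is stored as four consecutive float4 rows (that layout is an assumption, not something the code above mandates):

```hlsl
// Sketch: reading a float4x4 out of a Buffer<float4>, assuming each
// matrix occupies four consecutive float4 elements starting at index * 4.
Buffer<float4> InstanceData : register(t0);

float4x4 LoadTransform(uint index)
{
    uint base = index * 4;
    return float4x4(InstanceData[base + 0],
                    InstanceData[base + 1],
                    InstanceData[base + 2],
                    InstanceData[base + 3]);
}
```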

Cheers!

Edited by kauna


Edit: More importantly: Why is compilation time a problem? You could precompile the shaders and load the binaries. No need to compile them every time you start your app.

 

 

Shader compilation times matter a lot for maintaining and improving shaders. The actual shader files have many permutations and far higher complexity; actual recompile times for dirty shaders were measured in minutes (even when compiling with multiple threads). We also allow modding of all assets/shaders while the game is running (hot reloaded), so it would be an awful experience, not just for me but for everyone else who wants to edit the game's shader files. I went with your Buffer suggestion and compile times are manageable once again.
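For shipping builds, the precompile suggestion above can still be done offline with the fxc compiler from the Windows SDK — a sketch, assuming an entry point named main and hypothetical file names:

```bat
rem Compile the vertex shader offline to a binary blob (.cso),
rem then load that blob at runtime instead of calling D3DCompile.
fxc /T vs_5_0 /E main /Fo transform_vs.cso transform_vs.hlsl
```

At runtime the blob can be read from disk and passed straight to CreateVertexShader(), so end users never pay the compile cost; the hot-reload workflow described above would still compile at runtime.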

I'm curious, too.

kalle, you might reconsider. You say DX10 hardware, but you seem to use the DX11 API. This is peculiar: structured buffers actually work with feature level 10 (and SM 4 shaders). I just did a minimal test with both the hardware and the reference device.


I'm curious, too.

kalle, you might reconsider. You say DX10 hardware, but you seem to use the DX11 API. This is peculiar: structured buffers actually work with feature level 10 (and SM 4 shaders). I just did a minimal test with both the hardware and the reference device.

Thanks. That's interesting behaviour. The documentation is a bit scarce and just says: 

 

http://msdn.microsoft.com/en-us/library/windows/desktop/ff471514%28v=vs.85%29.aspx

Shader Model 4 (Available for compute and pixel shaders in Direct3D 11 on some Direct3D 10 devices.)

 

We are already in early access, so I need to be really careful not to cut out anyone who has already bought the game. Structured buffers would be superior for ease of use. I guess performance should be similar with tbuffer, Buffer, or StructuredBuffer?

Edited by kalle_h

If the reference device runs fine, you're probably good. But yeah, if your game is out, you don't want to break anything.

Can't tell you about the performance implications. The HLSL assembly looks similar if not identical, though. But with GPUs I wouldn't be surprised by anything.

And for a bit of convenience you can always abstract things with functions ([tt]GetTransform(uint index)[/tt]), macros, or defines in HLSL, and use the lowest common denominator for compatibility. Example: I found similar compiler behaviour when playing with skinning. Here I switch between a constant buffer and a structured buffer with a define:
 
#ifdef STRUCTUREDBUFFER
StructuredBuffer<Bone> Bones: register(t4);
#endif

cbuffer SkinningParameters : register(b4)
{
#ifndef STRUCTUREDBUFFER
    Bone Bones[MAX_BONES];
#endif
};
No change in the vertex shader is needed; it just accesses [tt]Bones[index][/tt]. Buffer creation and binding are different, of course.

With the cast functions ([tt]asint(), asfloat()[/tt]) you can also roll your own decoding.
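A minimal sketch of the accessor idea mentioned above, assuming the STRUCTUREDBUFFER define and Bones declarations from the snippet; the Bone struct itself is whatever the skinning code defines:

```hlsl
// Sketch: hide the storage choice behind one accessor so the rest of the
// shader is identical for both paths. Assumes the STRUCTUREDBUFFER define
// and the Bones declarations from the snippet above.
Bone GetBone(uint index)
{
    // The same expression compiles against either declaration of Bones:
    // the structured buffer (t4) or the constant-buffer array (b4).
    return Bones[index];
}
```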

