gamer9xxx

float4 read/write as 4 instructions instead of 1?

Hi guys,

I'm writing a simple compute shader in DirectX 11 (shader model 5) that stores a float4 color into groupshared memory per thread, then reads it back.
From my understanding of MSDN, the store_structured instruction (and likewise ld_structured) can write up to four 32-bit components at once.

Quote

"This instruction performs 1-4 component *32bit components written from src0 to dst0 at the address in dstAddress and dstByteOffset."

I would therefore expect one float4 write to translate into one store_structured instruction.
However, in my simple shader it translates into four store_structured instructions!


Code:

groupshared float4 ColorQuad[4][4][64];

...

ColorQuad[x][y][shared_index] = float4(0.1f, 0.2f, 0.3f, 0.4f);

...

float4 color = ColorQuad[x][y][shared_index];

This code is compiled into this:
 

dcl_tgsm_structured g0, 4096, 4

...

mov r1.x, r0.w
imul null, r1.y, r0.x, l(16)
imad r1.y, r0.z, l(1024), r1.y
store_structured g0.x, r1.x, r1.y, l(0.100000)  // ColorQuad<0>
iadd r1.z, r1.y, l(4)
store_structured g0.x, r1.x, r1.z, l(0.200000)  // ColorQuad<0>
iadd r1.z, r1.y, l(8)
store_structured g0.x, r1.x, r1.z, l(0.300000)  // ColorQuad<0>
iadd r1.y, r1.y, l(12)
store_structured g0.x, r1.x, r1.y, l(0.400000)  // ColorQuad<0>

...

imad r1.y, r0.z, l(1024), r1.y
ld_structured r2.x, r1.x, r1.y, g0.xxxx  // ColorQuad<0:Inf>
iadd r1.z, r1.y, l(4)
ld_structured r2.y, r1.x, r1.z, g0.xxxx  // ColorQuad<1:Inf>
iadd r1.z, r1.y, l(8)
ld_structured r2.z, r1.x, r1.z, g0.xxxx  // ColorQuad<2:Inf>
iadd r1.y, r1.y, l(12)
ld_structured r2.w, r1.x, r1.y, g0.xxxx  // ColorQuad<3:Inf>
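
For anyone following the listing, the address arithmetic can be modeled on the CPU. This is only a sketch in C++, not shader code. It assumes the operand order documented for these instructions: dcl_tgsm_structured takes (struct stride in bytes, struct count), and store_structured / ld_structured take (struct index, byte offset). It also assumes that r0.w, r0.z and r0.x hold x, y and shared_index respectively, which is inferred from the strides and not confirmed by the listing. The names TgsmAddress and componentAddress are hypothetical.

```cpp
#include <cstdint>

// Layout implied by: groupshared float4 ColorQuad[4][4][64];
constexpr std::uint32_t kFloat4Bytes  = 16;                     // 4 x 32-bit components
constexpr std::uint32_t kIndexStride  = kFloat4Bytes;           // l(16) in the imul
constexpr std::uint32_t kYStride      = 64 * kFloat4Bytes;      // l(1024) in the imad
constexpr std::uint32_t kStructStride = 4 * 64 * kFloat4Bytes;  // 4096 in dcl_tgsm_structured

// Hypothetical helper type: the two address operands of store_structured.
struct TgsmAddress {
    std::uint32_t structIndex;  // r1.x in the listing (assumed to be x)
    std::uint32_t byteOffset;   // r1.y / r1.z in the listing
};

// Address of one 32-bit component (0 = x, 1 = y, 2 = z, 3 = w) of
// ColorQuad[x][y][sharedIndex], mirroring the imul / imad / iadd sequence.
constexpr TgsmAddress componentAddress(std::uint32_t x, std::uint32_t y,
                                       std::uint32_t sharedIndex,
                                       std::uint32_t component) {
    return TgsmAddress{x, y * kYStride + sharedIndex * kIndexStride + component * 4};
}

// The four stores in the listing hit consecutive 4-byte slots of one float4.
static_assert(kStructStride == 4096, "matches dcl_tgsm_structured g0, 4096, 4");
static_assert(componentAddress(1, 2, 3, 0).byteOffset + 12 ==
              componentAddress(1, 2, 3, 3).byteOffset, "x and w are 12 bytes apart");
```

Under these assumptions the four stores cover one contiguous 16-byte range, so the split into four instructions changes only how the write is expressed in DXBC, not which bytes are written.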

Now I'm very confused about why this is happening.
Am I misunderstanding MSDN, and it actually means the instruction writes 1 x 32 bits (or 4 x 8 bits) of data?
Or is this a compiler optimization to avoid bank conflicts?

Thanks for any explanation!


The DXBC bytecode does not matter much compared to the final uCode that the driver generates for the GPU. Plus, GPUs are scalar these days, so SIMD instructions in the bytecode are counterproductive for the driver anyway.

Could it have done a write4? Maybe! But is this important without seeing your GPU uCode? No.