Member Since 15 Sep 2011
Offline Last Active Jan 05 2014 03:30 PM

Topics I've Started

HLSL compiler bug?

01 December 2013 - 02:10 PM

I had a problem reading data from a constant buffer in a compute shader: the shader read garbage. What made it confusing is that the graphics diagnostics in VS 2013 showed correct data in the buffer. After hours of pain I checked the shader disassembly and found that the constant buffer I registered to slot 1 IS SOMEHOW MOVED TO SLOT 0! The constant buffer from slot 0 was stripped out, just because I don't use it in that particular function. When I swapped the constant buffer registers, it started to work as I expected.


I suppose the compiler should not change constant buffer slots?



Shader cbuffers:

cbuffer Region : register( c0 )
{
	RegionData _region;
};

cbuffer Phase : register( c1 )  // <- SLOT 1 !!!
{
	PhaseData _phase;
};

Disassembly of bindings:

Generated by Microsoft (R) HLSL Shader Compiler 6.3.9600.16384

// Resource Bindings:
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// Pingpong                              UAV    uint         buf    3        1
// Phase                             cbuffer      NA          NA    0        1  <- SLOT 0!!!


      dcl_globalFlags refactoringAllowed
      dcl_constantbuffer cb0[1], immediateIndexed    <- cb0!!!
      dcl_uav_typed_buffer (uint,uint,uint,uint) u3
      dcl_temps 1
      dcl_thread_group 1, 1, 1


Compute Shader execution time

01 August 2013 - 05:10 AM

I naively tried enclosing Dispatch calls in timestamp queries (ID3D11Query), and it failed to give reasonable results. The first Dispatch seems to take a long time, and the next few are below a microsecond. I suppose the GPU ends the first dispatch once the pipeline is ready to execute compute shaders, and the following dispatches just slot in when there is room for new threads. Any synchronization between them seems to be handled after that, with no impact on the dispatch timestamps. Unfortunately Nvidia Nsight runs really slowly on my PC, so is there any way of measuring compute shader execution time using ID3D11Query or a similar approach? I am afraid there is no simple solution with the current API.
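For reference, here is a minimal sketch of the arithmetic behind a timestamp measurement, assuming a pair of D3D11_QUERY_TIMESTAMP queries wrapped in a D3D11_QUERY_TIMESTAMP_DISJOINT query. The helper name TimestampDeltaToMs is hypothetical, and the D3D11 call sequence appears only as comments. It does not address the overlap between dispatches described above; it only shows how two tick values become a time.

```cpp
#include <cassert>
#include <cstdint>

// Assumed call sequence on the immediate context (not compiled here):
//   context->Begin(disjointQuery);
//   context->End(timestampBegin);
//   context->Dispatch(...);            // one dispatch or a whole batch
//   context->End(timestampEnd);
//   context->End(disjointQuery);
//   ...later, poll GetData() on all three queries until each returns S_OK.
// If D3D11_QUERY_DATA_TIMESTAMP_DISJOINT::Disjoint is TRUE, the pair of
// timestamps is unreliable and the sample should be discarded.

// Hypothetical helper: converts two GPU timestamps into milliseconds.
// `frequency` is ticks per second, as reported in
// D3D11_QUERY_DATA_TIMESTAMP_DISJOINT::Frequency.
double TimestampDeltaToMs(std::uint64_t beginTicks,
                          std::uint64_t endTicks,
                          std::uint64_t frequency)
{
    assert(frequency != 0);
    assert(endTicks >= beginTicks);
    return static_cast<double>(endTicks - beginTicks) * 1000.0 /
           static_cast<double>(frequency);
}
```

Bracketing a whole batch of dispatches with one pair of timestamps, rather than each dispatch separately, may give a more meaningful total when the GPU overlaps the work.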

HLSL: Resource as function parameter

02 April 2013 - 03:59 AM

I can't find this in the docs. Is there any way to declare a helper function argument as a resource (as in the interlocked intrinsic functions)? I want to write my own general atomic function that works on a resource, and declaring the argument as uint returns a compile error:


error X3669: Resources being indexed cannot come from conditional expressions, they must come from literal expressions.

The index is available at compile time, so I hope it is just some fancy syntax problem.

Sampling texture in vertex shader

24 March 2012 - 03:02 PM

I have trouble sampling a texture in a vertex shader, and I couldn't find much info on the web.

Loading the texture:

info.Format = DXGI_FORMAT_R16_UNORM ; // tried also DXGI_FORMAT_R32_FLOAT
info.MipLevels = 1;
D3DX11CreateShaderResourceViewFromFile(device, L"heightmap.png", &info, NULL,&pHeightMapSRV,NULL);

I use a D3D11_FILTER_MIN_MAG_MIP_POINT sampler and declare the texture in the VS as Texture2D<float>. It makes no difference whether I use
texture.SampleLevel( sampler, In.UV, 0 ) or texture.Load( float3( In.UV.x,In.UV.y,0 )) in the vertex shader: sampling just returns 0.
The UVs and the texture seem OK, as the pixel shader samples without problems.

Any ideas what could be wrong?
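One detail worth checking in the Load call above: Texture2D::Load takes integer texel coordinates (x, y, mip level) rather than normalized UVs, so a UV in [0, 1) converted to an integer addresses texel (0, 0). The sketch below shows the UV-to-texel conversion in C++; the helper UvToTexel and the Texel struct are hypothetical names, and the same arithmetic would go into the shader. It would not explain why SampleLevel also returns 0.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical integer texel coordinate pair.
struct Texel
{
    int x;
    int y;
};

// Hypothetical helper: maps a normalized UV to the texel that a
// Load-style fetch should address, clamped to the texture bounds.
Texel UvToTexel(float u, float v, int width, int height)
{
    assert(width > 0 && height > 0);
    int x = static_cast<int>(u * static_cast<float>(width));
    int y = static_cast<int>(v * static_cast<float>(height));
    // Clamp so that u == 1.0 or v == 1.0 still lands on the last texel.
    x = std::min(std::max(x, 0), width - 1);
    y = std::min(std::max(y, 0), height - 1);
    return Texel{x, y};
}
```

For a 512x256 heightmap, a UV of (0.5, 0.25) maps to texel (256, 64), whereas passing the raw UV to Load would always address texel (0, 0).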

Xna Math Performance

15 September 2011 - 07:58 AM

I've done a little research to check the performance of the XNA Math library. For the test I slightly modified some code that computes normals:

D3DX Math:

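	for(int i=0;i<primitives*3;i++)
	{
		D3DXVECTOR3 nor;
		D3DXVECTOR3 v1 = pos[rand()%primitives]-pos[rand()%primitives];
		D3DXVECTOR3 v2 = pos[rand()%primitives]-pos[rand()%primitives];
		// cross product and normalization, mirroring the XNA Math version
		D3DXVec3Cross(&nor, &v1, &v2);
		D3DXVec3Normalize(&nor, &nor);

		pos[rand()%primitives] += nor;
		pos[rand()%primitives] += nor;
		pos[rand()%primitives] += nor;
	}
	for(int i=0;i<primitives;i++)
		D3DXVec3Normalize(&pos[i], &pos[i]);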

Xna Math:

	for(int i=0;i<primitives*3;i++)
	{
		nor = XMVector3Cross(xmpos[rand()%primitives]-xmpos[rand()%primitives],xmpos[rand()%primitives]-xmpos[rand()%primitives]);
		nor = XMVector3Normalize(nor);

		xmpos[rand()%primitives] += nor;
		xmpos[rand()%primitives] += nor;
		xmpos[rand()%primitives] += nor;
	}
	for(int i=0;i<primitives;i++)
		xmpos[i] = XMVector3Normalize(xmpos[i]);

I ran it for 10^6 primitives; my timing results for these code sections:
D3DX Math : 1.31335
XNA Math : 2.04672

After reading a bit I found out that XMVECTOR should be 16-byte aligned on the heap, so I changed new to (XMVECTOR*)_aligned_malloc(sizeof(XMVECTOR)*primitives,16);

New results:
D3DX Math : 1.32109
XNA Math : 2.05517

Visual Studio instruction set: Streaming SIMD Extensions 2 (/arch:SSE2).

Now my question is: what have I done wrong? I also ran a test storing the data as XMFLOAT3, loading it for the computations and then storing it back, and it was 3 times slower than the simple and convenient D3DX math.
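One thing that may be worth ruling out: both loops above call rand() and touch random array elements inside the timed region, so part of the measured time is not vector math at all. The sketch below, with hypothetical helper names MakeIndices and TimeSeconds, pre-generates the indices and times only the work passed in. It is an assumption about the test setup, not a diagnosis of the difference between the two libraries.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: builds the random indices up front so that
// rand() is not part of the timed region.
std::vector<std::size_t> MakeIndices(std::size_t count,
                                     std::size_t range,
                                     unsigned seed)
{
    assert(range > 0);
    std::srand(seed);
    std::vector<std::size_t> indices(count);
    for (std::size_t& index : indices)
        index = static_cast<std::size_t>(std::rand()) % range;
    return indices;
}

// Hypothetical helper: runs any callable once and returns the
// elapsed wall-clock time in seconds.
template <typename Work>
double TimeSeconds(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

The timed callable would then read indices[k] where the original loops call rand()%primitives, so both math libraries see identical access patterns and the same inputs.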