
zolwik

Member

  • Content count: 9
  • Community Reputation: 135 Neutral

About zolwik

  • Rank: Newbie
  1. I had a problem reading data with a compute shader from a constant buffer - the shader read garbage. What seemed confusing: the graphics diagnostics in VS 2013 showed correct data in the buffer. After hours of pain I checked the shader disassembly and found out that the constant buffer I register to slot 1 IS SOMEHOW MOVED TO SLOT 0! The constant buffer from slot 0 got cut out, just because I don't use it in that particular function. When I switched the constant buffer registers, it started to work as I expected. I suppose the compiler should not change constant buffer slots?
    Shader cbuffers:
    [code]
    cbuffer Region : register( c0 )
    {
        RegionData _region;
    };
    cbuffer Phase : register( c1 ) // <- SLOT 1 !!!
    {
        PhaseData _phase;
    };
    [/code]
    Disassembly of the bindings:
    [code]
    // Generated by Microsoft (R) HLSL Shader Compiler 6.3.9600.16384
    //
    // Resource Bindings:
    //
    // Name      Type     Format  Dim  Slot  Elements
    // --------- -------- ------- ---- ----- --------
    // Pingpong  UAV      uint    buf  3     1
    // Phase     cbuffer  NA      NA   0     1        <- SLOT 0!!!
    <...>
    cs_5_0
    dcl_globalFlags refactoringAllowed
    dcl_constantbuffer cb0[1], immediateIndexed   <- cb0!!!
    dcl_uav_typed_buffer (uint,uint,uint,uint) u3
    dcl_temps 1
    dcl_thread_group 1, 1, 1
    <...>
    [/code]
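    A way to avoid depending on a hard-coded slot on the C++ side is to ask the shader-reflection interface where a named cbuffer actually ended up and bind to that. A rough sketch, assuming the compiled compute-shader blob is available (the blob and buffer variable names here are illustrative):
    [code]
    #include <d3dcompiler.h>
    #include <d3d11shader.h>
    #pragma comment(lib, "d3dcompiler.lib")

    // Returns the bind slot the compiler assigned to a named cbuffer,
    // or -1 if it was optimized away / not found.
    int GetCBufferSlot(ID3DBlob* csBlob, const char* cbufferName)
    {
        ID3D11ShaderReflection* reflector = NULL;
        if (FAILED(D3DReflect(csBlob->GetBufferPointer(), csBlob->GetBufferSize(),
                              IID_ID3D11ShaderReflection, (void**)&reflector)))
            return -1;

        D3D11_SHADER_INPUT_BIND_DESC bindDesc;
        int slot = -1;
        if (SUCCEEDED(reflector->GetResourceBindingDescByName(cbufferName, &bindDesc)))
            slot = (int)bindDesc.BindPoint; // the slot to pass to CSSetConstantBuffers

        reflector->Release();
        return slot;
    }
    [/code]
    That way the binding follows whatever the compiler emitted, e.g. context->CSSetConstantBuffers(GetCBufferSlot(csBlob, "Phase"), 1, &phaseBuffer);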
  2. @MJP: I tried some CopySubresourceRegion calls on dummy buffers between dispatches, and the driver is too smart for that. Inserting artificial synchronization points might work, but I won't do it until every other option fails - it seems like a nightmare to keep the code understandable after that.
    @ATEFred: Does it happen only when the GPU must sync between them, or does it always happen? It would be strange if the GPU stalled on each dispatch with idle shading units.
  3. I tried naively enclosing the Dispatches with timestamp queries (ID3D11Query) and it failed to give reasonable results. The first Dispatch seems to take a long time, the next few are below a microsecond. I suppose the GPU finishes the first dispatch after the pipeline is ready for executing compute shaders, and the following dispatches just pop in when there is room for new threads. Any synchronization between them seems to be handled after that, with no impact on the dispatch timestamps. Unfortunately Nvidia Nsight works really slowly on my PC, so is there any way of measuring compute shader execution time using ID3D11Query or a similar approach? I am afraid there is no simple solution with the current API.
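    For reference, the pattern I mean wraps the dispatches between two D3D11_QUERY_TIMESTAMP queries inside a D3D11_QUERY_TIMESTAMP_DISJOINT begin/end pair, which gives the GPU time for the whole bracketed span rather than for a single isolated dispatch. A rough sketch, assuming a valid device and context (error handling and query reuse omitted):
    [code]
    // Create the queries once.
    D3D11_QUERY_DESC qd = {};
    qd.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
    ID3D11Query *disjoint, *tsBegin, *tsEnd;
    device->CreateQuery(&qd, &disjoint);
    qd.Query = D3D11_QUERY_TIMESTAMP;
    device->CreateQuery(&qd, &tsBegin);
    device->CreateQuery(&qd, &tsEnd);

    // Bracket the work.
    context->Begin(disjoint);
    context->End(tsBegin);                  // timestamp before the dispatches
    context->Dispatch(groupsX, groupsY, 1); // the dispatch(es) being measured
    context->End(tsEnd);                    // timestamp after the dispatches
    context->End(disjoint);

    // Read back later (spinning here just for simplicity).
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj;
    while (context->GetData(disjoint, &dj, sizeof(dj), 0) != S_OK) {}
    UINT64 t0 = 0, t1 = 0;
    context->GetData(tsBegin, &t0, sizeof(t0), 0);
    context->GetData(tsEnd, &t1, sizeof(t1), 0);
    if (!dj.Disjoint)
    {
        double ms = double(t1 - t0) / double(dj.Frequency) * 1000.0;
        // ms is the GPU time spent between the two timestamps.
    }
    [/code]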
  4. @up Exactly. Right now I just do what Jason wrote. I wondered if there is an option to avoid code redundancy and duplicating the same code for each resource and index. I would like to pass Nodes[i0].nodes[i1].data[1] as an argument, not i0 and i1, the same as in InterlockedCompareExchange.
    [code]
    void InterlockedAverage(uint i0, uint i1, float4 val)
    {
        val.rgb *= 255.;
        uint nval = Float4ToUint(val);
        uint prev = 0;
        uint current;
        InterlockedCompareExchange(Nodes[i0].nodes[i1].data[1], prev, nval, current);
        [allow_uav_condition]
        while (prev != current)
        {
            prev = current;
            float4 rval = UintToFloat4(current);
            rval.xyz *= rval.w;
            float4 curf = rval + val;
            curf.xyz /= curf.w;
            nval = Float4ToUint(curf);
            InterlockedCompareExchange(Nodes[i0].nodes[i1].data[1], prev, nval, current);
        }
    }
    [/code]
  5. I can't find this in the docs. Is there any way to declare a helper function argument as a resource (as in the interlocked intrinsic functions)? I want to make my own general atomic function working on a resource, and declaring the argument as uint gives a compile error.
    The index is available at compile time, so I hope it is just some fancy syntax problem.
  6. Well, I do check errors, I just stripped that here for clearer code. I use the same texture and sampler in the PS, and I set them for both the PS and the VS. The PS draws the texture correctly, but the VS just gives 0. I tried sampling at mipmap level 1, and scaling the coordinates by the texture size for Load. I also tried setting MipLevels to 0, but why would that help? Setting 0 just forces generating the whole mipmap chain. I have no idea what can cause it. Maybe there are some options that should, or shouldn't, be set somewhere? But I don't think I did anything extraordinary.
    [SOLVED] Uff, why didn't I think of drawing wireframes earlier? A double bug in the UVs and positions caused the texture to draw fine in the PS, but it was sampled from bad coordinates in the VS.
  7. I have trouble with sampling a texture in the vertex shader and I couldn't find much info on the web. Loading the texture:
    [CODE]
    D3DX11_IMAGE_LOAD_INFO info;
    info.Format = DXGI_FORMAT_R16_UNORM; // tried also DXGI_FORMAT_R32_FLOAT
    info.MipLevels = 1;
    D3DX11CreateShaderResourceViewFromFile(device, L"heightmap.png", &info, NULL, &pHeightMapSRV, NULL);
    [/CODE]
    I use a D3D11_FILTER_MIN_MAG_MIP_POINT sampler and declare the texture in the VS as Texture2D<float>. It makes no difference whether I use texture.SampleLevel( sampler, In.UV, 0 ) or texture.Load( float3( In.UV.x, In.UV.y, 0 ) ) in the vertex shader - sampling just returns 0. The UVs and the texture seem ok, as the pixel shader samples without problems. Any ideas what could be wrong?
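    For completeness, this is roughly what the binding side has to look like - the SRV and sampler must be set for the VS stage separately from the PS stage, otherwise the vertex shader just reads zeros. A sketch, assuming a valid device/context and the pHeightMapSRV above (the sampler description is illustrative):
    [code]
    // Point sampler for the heightmap.
    D3D11_SAMPLER_DESC sd = {};
    sd.Filter = D3D11_FILTER_MIN_MAG_MIP_POINT;
    sd.AddressU = D3D11_TEXTURE_ADDRESS_CLAMP;
    sd.AddressV = D3D11_TEXTURE_ADDRESS_CLAMP;
    sd.AddressW = D3D11_TEXTURE_ADDRESS_CLAMP;
    sd.ComparisonFunc = D3D11_COMPARISON_NEVER;
    sd.MaxLOD = D3D11_FLOAT32_MAX;
    ID3D11SamplerState* pointSampler = NULL;
    device->CreateSamplerState(&sd, &pointSampler);

    // Bind the same SRV and sampler to both stages that sample the texture.
    context->VSSetShaderResources(0, 1, &pHeightMapSRV);
    context->VSSetSamplers(0, 1, &pointSampler);
    context->PSSetShaderResources(0, 1, &pHeightMapSRV);
    context->PSSetSamplers(0, 1, &pointSampler);
    [/code]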
  8. Xna Math Performance

    [quote name='Quat' timestamp='1316190532' post='4862502'] Does it help if you replace those for your subtractions and additions? [/quote]
    It helps, but still:
    DX Math: 1.30931
    Xna Math: 1.64447
    I changed Normalize to NormalizeEst. Now it is:
    DX Math: 1.30711
    Xna Math: 1.36559
  9. I've done a little research to check the performance of the XNA math library. For the test I slightly changed some code that computes normals:
    DX Math:
    [code]
    for(int i = 0; i < primitives*3; i++)
    {
        D3DXVECTOR3 nor;
        D3DXVECTOR3 v1 = pos[rand()%primitives] - pos[rand()%primitives];
        D3DXVECTOR3 v2 = pos[rand()%primitives] - pos[rand()%primitives];
        D3DXVec3Cross(&nor, &v1, &v2);
        D3DXVec3Normalize(&nor, &nor);
        pos[rand()%primitives] += nor;
        pos[rand()%primitives] += nor;
        pos[rand()%primitives] += nor;
    }
    for(int i = 0; i < primitives; i++)
    {
        D3DXVec3Normalize(&pos[i], &pos[i]);
    }
    [/code]
    Xna Math:
    [code]
    for(int i = 0; i < primitives*3; i++)
    {
        XMVECTOR nor;
        nor = XMVector3Cross(xmpos[rand()%primitives] - xmpos[rand()%primitives],
                             xmpos[rand()%primitives] - xmpos[rand()%primitives]);
        nor = XMVector3Normalize(nor);
        xmpos[rand()%primitives] += nor;
        xmpos[rand()%primitives] += nor;
        xmpos[rand()%primitives] += nor;
    }
    for(int i = 0; i < primitives; i++)
    {
        xmpos[i] = XMVector3Normalize(xmpos[i]);
    }
    [/code]
    Well, I ran it for 10^6 primitives and my time results for these code parts are:
    D3DX Math: 1.31335
    XNA Math: 2.04672
    After reading a bit I found out that XMVECTOR should be 16-byte aligned on the heap, so I changed the new to (XMVECTOR*)_aligned_malloc(sizeof(XMVECTOR)*primitives, 16). New results:
    D3DX Math: 1.32109
    XNA Math: 2.05517
    Visual Studio instruction set: Streaming SIMD Extensions 2 (/arch:SSE2).
    Now my question is: what have I done wrong? I also did a test with storing the data as XMFLOAT3, loading it for computations and then storing it back, and that was 3 times slower than the simple and convenient D3DX math.
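    For reference, the XMFLOAT3 load/compute/store variant mentioned at the end looks roughly like this for a single accumulation step (just a sketch; the array name and indices are illustrative):
    [code]
    #include <xnamath.h> // XMVECTOR, XMLoadFloat3, XMStoreFloat3, ...

    // Load into SIMD registers, do all the math there, store once at the end.
    void AccumulateNormal(XMFLOAT3* positions, int a, int b, int c, int d, int target)
    {
        XMVECTOR v1  = XMVectorSubtract(XMLoadFloat3(&positions[a]), XMLoadFloat3(&positions[b]));
        XMVECTOR v2  = XMVectorSubtract(XMLoadFloat3(&positions[c]), XMLoadFloat3(&positions[d]));
        XMVECTOR nor = XMVector3Normalize(XMVector3Cross(v1, v2));

        XMVECTOR p = XMVectorAdd(XMLoadFloat3(&positions[target]), nor);
        XMStoreFloat3(&positions[target], p);
    }
    [/code]
    The point of this layout is to keep the data compact (XMFLOAT3 in memory) and only use XMVECTOR inside a computation; in this particular test the many rand() calls and random memory accesses likely dominate, so the SIMD math has little room to show a win.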