zeo4

Members
  • Content count

    26
  • Joined

  • Last visited

Community Reputation

291 Neutral

About zeo4

  • Rank
    Member

Personal Information

  • Interests
    Programming
  1. After researching a bit, I guess there are none.
  2. Hi, do you know of any DirectCompute profiling tool that gives a detailed breakdown of GPU compute operations, such as warp/wavefront timeline execution and statistics, memory statistics, etc.? NSight does this for CUDA, but not for DirectCompute. I need to find a bottleneck in my compute shader, as it takes far too long to execute. Thanks!
  3. I really don't know where in the world you came across this tip, but it worked beautifully. Such a shame I can't give more points for it. Sharing the details for anyone with the same or a similar issue: the problem was indeed the window client area being bigger than the swap chain. Both were created with the same dimensions, yet strangely the window came out bigger than the size given. There is a Visual Studio setting (Project -> Properties -> Manifest Tool -> Input and Output -> DPI Awareness) that takes screen enlargement into account (useful, for instance, for enlarging fonts on high-resolution screens). Setting this to "High DPI Aware" resulted in a window of exactly the size given. But it is not the window size that must match the swap chain size; it is the window client area, i.e. the part of the window the back buffer is rendered into (the full window size also includes the areas reserved for window buttons, bars, and borders, which must not be counted here). Once the window client area matched the swap chain size, the rendered image showed up. I would have searched for that for ages without you guys. Thanks!
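     A minimal sketch of the size calculation described above (the window class name, style, and hInstance are illustrative assumptions, not taken from the post; 640x480 is used as an example client size): AdjustWindowRect grows the desired client rectangle by the borders, caption, and menu for the given style, so the resulting outer size yields a client area that exactly matches the swap chain.
     RECT rc = { 0, 0, 640, 480 };                       // desired client (back buffer) size
     AdjustWindowRect( &rc, WS_OVERLAPPEDWINDOW, FALSE );  // grow by borders, caption, etc.
     HWND hwnd = CreateWindow( L"MyWindowClass", L"App", WS_OVERLAPPEDWINDOW,
                               CW_USEDEFAULT, CW_USEDEFAULT,
                               rc.right - rc.left, rc.bottom - rc.top,  // outer window size
                               nullptr, nullptr, hInstance, nullptr );
     Calling SetProcessDPIAware() at startup (or using the manifest setting mentioned above) keeps Windows from silently scaling the window on high-DPI screens, so the client area stays at the requested size.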
  4. Thanks for the answers. The debug layer is on, and the only message it shows comes up when creating the device, context, and swap chain with D3D11CreateDeviceAndSwapChain, and it says: but the HRESULT returned is S_OK. ASUS says my Intel graphics drivers are up to date. I know there may be newer Intel drivers, but a laptop should have its vendor's drivers installed, and ASUS doesn't have any newer drivers for the Intel graphics card on their website. I create the window at 640x480, but somehow its client area is bigger. I will work on that one and provide results as soon as I have any.
  5. Hi, I have a problem with my DirectX application. I have a laptop with two graphics cards (Intel and GTX 860M). When I render with the Intel card I get correct results, but if I render with the GeForce I get a white window, as if the 3D context was never applied to the opened window. I got this problem after installing new GeForce drivers; previously it worked fine with both cards. Is there anything I need to adjust in my application? I did a clean driver install and tried different drivers, still no results. Thanks!
  6. HLSL switch attributes

    Ok, I now see the ideas behind both options. Thanks for the help!
  7. Hi, can anyone give a little more detail than MSDN does on what the HLSL "switch" statement attributes below do?
     forcecase - ?
     call - ?
     Example:
     [forcecase]
     switch(exp)
     {
     case 0: break;
     case 1: break;
     }
     [call]
     switch(exp)
     {
     case 0: break;
     case 1: break;
     }
     Many thanks!
  8. Yes, you're right. I use CSSetUnorderedAccessViews and not CSSetShaderResources. And yes, you're right, it was a problem of an unbound resource at the first compute shader; the debug layer does report that, I just hadn't looked at it. I'm such a noob. Sorry and thanks!
  9. Hi, I have a small question. In my code I use a compute shader writing to a buffer, and right after it another compute shader reading from that buffer. The problem is that in the second compute shader, when I read through an SRV and a StructuredBuffer I get zeros, but if I read through a UAV and a RWStructuredBuffer I get correct values. Here's the code; take a look at the "Worlds" buffer.
     CPP
     uint32_t _InitCounts = -1;
     Context->CSSetShader( CS1, 0, 0 );
     Context->CSSetUnorderedAccessViews( 0, 1, &PositionsUAV, &_InitCounts );
     Context->CSSetUnorderedAccessViews( 1, 1, &VelocitiesUAV, &_InitCounts );
     Context->CSSetUnorderedAccessViews( 2, 1, &WorldsUAV, &_InitCounts );
     Context->Dispatch( 1, 1, 1 );
     Context->CSSetShader( CS2, 0, 0 );
     Context->CSSetShaderResources( 0, 1, &WorldsSRV );
     // Context->CSSetShaderResources( 1, 1, &WorldsUAV, &_InitCounts ); <-- alternative, working case
     Context->CSSetConstantBuffers( 0, 1, &ViewProjection );
     Context->CSSetUnorderedAccessViews( 0, 1, &WVPsUAV, &_InitCounts );
     Context->Dispatch( 1, 1, 1 );
     CS1
     RWStructuredBuffer<float3> Positions : register( u0 );
     RWStructuredBuffer<float3> Velocities : register( u1 );
     RWStructuredBuffer<float4x4> Worlds : register( u2 );
     [ numthreads( 32, 1, 1 ) ]
     void Main( uint3 Index : SV_DispatchThreadID )
     {
         Positions[ Index.x ] += Velocities[ Index.x ];
         Velocities[ Index.x ] = float3( 0, 0, 0 );
         Worlds[ Index.x ] = float4x4
         (
             1, 0, 0, 0,
             0, 1, 0, 0,
             0, 0, 1, 0,
             Positions[ Index.x ].x, Positions[ Index.x ].y, Positions[ Index.x ].z, 1
         );
     }
     CS2
     cbuffer View : register( b0 )
     {
         float4x4 ViewProjection;
     }
     StructuredBuffer<float4x4> Worlds : register( t0 );
     // RWStructuredBuffer<float4x4> Worlds : register( u1 ); <-- alternative, working case
     RWStructuredBuffer<float4x4> WVPs : register( u0 );
     [ numthreads( 128, 1, 1 ) ]
     void Main( uint3 Index : SV_DispatchThreadID )
     {
         WVPs[ Index.x ] = mul( Worlds[ Index.x ], ViewProjection ); // Worlds[ Index.x ] returns zeros!!!
     }
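     A minimal sketch of one likely culprit (hedged, since the thread's final fix isn't spelled out here): in D3D11 a resource cannot be bound as a UAV and an SRV on the same stage at the same time. In the code above, WorldsUAV is still bound on slot u2 when WorldsSRV is set, so the runtime forces the new SRV binding to NULL and CS2 reads zeros; the debug layer reports this kind of conflict. Unbinding the UAV after the first dispatch avoids it:
     ID3D11UnorderedAccessView* NullUAV = nullptr;
     Context->Dispatch( 1, 1, 1 );                                   // CS1 finishes writing Worlds
     Context->CSSetUnorderedAccessViews( 2, 1, &NullUAV, nullptr );  // release Worlds from u2
     Context->CSSetShaderResources( 0, 1, &WorldsSRV );              // t0 now binds instead of being forced to NULL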
  10. MJP, that's exactly what I was looking for. Thanks a lot!!!
  11. Hi, thanks for your responses. I have separate files for each shader, mostly because I can then debug the shaders nicely in the VS Graphics Analyzer. Anyway, my "buffer_srv" is not a constant buffer, it's just a regular buffer - vertex buffer, non-vertex buffer, it doesn't matter - an array of values, bound to one CS as a UAV and to another CS as an SRV.
  12. Hi, I have two questions that I haven't found answers to on the internet. Maybe someone can help.
     1) On MSDN I read about HLSL registers: t - for texture and texture buffer, c - for buffer offset. Normally I used c for "Buffer<>" and t for "Texture2D", but I've found that if I use t for "Buffer<>", for instance, it compiles and works. My question: what are the different register types for, then?
     2) I bound two SRVs, buffer_srv to slot 0 and texture_srv to slot 0. In HLSL I had:
     Buffer<float4> buffer : register(c0);
     Texture2D tex : register(t0);
     Well, it didn't work, so I changed it: I now bind buffer_srv to slot 0 and texture_srv to slot 1, and in HLSL I have:
     Buffer<float4> buffer : register(c0);
     Texture2D tex : register(t1);
     and now it works. I thought different registers had different counters, but now I see it's rather one counter per shader for SRVs (UAVs have a different counter, etc.). Question: can someone explain the input slot-register relation, or refer me to some online materials, please?
     Thanks!
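     A hedged sketch of the slot-register relation (buffer_srv, texture_srv, and Context are the post's names; the binding calls are illustrative, not the thread's confirmed answer). Each shader stage has one array of SRV slots, and slot N set from the API maps to register tN regardless of whether the HLSL object is a Buffer<> or a Texture2D, so both live in the t space and need different slots; the c registers MSDN mentions are offsets within constant buffers, not SRV bindings.
     // HLSL:
     //   Buffer<float4> buffer : register(t0);   // SRV slot 0
     //   Texture2D      tex    : register(t1);   // SRV slot 1
     ID3D11ShaderResourceView* SRVs[2] = { buffer_srv, texture_srv };
     Context->CSSetShaderResources( 0, 2, SRVs );  // slots 0..1 -> t0..t1
     The u registers have their own counter (CSSetUnorderedAccessViews), b registers theirs (CSSetConstantBuffers), and s registers theirs (CSSetSamplers), which matches the "one counter per view type" observation above.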
  13. Had the same problem. I broke down the solution here: http://www.gamedev.net/topic/665114-pass-matrix-to-shader-problem/
  14. Ok, I've got it. The problem lies in how operations on float4x4 behave when column-major packing is on. All float4x4 operations then assume that columns (not rows) are stored next to each other in memory. Therefore this:
     #pragma pack_matrix(column_major)
     float4x4 mtx = float4x4(val0, val1, val2, val3);
     is stored in memory like this:
     val0.x val1.x val2.x val3.x  val0.y val1.y val2.y val3.y  val0.z val1.z val2.z val3.z  val0.w val1.w val2.w val3.w
     and this:
     #pragma pack_matrix(row_major)
     float4x4 mtx = float4x4(val0, val1, val2, val3);
     is stored in memory like this:
     val0.x val0.y val0.z val0.w  val1.x val1.y val1.z val1.w  val2.x val2.y val2.z val2.w  val3.x val3.y val3.z val3.w
     Another example of an operation that takes the above rule into account is operator[]:
     // mtx (float4x4)
     // 00 01 02 03
     // 10 11 12 13
     // 20 21 22 23
     // 30 31 32 33
     // memory
     // 00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33
     #pragma pack_matrix(row_major)
     // get the third row in a row-major configuration
     mtx[2]; // row-major is on, so rows are assumed to be contiguous in memory, so it takes {20 21 22 23}
     #pragma pack_matrix(column_major)
     // get the third row in a column-major configuration
     mtx[2]; // column-major is on, so row values are assumed to be every 4th value in memory, so it takes {02 12 22 32}
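     A short CPU-side sketch of the practical consequence (assuming DirectXMath and a constant buffer holding one float4x4; the variable names are illustrative): DirectXMath matrices are row-major in memory, so with HLSL's default column_major packing they are transposed before upload, whereas with pack_matrix(row_major), or a row_major qualifier on the HLSL declaration, they can be copied as-is.
     #include <DirectXMath.h>
     using namespace DirectX;

     XMMATRIX wvp = world * view * projection;              // row-major in CPU memory
     XMFLOAT4X4 upload;
     XMStoreFloat4x4( &upload, XMMatrixTranspose( wvp ) );  // transpose so column_major HLSL reads it correctly
     // memcpy( mappedConstantBuffer, &upload, sizeof( upload ) );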
  15. Can anyone explain the behavior below? Having:
     D3D11_INPUT_ELEMENT_DESC _desc[] =
     {
         {"VERT_COORD", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0},
         {"WVP", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_INSTANCE_DATA, 1},
         {"WVP", 1, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_INSTANCE_DATA, 1},
         {"WVP", 2, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_INSTANCE_DATA, 1},
         {"WVP", 3, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_INSTANCE_DATA, 1},
     };
     when I do:
     // (vertex shader)
     float4 main(float4 _vert : VERT_COORD, float4x4 _wvp : WVP) : SV_Position {...}
     the "_wvp" matrix is transposed (I transpose it on the CPU side). But if I do:
     // (vertex shader)
     float4 main(float4 _vert : VERT_COORD, float4 _wvp0 : WVP0, float4 _wvp1 : WVP1, float4 _wvp2 : WVP2, float4 _wvp3 : WVP3) : SV_Position
     {
         float4x4 _wvp = float4x4(_wvp0, _wvp1, _wvp2, _wvp3);
         // or
         float4x4 _wvp = float4x4(
             _wvp0.x, _wvp0.y, _wvp0.z, _wvp0.w,
             _wvp1.x, _wvp1.y, _wvp1.z, _wvp1.w,
             _wvp2.x, _wvp2.y, _wvp2.z, _wvp2.w,
             _wvp3.x, _wvp3.y, _wvp3.z, _wvp3.w
         );
         ...
     }
     "_wvp" is now not transposed.