Community Reputation

148 Neutral

About joystick-hero

  1. Metric used for memory bandwidth

    Thanks for your reply. Yes, it is my own profiler, and according to the values it is returning it seems to be working just fine. If I take 1 GB == 2^30 as you said, then I reach peaks of 143 GB/s or so, which makes total sense. But I could be doing it wrong anyway; here's how I do it: I use the effective bandwidth equation presented here, only that I now divide by 2^30. To measure the time I use the QueryPerformanceCounter function, but only after telling the GPU to finish all the work in its internal queue with a D3D11_QUERY_EVENT. In my shaders each thread reads a unique memory location only once, and writes to a unique memory location, so I think the cache is not an issue?
  2. Hi guys. I've developed a DirectX application and I need to measure the performance achieved in several scenarios. One question that I really need to answer and wasn't able to find anywhere is about the metric used for the GPU's memory bandwidth. For instance, I own a Gigabyte GeForce GTX 660, and the technical specifications I've found say the memory bandwidth is 144.2 GB/s, but my question is: in this case, is 1 GB = 1,000,000,000 bytes or 1 GB = 2^30 bytes? I thought it was the former, but my profiler says that my application reaches maximum speeds of 155.83 GB/s. My profiler could be wrong too. That's why I would like to know if I should change the metric used for the performance calculation, or if somehow my Gigabyte GeForce GTX 660 is better than I hoped, or if my profiler needs to be checked. I really hope it's not the last option hehe. And regarding the GFLOPS metric, I have the same question: 1 GFLOP = 10^9 FLOPs or 2^30? Regards.
  3. I guess.. that's a relief pals haha. Thank you for your answers.
  4. This might be a newbie question, but I was wondering if there's a way or some kind of convention regarding resource allocation alignment in memory. For instance, when I create a new Texture2D via the ID3D11Device::CreateTexture2D method with a DXGI_FORMAT_R32G32B32A32_FLOAT format, how can I tell DirectX that I want the resource to have an X-byte alignment? I'm interested in this because when and if I need to read the Texture2D in a compute shader, in order to do it in the most efficient way possible (when memory coalescing works), the starting address of the resource must be a multiple of the region size I'm reading. According to page 9 of this pdf slide
  5. Thanks MJP. I guess that simple why-didn't-I-think-of-that solution solves all my problems. Gonna try that. Now I'm glad I don't have a picture of me in my profile pic. So I suppose there are no performance problems with my original code? It's just painfully ugly and not scalable at all xD
  6. The textures' contents are the result of previous scene renders from several viewpoints. I don't know if there's an easy way to instruct DirectX to render to some Texture2D area as opposed to an entire one :c
  7. First of all, thanks for your answer and help. I didn't think about the syntactic sugar possibility. The problem is I kinda needed an array of Texture2DArrays because I need to bind 10240 Texture2Ds (64x64 dimension) to this ComputeShader, and that greatly exceeds the 2048-slice limit for Texture2DArrays. The shader runs faster the more data you give it to process, unless you run out of memory, but that's not the case so far xD. And this was the only idea I had to sort out this problem, well, this and TextureCubeArrays maybe? I don't know how those work though, and I had all my Texture2DArray C++ code in place. I think the right syntax is tmp = gTextureArray.Load(int4(x, y, arrayIndex, mip)).rgb; ? Unless the documentation is really confusing :c Aside from the fact that it doesn't look much pretty, are there any performance reasons not to use some waterfalled code? Because as long as the threads in the warps don't diverge, I couldn't think of any other bad consequences. Again, thanks for your help!
  8. Let's say I have a compute shader that retrieves data from a Texture2DArray using the ID of the group, like this:
[CODE]
Texture2DArray<float4> gTextureArray[2];

[numthreads(32, 1, 1)]
void Kernel(uint3 GroupID : SV_GroupID, uint3 GroupThreadID : SV_GroupThreadID)
{
    float3 tmp = gTextureArray[GroupID.x].Load(int4(GroupThreadID.x, GroupThreadID.x, 0, 0)).rgb;
    ....
}
[/CODE]
And let's say I launch it like this:
[CODE]
deviceContext->Dispatch(2, 1, 1);
[/CODE]
So, 2 groups, 32 threads each, that read pixel values from a Texture2DArray. All the threads in the group with GroupID.x = 0 will read values from gTextureArray[0], and all the threads in the group with GroupID.x = 1 will read values from gTextureArray[1]. It turns out I can't compile that simple code; instead I get this compile error (cs_5_0):
[CODE]
error X3512: sampler array index must be a literal expression
[/CODE]
Now, I know I can do this instead:
[CODE]
Texture2DArray<float4> gTextureArray[2];

[numthreads(32, 1, 1)]
void Kernel(uint3 GroupID : SV_GroupID, uint3 GroupThreadID : SV_GroupThreadID)
{
    float3 tmp = float3(0, 0, 0);
    if (GroupID.x == 0)
        tmp = gTextureArray[0].Load(int4(GroupThreadID.x, GroupThreadID.x, 0, 0)).rgb;
    else if (GroupID.x == 1)
        tmp = gTextureArray[1].Load(int4(GroupThreadID.x, GroupThreadID.x, 0, 0)).rgb;
    ....
}
[/CODE]
Or use a switch in case I have lots of groups so it doesn't look that awful (it still does). Notice how there is no warp divergence, since all threads in each group take one branch or the other. My question is: am I missing something here? Why does HLSL not support that kind of indexing when I can't see any divergence or other problems, at least in this case?
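    One workaround, when the slice count fits under the 2048 limit, is to fold both textures into slices of a single Texture2DArray: Load takes the array slice as a runtime value in the z component of its coordinate, so no literal index is needed. A sketch, assuming the two textures were copied into slices 0 and 1:

```hlsl
Texture2DArray<float4> gTextures; // slices 0 and 1 hold the two textures

[numthreads(32, 1, 1)]
void Kernel(uint3 GroupID : SV_GroupID, uint3 GroupThreadID : SV_GroupThreadID)
{
    // int4(x, y, array slice, mip) -- the slice is a runtime value here.
    float3 tmp = gTextures.Load(
        int4(GroupThreadID.x, GroupThreadID.x, GroupID.x, 0)).rgb;
    ....
}
```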
  9. Hi guys. I have a quick HLSL question. I have the following code:
[CODE]
cbuffer cbLights
{
    Light gLight[MAX_NUMBER_LIGHTS];
    matrix gLightWVP[MAX_NUMBER_LIGHTS];
}
[/CODE]
And I want to be able to update gLight[1] individually from my C++ D3D11 application. How can I do that? I know how to do it if gLight were a single variable and not an array, but this is new to me. BTW: I'm using the effects framework provided by Microsoft in D3dx11effect.h. Regards!
  10. Passing a const COM interface as parameter

    Hey thanks so much for your answer. Yes I see what you mean, you're right. I'll take the first approach too lol. Maybe using friend classes instead of getters. Regards
  11. Hi guys. Yes, it's a really newbie question hehe. But I'd like to know if there's some way to pass (for example) an ID3D10ShaderResourceView* variable as a parameter to another function and make sure that particular function is not gonna change it in any way. I have a class which has an ID3D10ShaderResourceView* variable holding depth values to perform shadow calculations, a shadow map so to speak, and another class that does the actual rendering, so I'd need to pass this shadow map to the rendering class; but in doing so I guess I'd be violating the encapsulation principle, since the rendering class could do whatever it likes with my shadow map. Do I have design issues, could I do this in another way, is there some way to fix it, or am I getting worried over too little? Regards