
gamer9xxx

Member
  • Content count

    6

Community Reputation

0 Neutral

About gamer9xxx

  • Rank
    Newbie

Personal Information

  • Interests
    Programming
  1. Hi guys, I'm writing a simple compute shader in DirectX 11 (shader model 5) that stores a float4 color per thread into groupshared memory and then reads it back. From my understanding of MSDN, the store_structured instruction (and likewise ld_structured) can write 4 x 32-bit components at once, so I would expect one float4 write to translate into one store_structured instruction. However, in my simple shader it translates into 4 store_structured instructions!

     Code:

         groupshared float4 ColorQuad[4][4][64];
         ...
         ColorQuad[x][y][shared_index] = float4(0.1f, 0.2f, 0.3f, 0.4f);
         ...
         float4 color = ColorQuad[x][y][shared_index];

     This code is compiled into this:

         dcl_tgsm_structured g0, 4096, 4
         ...
         mov r1.x, r0.w
         imul null, r1.y, r0.x, l(16)
         imad r1.y, r0.z, l(1024), r1.y
         store_structured g0.x, r1.x, r1.y, l(0.100000)  // ColorQuad<0>
         iadd r1.z, r1.y, l(4)
         store_structured g0.x, r1.x, r1.z, l(0.200000)  // ColorQuad<0>
         iadd r1.z, r1.y, l(8)
         store_structured g0.x, r1.x, r1.z, l(0.300000)  // ColorQuad<0>
         iadd r1.y, r1.y, l(12)
         store_structured g0.x, r1.x, r1.y, l(0.400000)  // ColorQuad<0>
         ...
         imad r1.y, r0.z, l(1024), r1.y
         ld_structured r2.x, r1.x, r1.y, g0.xxxx  // ColorQuad<0:Inf>
         iadd r1.z, r1.y, l(4)
         ld_structured r2.y, r1.x, r1.z, g0.xxxx  // ColorQuad<1:Inf>
         iadd r1.z, r1.y, l(8)
         ld_structured r2.z, r1.x, r1.z, g0.xxxx  // ColorQuad<2:Inf>
         iadd r1.y, r1.y, l(12)
         ld_structured r2.w, r1.x, r1.y, g0.xxxx  // ColorQuad<3:Inf>

     Now I'm very confused about why this is happening. Am I just misunderstanding it, and MSDN actually says it can write only 1 x 32 bits (or 4 x 8 bits) of data? Or is this a compiler optimization to avoid bank conflicts? Thanks for any explanation!
  2. It seems you are right. My cbuffer looks like what you wrote:

         cbuffer GILights : register(b2)
         {
             float2x4 GIColorViewPosition[64];
         };

     But when I change it to float4x2, the problem is that when I try to read it like this:

         float4 color = GIColorViewPosition[i][0];

     the compiler complains that it cannot convert float2 to float4. Perhaps it's related to the fact that I compile the shader with D3DCOMPILE_PACK_MATRIX_ROW_MAJOR. Is it really the case that this flag packs not just the matrix type but all the float#x# types, and all the related int, bool, etc. versions of them?

     When I store the lights in an RWStructuredBuffer<float4x2> and then read them from an RWStructuredBuffer<float2x4>, I get exactly the same broken image, so it must be the problem you just described.
  3. Hi guys, is it possible to copy from an RWStructuredBuffer<float2x4> to a cbuffer of the same size using the CopyResource function? According to MSDN, if the size, format, etc. are the same, it should work. There is a note, "You can't use an Immutable resource as a destination." I guess by immutable they mean D3D11_USAGE_IMMUTABLE, so I used D3D11_USAGE_DEFAULT instead.

     The RWStructuredBuffer<float2x4> is created like this:

         D3D11_BUFFER_DESC desc = {};  // zero-initialize the fields not set below (e.g. CPUAccessFlags)
         desc.ByteWidth = 2048;  // 64 lights * size of float2x4
         desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS;
         desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
         desc.StructureByteStride = 32;  // size of float2x4
         desc.Usage = D3D11_USAGE_DEFAULT;
         hr = m_p_device->CreateBuffer(&desc, 0, &sourceBuffer);

         D3D11_UNORDERED_ACCESS_VIEW_DESC uavd = {};  // Buffer.FirstElement and Buffer.Flags stay 0
         uavd.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
         uavd.Format = DXGI_FORMAT_UNKNOWN;
         uavd.Buffer.NumElements = 64;
         hr = m_p_device->CreateUnorderedAccessView(sourceBuffer, &uavd, &sourceBufferView);

         // generate 64 lights and store them in sourceBuffer

     Then the cbuffer is created like this:

         D3D11_BUFFER_DESC desc = {};  // MiscFlags, StructureByteStride, CPUAccessFlags = 0
         desc.ByteWidth = 2048;  // 64 lights * size of float2x4
         desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
         desc.Usage = D3D11_USAGE_DEFAULT;
         hr = m_p_device->CreateBuffer(&desc, 0, &destinationBuffer);

     Then the copy is done via the deferred context:

         m_p_deferred_context->CopyResource(destinationBuffer, sourceBuffer);
         // call the final lighting shader

     In my lighting shader I have 64 lights, each with a float4 color and a float4 position in view space, hence float2x4. The colors and positions of the lights are generated on the fly in another shader, so I store them in an RWStructuredBuffer<float2x4>. In the final lighting shader I have to read all 64 lights per pixel, so I could just read the data again from the RWStructuredBuffer<float2x4>. However, since I'm doing tons of other texture reads, I think this completely thrashes the texture cache, because I get a huge fps drop.

     So I tried moving the RWStructuredBuffer<float2x4> data into a cbuffer, and I got almost double the performance. The problem is that the data layout of these buffers appears to be different. For debugging, I divided the screen into 8x8 = 64 squares, and every square displays the color of one light from the RWStructuredBuffer<float2x4>. If I read it as an RWStructuredBuffer<float2x4>, everything is correct: a few red, green and white lights. However, if I read it from the copied cbuffer, the color channels are somehow messed up. Obviously some data was copied, and even the pattern was preserved.

     Any idea what could have happened, and how to do it correctly? I could just use Map/Unmap, but since it's a deferred context, that's a bit tricky. Moreover, I'd like to avoid any CPU communication and an extra staging buffer, so I'd like to just use CopyResource. Thanks.
  4. Hi, I'd like to implement bloom with a different amount of blurring per object. All the bloom tutorials I found are global: they render all the blooming objects into one RT, apply the same amount of blur to all of them and add the result to the original image. Let's say I have a laser and a pistol, and the laser emits much more light than the pistol when shooting, so I want the laser to have a much stronger bloom effect. If I have lots of these different glowing objects, what is a good way to solve this problem? Since the bloom effect can be quite large, I'd like to do the blurring iteratively via Kawase bloom rather than in a single pass or in separate vertical/horizontal passes. I was thinking about something like this:

        1. Render the objects and write the bloom strength into a stencil buffer (let's say the strength is an integer from 1 to 10).
        2. Blur each pixel with the Kawase 4-sample kernel, but use only pixels whose stencil value is greater than 1.
        3. Write the blurred value, find the maximum stencil strength among the pixels used in the kernel, decrease it by 1 and write it to the stencil.
        4. Repeat from step 2 until some maximum number of passes (let's say 10).

     But I still see some potential issues, and I'm not sure this would work. Does anybody have an idea how to do a different amount of glow per object? I don't care about correctness; I'm more interested in performance.
  5. Pathfinding Eikonal vs Grass Fire Algorithm

    Hi, thanks for the explanation. Yes, it was just confusion about what the eikonal equation is: they referenced the whitepaper that describes how to solve it with the Fast Iterative Method, but apparently they didn't use it at all. What they describe in the part you quoted is the simple "brushfire" algorithm.
  6. Hi, I was reading in Game AI Pro how they implemented pathfinding in Supreme Commander, and I came to a question. To integrate the cost field, they use the eikonal equation for traversing the areas, and they recommend the Fast Iterative Method and its parallel version for solving it. However, all the basic flow-field navigation tutorials I found integrate the cost field with a simple grassfire/brushfire algorithm. My question is: what would be the difference if the game just used the grassfire algorithm? I guess the reason was some quality/performance trade-off. Fortunately, the Fast Iterative Method is very easy to implement, so I can compare the performance, but what I don't understand is what the "quality" benefit is.
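Regarding the groupshared question in post 1: the SM5 assembly reference documents the declaration as dcl_tgsm_structured g#, structure byte stride, structure count. Read that way, `dcl_tgsm_structured g0, 4096, 4` turns ColorQuad into 4 structures of 4096 bytes, one per value of the first index. The sketch below reproduces the address arithmetic visible in the listing (a structure index, plus 1024 bytes per second index, 16 bytes per third index, and 4 bytes per component) so the four scalar offsets +0, +4, +8 and +12 can be checked. Two points are assumptions inferred from the multipliers in the listing: that r0.w holds x, r0.z holds y and r0.x holds shared_index. The names `TgsmAddress` and `colorQuadAddress` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Dimensions of: groupshared float4 ColorQuad[4][4][64];
constexpr uint32_t kDimX = 4, kDimY = 4, kDimI = 64;
constexpr uint32_t kFloat4Bytes = 16;

// Assumed reading of "dcl_tgsm_structured g0, 4096, 4": 4 structures, each
// 4096 bytes = one [y][shared_index] slice of float4 values.
constexpr uint32_t kStructStride = kDimY * kDimI * kFloat4Bytes;  // 4096
constexpr uint32_t kStructCount = kDimX;                          // 4

static_assert(kStructStride == 4096, "matches the declared stride");
static_assert(kStructStride * kStructCount == 16384, "total groupshared bytes");

// The two address operands of store_structured / ld_structured.
struct TgsmAddress {
    uint32_t structIndex;  // corresponds to r1.x in the listing
    uint32_t byteOffset;   // corresponds to r1.y / r1.z in the listing
};

// Address of component c (0 = .x ... 3 = .w) of ColorQuad[x][y][i],
// mirroring the imul (x16) / imad (x1024) / iadd (+4, +8, +12) sequence.
TgsmAddress colorQuadAddress(uint32_t x, uint32_t y, uint32_t i, uint32_t c) {
    assert(x < kDimX && y < kDimY && i < kDimI && c < 4);
    uint32_t base = i * kFloat4Bytes + y * (kDimI * kFloat4Bytes);
    return {x, base + c * 4};
}
```

In this model, all four components of one float4 land in the same structure at consecutive byte offsets. The data is not scattered, so the layout alone does not force four separate writes. Why the compiler emits scalar stores is still the open question from the post.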
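The float2x4/float4x2 symptom in post 2 can be reproduced on the CPU. If the 32 bytes of one light are written with one matrix packing order and read back with the other, every element is fetched from its transposed position. This sketch makes three assumptions: a light is a 2x4 matrix whose row 0 is the color and row 1 the view-space position, as the posts describe; row-major packing stores row 0 first; and column-major packing stores column 0 first. The helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One light as 2 rows x 4 columns: row 0 = color, row 1 = view-space position.
constexpr std::size_t kRows = 2, kCols = 4;
using Light = std::array<float, kRows * kCols>;  // the raw 32 bytes as stored

// Flat float index of element [r][c] under each packing order.
constexpr std::size_t rowMajorIndex(std::size_t r, std::size_t c) { return r * kCols + c; }
constexpr std::size_t colMajorIndex(std::size_t r, std::size_t c) { return c * kRows + r; }

// Writer side: pack color and position with row-major layout.
Light packRowMajor(const std::array<float, 4>& color, const std::array<float, 4>& pos) {
    Light raw{};
    for (std::size_t c = 0; c < kCols; ++c) {
        raw[rowMajorIndex(0, c)] = color[c];
        raw[rowMajorIndex(1, c)] = pos[c];
    }
    return raw;
}

// Reader side: fetch row r of the same bytes, assuming column-major layout.
std::array<float, 4> readRowColMajor(const Light& raw, std::size_t r) {
    assert(r < kRows);
    std::array<float, 4> row{};
    for (std::size_t c = 0; c < kCols; ++c) row[c] = raw[colMajorIndex(r, c)];
    return row;
}
```

With color (1, 2, 3, 4) and position (5, 6, 7, 8), the "color" read back is (1, 3, 5, 7). That mixes color and position components while keeping the per-light pattern, which matches the kind of channel scrambling described in post 3.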
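For the CopyResource question in post 3, it helps to compare where light i, element [r][c] lives in each buffer. The sketch assumes the following, as I understand HLSL packing: a structured buffer stores each float2x4 in 32 tightly packed bytes, while cbuffer packing starts every matrix row (row_major) or column (column_major) in a new 16-byte register, and every array element on a register boundary. These rules are encoded as assumptions and should be checked against the HLSL packing-rules documentation or shader reflection. `Packing`, `structuredOffset` and `cbufferOffset` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

enum class Packing { RowMajor, ColumnMajor };

constexpr std::size_t kRows = 2, kCols = 4, kFloat = 4, kRegister = 16;

// Byte offset in a tightly packed StructuredBuffer<float2x4> (stride 32 bytes).
std::size_t structuredOffset(std::size_t light, std::size_t r, std::size_t c, Packing p) {
    assert(r < kRows && c < kCols);
    std::size_t inElement = (p == Packing::RowMajor) ? (r * kCols + c) * kFloat
                                                     : (c * kRows + r) * kFloat;
    return light * kRows * kCols * kFloat + inElement;
}

// Byte offset in a cbuffer array "float2x4 a[64]": each row (row_major) or
// column (column_major) starts a new 16-byte register, and each array element
// starts on a register boundary (assumed packing rules).
std::size_t cbufferOffset(std::size_t light, std::size_t r, std::size_t c, Packing p) {
    assert(r < kRows && c < kCols);
    std::size_t regsPerElement = (p == Packing::RowMajor) ? kRows : kCols;
    std::size_t inElement = (p == Packing::RowMajor) ? r * kRegister + c * kFloat
                                                     : c * kRegister + r * kFloat;
    return light * regsPerElement * kRegister + inElement;
}
```

Under these assumptions, the two layouts coincide only when both sides use row-major packing, and then 64 lights fill exactly 2048 bytes. With column_major, the cbuffer would need 64 bytes per light, so a byte-for-byte 2048-byte copy could not line up with it.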
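The four bloom steps in post 4 can be prototyped on the CPU before writing any shaders. This is a minimal sketch of one interpretation, not an established technique:

- A single-channel image plus an integer strength per pixel stands in for the RT and the stencil.
- Each Kawase pass k uses 4 diagonal taps at k + 0.5 texels, emulating the bilinear fetch as an explicit 2x2 average.
- A pixel is blurred only when its kernel footprint contains a texel with strength greater than 1, and it then takes the maximum strength minus one.
- All other pixels are copied, and their own strength decays by one. This decay rule is my own addition.

`Layer`, `kawasePass` and `perObjectBloom` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Single-channel stand-in for the bloom RT, plus a per-pixel "strength"
// playing the role of the stencil value written in step 1.
struct Layer {
    int w, h;
    std::vector<float> color;
    std::vector<int> strength;
    Layer(int width, int height)
        : w(width), h(height), color(width * height, 0.0f), strength(width * height, 0) {
        assert(width > 0 && height > 0);
    }
    // Clamp-to-edge addressing, like a sampler in CLAMP mode.
    int idx(int x, int y) const {
        int cx = std::min(std::max(x, 0), w - 1);
        int cy = std::min(std::max(y, 0), h - 1);
        return cy * w + cx;
    }
};

// One Kawase pass for iteration k = 0, 1, 2, ...
Layer kawasePass(const Layer& src, int k) {
    Layer dst(src.w, src.h);
    for (int y = 0; y < src.h; ++y) {
        for (int x = 0; x < src.w; ++x) {
            float sum = 0.0f;
            int maxStrength = 0;
            for (int sy = -1; sy <= 1; sy += 2) {
                for (int sx = -1; sx <= 1; sx += 2) {
                    // First texel of the 2x2 block a bilinear tap at k + 0.5 covers.
                    int x0 = sx > 0 ? x + k : x - k - 1;
                    int y0 = sy > 0 ? y + k : y - k - 1;
                    for (int dy = 0; dy < 2; ++dy) {
                        for (int dx = 0; dx < 2; ++dx) {
                            int i = src.idx(x0 + dx, y0 + dy);
                            sum += src.color[i];
                            maxStrength = std::max(maxStrength, src.strength[i]);
                        }
                    }
                }
            }
            int self = src.idx(x, y);
            if (maxStrength > 1) {
                // Within reach of a still-active glow: blur with 16 equal weights
                // and inherit the strongest strength minus one (step 3).
                dst.color[self] = sum / 16.0f;
                dst.strength[self] = maxStrength - 1;
            } else {
                // Out of reach: keep the pixel and let its own strength decay.
                dst.color[self] = src.color[self];
                dst.strength[self] = std::max(src.strength[self] - 1, 0);
            }
        }
    }
    return dst;
}

// Steps 2-4: repeat until no pixel is still active or maxPasses is reached.
Layer perObjectBloom(Layer layer, int maxPasses) {
    for (int k = 0; k < maxPasses; ++k) {
        bool active = std::any_of(layer.strength.begin(), layer.strength.end(),
                                  [](int s) { return s > 1; });
        if (!active) break;
        layer = kawasePass(layer, k);
    }
    return layer;
}
```

Under these rules, a glow of strength s keeps spreading for s - 1 passes. The laser (strength 10) therefore spreads much farther than the pistol (strength 2), and the pistol's glow stays frozen once it runs out of strength. On the GPU the strength of neighbouring texels would have to be sampled as a texture, because the stencil test only looks at the pixel being written.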
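For the grassfire vs. eikonal question in posts 5 and 6, the "quality" difference shows up even on an obstacle-free grid with uniform cost. A 4-neighbour grassfire integration produces Manhattan distances, while an eikonal solver approximates Euclidean distances, so flow directions derived from the eikonal field are typically less biased toward the grid axes. The sketch below is a minimal serial version of both. It assumes a 4-connected grid with unit cell size and a first-order upwind (Godunov) local update for the eikonal solver. It is neither the game's implementation nor the parallel FIM, and `CostGrid`, `grassfire`, `solveEikonal` and `fastIterative` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <deque>
#include <limits>
#include <utility>
#include <vector>

constexpr double kInf = std::numeric_limits<double>::infinity();
const int kDx[4] = {1, -1, 0, 0};
const int kDy[4] = {0, 0, 1, -1};

// Row-major grid of per-cell traversal costs; kInf marks an impassable cell.
struct CostGrid {
    int w, h;
    std::vector<double> cost;
    bool inside(int x, int y) const { return x >= 0 && y >= 0 && x < w && y < h; }
    int idx(int x, int y) const { return y * w + x; }
};

// Grassfire/brushfire: a 4-neighbour wavefront relaxing T[n] = T[i] + cost[n].
std::vector<double> grassfire(const CostGrid& g, int gx, int gy) {
    std::vector<double> T(g.cost.size(), kInf);
    std::deque<int> open{g.idx(gx, gy)};
    T[g.idx(gx, gy)] = 0.0;
    while (!open.empty()) {
        int i = open.front();
        open.pop_front();
        for (int d = 0; d < 4; ++d) {
            int nx = i % g.w + kDx[d], ny = i / g.w + kDy[d];
            if (!g.inside(nx, ny)) continue;
            int n = g.idx(nx, ny);
            double t = T[i] + g.cost[n];
            if (t < T[n]) { T[n] = t; open.push_back(n); }
        }
    }
    return T;
}

// First-order upwind solution of |grad T| = cost at cell (x, y), h = 1.
double solveEikonal(const CostGrid& g, const std::vector<double>& T, int x, int y) {
    auto at = [&](int ax, int ay) { return g.inside(ax, ay) ? T[g.idx(ax, ay)] : kInf; };
    double c = g.cost[g.idx(x, y)];
    double a = std::min(at(x - 1, y), at(x + 1, y));
    double b = std::min(at(x, y - 1), at(x, y + 1));
    if (a > b) std::swap(a, b);
    if (c == kInf || a == kInf) return kInf;
    if (b - a >= c) return a + c;  // one-sided update
    return 0.5 * (a + b + std::sqrt(2.0 * c * c - (b - a) * (b - a)));  // two-sided
}

// Serial Fast Iterative Method: update active cells until they converge,
// then activate neighbours whose value would improve.
std::vector<double> fastIterative(const CostGrid& g, int gx, int gy, double eps = 1e-9) {
    std::vector<double> T(g.cost.size(), kInf);
    std::vector<char> active(g.cost.size(), 0);
    std::vector<int> list;
    T[g.idx(gx, gy)] = 0.0;
    for (int d = 0; d < 4; ++d) {
        int nx = gx + kDx[d], ny = gy + kDy[d];
        if (g.inside(nx, ny) && g.cost[g.idx(nx, ny)] < kInf) {
            active[g.idx(nx, ny)] = 1;
            list.push_back(g.idx(nx, ny));
        }
    }
    while (!list.empty()) {
        std::vector<int> next;
        for (int i : list) {
            int x = i % g.w, y = i / g.w;
            double p = T[i], q = solveEikonal(g, T, x, y);
            T[i] = q;
            if (std::abs(p - q) > eps) { next.push_back(i); continue; }  // not converged
            active[i] = 0;
            for (int d = 0; d < 4; ++d) {
                int nx = x + kDx[d], ny = y + kDy[d];
                if (!g.inside(nx, ny) || active[g.idx(nx, ny)]) continue;
                int n = g.idx(nx, ny);
                double qn = solveEikonal(g, T, nx, ny);
                if (qn < T[n]) { T[n] = qn; active[n] = 1; next.push_back(n); }
            }
        }
        list.swap(next);
    }
    return T;
}
```

On a 21x21 grid of unit costs with the goal in a corner, grassfire gives exactly 20 at the cell diagonally 10 steps away. The eikonal field gives a value between the Euclidean 14.14 and 20, and along the grid axes both methods agree. That diagonal error is the quality difference the flow field inherits.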