fries

Member
  • Content Count

    43
  • Joined

  • Last visited

Community Reputation

389 Neutral

About fries

  • Rank
    Member

Personal Information

  • Role
    Programmer
  • Interests
    Programming

Social

  • Twitter
    @LukeMamacos
  1. fries

    SSAO black dots

    Is it just me or do I also see the artifact in the bi-normal and tangent images?
  2. The work group size will be defined in the compute shader itself, something like this:

        layout (local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

    Then the dispatch call from C/C++ has this prototype:

        void glDispatchCompute(GLuint num_groups_x, GLuint num_groups_y, GLuint num_groups_z);

    So when you call the dispatch function, it will make 'num_groups_x * num_groups_y * num_groups_z' groups, and the size of the group (number of threads) is 'local_size_x * local_size_y * local_size_z'.
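As a rough illustration of the arithmetic in that reply, here is a minimal Python sketch (not OpenGL code) of how the group counts passed to glDispatchCompute might be derived for a 2D grid. The helper names `dispatch_group_counts` and `total_invocations` are made up for this example, and rounding the group count up is an assumption about how one would typically cover a grid whose size is not a multiple of the local size.

```python
def dispatch_group_counts(width, height, local_size=(8, 8, 1)):
    """Return (num_groups_x, num_groups_y, num_groups_z) covering a width x height grid.

    Hypothetical helper: rounds up so that partial tiles still get a work group.
    """
    local_x, local_y, _local_z = local_size
    groups_x = (width + local_x - 1) // local_x   # ceiling division
    groups_y = (height + local_y - 1) // local_y
    return groups_x, groups_y, 1


def total_invocations(num_groups, local_size):
    """Total threads launched: product of the group counts times the group size."""
    groups_x, groups_y, groups_z = num_groups
    local_x, local_y, local_z = local_size
    return (groups_x * groups_y * groups_z) * (local_x * local_y * local_z)
```

With a local size of 8x8x1, a 1920x1080 grid needs 240x135x1 groups. When the grid is not an exact multiple of the local size, the surplus threads in the edge groups usually need a bounds check in the shader.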
  3. Ah! I found it...

        InterlockedMax(IterationDataWrite[0], gNumAppendNodeViews[0]);

    IterationDataWrite[0] is never reset to 0, so it just keeps maxing and maxing >.> <.<
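To see why a missing reset produces the mismatch reported in this thread, here is a small Python sketch in which `max()` stands in for InterlockedMax. The function name `stored_counts` is hypothetical, and apart from 252 and 228 (the figures quoted in the thread) the per-iteration counts are invented for illustration.

```python
def stored_counts(appended_per_iteration, reset_each_iteration):
    """Simulate the value held in IterationDataWrite[0] after each iteration.

    appended_per_iteration: how many node-views each iteration really appended.
    reset_each_iteration:   whether the stored value is cleared to 0 first.
    """
    stored = 0
    history = []
    for appended in appended_per_iteration:
        if reset_each_iteration:
            stored = 0                     # the step the original code was missing
        stored = max(stored, appended)     # stands in for InterlockedMax
        history.append(stored)
    return history
```

Without the reset, an iteration that appends fewer items than an earlier one still reports the earlier, larger count. The same would presumably apply to IndirectDispatchArgs[0], which is also updated with InterlockedMax, unless it is cleared elsewhere.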
  4. It seems that after about 16 iterations, the value that it writes into IterationDataWrite[0] is 252 but the OutNodeViewBuffer count is only 228... These values should be the same...
  5. Hi, I'm having problems with a compute shader. I am trying to iteratively traverse a tree structure with multiple Dispatch calls. Each thread reads in a node from an input StructuredBuffer and, using two "counting" RWStructuredBuffers as outputs, either pushes the children of the current node into the output node buffer for the next iteration, or pushes into a triangle output buffer to draw a triangle.

    Looking in RenderDoc, everything works fine for the first few iterations, but then some odd stuff starts happening: the triangle buffer starts to get smaller, even though the only operation is to add triangles to it, and I don't reset the count between iterations. Then after a few more iterations RenderDoc can't even keep up with what's going on: it starts reporting random counts for the triangle buffer. Every time you view the pipeline state for a particular dispatch, the buffer counts are a new random number.

    Does anyone have some ideas about what I'm doing wrong?

        struct sGPUNodeView
        {
            uint mNodeIndex;
            uint mViewIndex;
            uint mObjectInstanceIndex;
            uint mPadding;
        };

        struct sRingData
        {
            uint mReadIndex;
            uint mWriteIndex;
            uint mPadding[2];
        };

        struct sGPUPackedMeshBVHNode
        {
            float4 mSphere;
            uint4 mData;

            bool HasObjectInstance() { return mData.x & 1; }
            bool HasTriangle() { return mData.x & 2; }
            uint3 GetTriangleIndices() { return mData.yzw; }
            uint2 GetChildIndices() { return mData.yz; }
            uint GetObjectInstance() { return mData.w; }
            float4 GetSphere() { return mSphere; }
        };

        struct sPointSplatData
        {
            float3 mPoisition;
            uint mViewIndex;
            uint mObjectInstanceIndex;
            uint mPadding[3];
        };

        struct sTriangleSplatData
        {
            uint3 mIndices;
            uint mViewIndex;
            uint mObjectInstanceIndex;
            uint mPadding[3];
        };

        StructuredBuffer<sGPUNodeView> NodeViewBuffer; // Input Node-View Buffer
        globallycoherent RWStructuredBuffer<sGPUNodeView> OutNodeViewBuffer; // Output Node-View Buffer
        StructuredBuffer<sGPUPackedMeshBVHNode> BVHNodeBuffer; // BVH Node Buffer
        StructuredBuffer<float4x4> ViewBuffer;
        globallycoherent RWStructuredBuffer<sPointSplatData> PointSplatBuffer; // Output points, not really used at the moment
        globallycoherent RWStructuredBuffer<sTriangleSplatData> TriangleSplatBuffer; // Output Triangles
        globallycoherent RWBuffer<uint> IndirectDispatchArgs; // Indirect args for the next dispatch
        Buffer<uint> IterationDataRead; // Stores the number of nodes to process for the current iteration
        RWBuffer<uint> IterationDataWrite; // Stores the number of nodes to process for the next iteration

        static const uint gTotalNumThreads = 64;
        groupshared uint gNumAppendNodeViews[gTotalNumThreads]; // Used in parallel reduction to figure out how many items were added to OutNodeViewBuffer

        // Append item to OutNodeViewBuffer
        uint AppendNodeView(uint NodeIndex, uint ViewIndex, uint ObjectInstanceIndex)
        {
            uint index = 0;
            if (NodeIndex != 0xffffffff)
            {
                index = OutNodeViewBuffer.IncrementCounter();
                sGPUNodeView nodeView = (sGPUNodeView)0;
                nodeView.mNodeIndex = NodeIndex;
                nodeView.mViewIndex = ViewIndex;
                nodeView.mObjectInstanceIndex = ObjectInstanceIndex;
                OutNodeViewBuffer[index] = nodeView;
                // Needs to output the number, not the index.
                index++;
            }
            return index;
        }

        // No culling yet - should do frustum culling here
        bool ShouldCullNodeView(sGPUNodeView nodeView, sGPUPackedMeshBVHNode node, float4x4 view)
        {
            return false;
        }

        // This appends the node to either the point splat output buffer or the triangle output buffer
        // Point splats are not used yet
        void DrawNodeView(sGPUNodeView nodeView, sGPUPackedMeshBVHNode node, float4x4 view)
        {
            if (node.HasTriangle())
            {
                uint index = TriangleSplatBuffer.IncrementCounter();
                sTriangleSplatData splat = (sTriangleSplatData)0;
                splat.mIndices = node.GetTriangleIndices();
                splat.mViewIndex = nodeView.mViewIndex;
                splat.mObjectInstanceIndex = nodeView.mObjectInstanceIndex;
                TriangleSplatBuffer[index] = splat;
            }
            else
            {
                uint index = PointSplatBuffer.IncrementCounter();
                sPointSplatData splat = (sPointSplatData)0;
                splat.mPoisition = node.GetSphere();
                splat.mViewIndex = nodeView.mViewIndex;
                splat.mObjectInstanceIndex = nodeView.mObjectInstanceIndex;
                PointSplatBuffer[index] = splat;
            }
        }

        // Approximate the node's screen space size, not used yet, return large number so
        // no point splats occur.
        float ApproximateProjectedSize(float4 Sphere, float4x4 View)
        {
            return 5;
        }

        // Should this Node-View be drawn or further processed?
        bool ShouldDrawNodeView(sGPUNodeView nodeView, sGPUPackedMeshBVHNode node, float4x4 view)
        {
            // Nodes with triangles must be drawn
            if (node.HasTriangle())
            {
                return true;
            }
            else
            {
                // If the node is too small, draw it and stop processing its children
                float approximateProjectedSize = ApproximateProjectedSize(node.GetSphere(), view);
                return approximateProjectedSize <= 1.0f;
            }
        }

        [numthreads(gTotalNumThreads, 1, 1)]
        void ProcessNodeViews_main(uint3 dispatchThreadID : SV_DispatchThreadID, uint GroupIndex : SV_GroupIndex, uint GroupID : SV_GroupID)
        {
            gNumAppendNodeViews[GroupIndex] = 0;

            // Don't process more nodes than what is in the node-view buffer (IterationDataRead[0])
            if (dispatchThreadID.x < IterationDataRead[0])
            {
                sGPUNodeView nodeView = NodeViewBuffer[dispatchThreadID.x];
                if (nodeView.mViewIndex != 0xffffffff) // this would be an error condition, and shouldn't really fail
                {
                    // Retrieve the node and view
                    sGPUPackedMeshBVHNode node = BVHNodeBuffer[nodeView.mNodeIndex];
                    float4x4 view = ViewBuffer[nodeView.mViewIndex];
                    if (!ShouldCullNodeView(nodeView, node, view))
                    {
                        if (ShouldDrawNodeView(nodeView, node, view))
                        {
                            DrawNodeView(nodeView, node, view);
                        }
                        else
                        {
                            if (node.HasObjectInstance()) // This should never be true for now
                            {
                                nodeView.mObjectInstanceIndex = node.GetObjectInstance();
                            }

                            // Append the child node-views to the OutNodeViewBuffer
                            uint2 children = node.GetChildIndices();
                            uint maxNodes1 = AppendNodeView(children.x, nodeView.mViewIndex, nodeView.mObjectInstanceIndex);
                            uint maxNodes2 = AppendNodeView(children.y, nodeView.mViewIndex, nodeView.mObjectInstanceIndex);

                            // Store the maximum item number that is stored in the OutNodeViewBuffer
                            gNumAppendNodeViews[GroupIndex] = max(maxNodes1, maxNodes2);
                        }
                    }
                }
            }

            // Parallel reduction of maximum item number in OutNodeViewBuffer
            [unroll(gTotalNumThreads)]
            for(uint s = gTotalNumThreads / 2; s > 0; s >>= 1)
            {
                if(GroupIndex < s)
                {
                    gNumAppendNodeViews[GroupIndex] = max(gNumAppendNodeViews[GroupIndex], gNumAppendNodeViews[GroupIndex + s]);
                }
                GroupMemoryBarrierWithGroupSync();
            }

            // Have the first thread write out the dispatch args, and number of nodes for the next iteration
            if(GroupIndex == 0)
            {
                InterlockedMax(IndirectDispatchArgs[0], (gNumAppendNodeViews[0] + 63) >> 6);
                InterlockedMax(IterationDataWrite[0], gNumAppendNodeViews[0]);
            }
        }
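The reduction loop at the end of the shader can be mimicked on the CPU to see what it computes. This is only a Python sketch of the access pattern, with a hypothetical function name `group_max_reduction`. It assumes every thread's initial value is already visible to the others before the first step, which on the GPU would normally call for a GroupMemoryBarrierWithGroupSync() before the loop as well as inside it.

```python
def group_max_reduction(values):
    """Tree-style max reduction over a power-of-two sized 'groupshared' array.

    Each pass halves the stride s; 'thread' i (for i < s) combines slot i with
    slot i + s. Building the new values before writing them back plays the
    role of the barrier between passes.
    """
    count = len(values)
    if count == 0 or count & (count - 1) != 0:
        raise ValueError("group size must be a power of two")

    shared = list(values)
    stride = count // 2
    while stride > 0:
        combined = [max(shared[i], shared[i + stride]) for i in range(stride)]
        shared[:stride] = combined
        stride >>= 1
    return shared[0]  # slot 0 ends up holding the group-wide maximum
```

For a 64-thread group this takes six passes (strides 32, 16, 8, 4, 2, 1), after which only the first thread needs to publish the result.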
  6. Hi, I have a simple traversal algorithm that I want to implement in a compute shader. It looks something like this:

        while (!isEmpty(itemList))
        {
            a = itemList.Pop();
            results = Process(a);
            for each r in results:
                itemList.Push(r);
        }

    My initial thought was to do it with multiple passes/Dispatch calls, each one creating a new list of items and swapping the lists between dispatches. The problem is there is no way to know when you're done without a readback.

    But then I thought I could possibly use a RWStructuredBuffer as a circular buffer to implement a processing queue, and have atomic operations that increment the read/write position in the buffer. This way I could do it with a single large dispatch.

    Has anyone done something like this? Are there any references/tips/ideas you could give me?

    Thanks!
    fries
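The multi-pass variant described in that post can be sketched on the CPU as two lists that swap roles after every pass. This is a Python illustration only: `traverse_in_passes` and `binary_tree_children` are hypothetical names, and the seven-node implicit binary tree is invented test data. On the GPU, each pass would correspond to one Dispatch with one thread per item, and the `while current` check is exactly the part that needs a readback or an indirect dispatch.

```python
def binary_tree_children(node, node_count=7):
    """Hypothetical Process(): children of a node in an implicit binary tree."""
    left, right = 2 * node + 1, 2 * node + 2
    return [left, right] if right < node_count else []


def traverse_in_passes(roots, process):
    """Process items pass by pass, swapping input and output lists each time.

    Returns the items in the order they were processed and the number of passes.
    """
    current = list(roots)
    processed = []
    passes = 0
    while current:                          # on the GPU this test needs the item count
        next_items = []
        for item in current:                # one thread per item in a real dispatch
            processed.append(item)
            next_items.extend(process(item))
        current = next_items                # swap: this pass's output feeds the next
        passes += 1
    return processed, passes
```

The number of passes equals the depth of the traversal, which is why a known upper bound on depth can stand in for a readback.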
  7. fries

    ManyLODs algorithm

    Yeah, I should have phrased it differently, as usually for x levels you have x iterations, which is also a "set number per frame". I mean something like 3 or 4 iterations.
  8. fries

    ManyLODs algorithm

    Yeah. In section 3.2 of the paper they describe the "Incremental Approach". With this they never restart from the root anyway, so my idea would rely on that, doing the refining in subsequent frames. I did think about using a higher-branching tree like an octree for compression reasons, but I think it could make the shader too unbalanced, doing uneven work in different branches, whereas balanced work is what high throughput requires... What do you think?
  9. fries

    ManyLODs algorithm

    I was thinking one solution would be to have a set number of iterations per frame, and the shader will draw everything that is still active on the last iteration. This way it would at least draw something at a lower LOD on one frame and then refine it on the next.
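One possible reading of that idea, sketched in Python: run at most a fixed number of refinement iterations, and on the last one draw whatever is still active instead of descending further. The function name `refine_with_budget`, the `is_leaf` and `children` callbacks, and the returned carry-over list (nodes that could seed the next frame's refinement) are all assumptions made for this illustration, not something taken from the paper.

```python
def refine_with_budget(active, is_leaf, children, max_iterations):
    """Refine an active set for at most max_iterations passes.

    Leaves are always drawn. On the final pass, interior nodes are drawn as-is
    (a coarser LOD) and also returned so a later frame can keep refining them.
    """
    drawn = []
    carry_over = []
    for iteration in range(max_iterations):
        is_last = iteration == max_iterations - 1
        next_active = []
        for node in active:
            if is_leaf(node):
                drawn.append(node)
            elif is_last:
                drawn.append(node)          # out of budget: draw the coarse node
                carry_over.append(node)     # and remember it for the next frame
            else:
                next_active.extend(children(node))
        active = next_active
        if not active:
            break                           # nothing left to refine this frame
    return drawn, carry_over
```

Because the loop count is fixed up front, no readback is needed to decide when to stop; the cost is that some frames show a coarser result.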
  10. fries

    ManyLODs algorithm

    Ah yes, that makes sense. So which do you think would be more efficient: a compute or a geometry shader? It seems pretty simple to implement as a geometry shader, and I think a compute shader would be slightly more complex. One thing I am wondering about, though, is how to know when to stop iterating. The active set will be empty, but do I need to read that back to the CPU to know? That seems like it would introduce a lot of latency...
  11. fries

    ManyLODs algorithm

    I'm not so sure about that. D3D11 was previewed in 2008 and shipped in 2009, and the paper came out in 2011.
  12. Hi, I was having a go at implementing the ManyLODs algorithm from this paper: http://perso.telecom-paristech.fr/~boubek/papers/ManyLoDs/ManyLoDs.pdf and was wondering if anyone knows why they chose to use the geometry stream output instead of doing it all in a compute shader? Thanks
  13. Apart from some detail about why the weights need to sum to 4pi, you've said the same thing that I have.
  14. Hey,

    The reason you need to normalize is that the sum of the weights (fWtSum) will probably not be exactly 4pi, like it should be (possibly due to inaccuracies in the equation used to calculate the weight). Multiplying the result by 4pi/fWtSum makes sure that your SH does not have more or less energy than it should.

    The implementation you present from "Physically Based Rendering" looks incorrect to me, so going with Stupid Spherical Harmonics:

        1.  float f[],s[];
        2.  float fWtSum = 0;
        3.  Foreach(cube map face)
        4.    Foreach(texel)
        5.      float fTmp = 1 + u^2 + v^2;
        6.      float fWt = 4 / (sqrt(fTmp) * fTmp);
        7.      EvalSHBasis(texel, s);
        8.      f += t(texel) * fWt * s; // vector
        9.      fWtSum += fWt;
        10. f *= 4 * Pi / fWtSum; // area of sphere

    On line 6, the weight of the texel is calculated; this is an approximation of its solid angle, but not entirely accurate. On line 8, each sample/texel is multiplied by its weight, and on line 9, the weights are summed. On line 10, the result is normalized by multiplying by 4pi and dividing by the summed weights. Since the summed weight should be equal to 4pi, multiplying by 4pi/fWtSum (which should be close to 1, since fWtSum should be close to 4pi) will correct the bias introduced by the inaccuracies in the calculated solid angles.

    Good luck.
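The weight sum from that pseudocode can be checked numerically. The Python sketch below uses hypothetical function names and assumes texel centers at u, v in [-1, 1] on each of the six faces. One caveat the sketch makes visible: the weight on line 6 leaves out the per-texel area factor 1/N² (for an N x N face), so the raw sum lands near 4pi * N² and only approaches 4pi once divided by N². The normalization on line 10 cancels that constant factor along with the discretization error.

```python
import math


def cube_texel_weight_sum(face_size):
    """Sum fWt = 4 / (1 + u^2 + v^2)^(3/2) over every texel of all six faces."""
    total = 0.0
    for _face in range(6):
        for row in range(face_size):
            v = (row + 0.5) / face_size * 2.0 - 1.0      # texel center in [-1, 1]
            for col in range(face_size):
                u = (col + 0.5) / face_size * 2.0 - 1.0
                tmp = 1.0 + u * u + v * v
                total += 4.0 / (math.sqrt(tmp) * tmp)
    return total


def normalization_factor(face_size):
    """The 4*Pi / fWtSum factor applied on line 10 of the pseudocode."""
    return 4.0 * math.pi / cube_texel_weight_sum(face_size)
```

For a 16 x 16 face, `cube_texel_weight_sum(16) / 16**2` comes out within a few hundredths of 4pi, and the gap shrinks as the face resolution grows.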
  15. fries

    Particles in idTech 666

    There is also the Bitsquid presentation on particle lighting, which might interest you: http://roxlu.com/downloads/scholar/008.rendering.practical_particle_lighting.pdf

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.
