  • Similar Content

    • By Jiraya
      For a 2D game, does using a float2 for position increase performance in any way?
      I know that in the end the vertex shader will have to return a float4 anyway, but does using a float2 decrease the amount of data that has to be sent from the CPU to the GPU?
       
    • By ucfchuck
      I am feeding 16-bit unsigned integer data into a compute shader and I need to compute a standard deviation.
      So I read in a series of samples and push them into float arrays:
      float vals1[9], vals2[9], vals3[9], vals4[9];
      int x = 0, y = 0;
      for (x = 0; x < 3; x++) {
          for (y = 0; y < 3; y++) {
              vals1[3 * x + y] = (float) (asuint(Input1[threadID.xy + int2(x - 1, y - 1)].x));
              vals2[3 * x + y] = (float) (asuint(Input2[threadID.xy + int2(x - 1, y - 1)].x));
              vals3[3 * x + y] = (float) (asuint(Input3[threadID.xy + int2(x - 1, y - 1)].x));
              vals4[3 * x + y] = (float) (asuint(Input4[threadID.xy + int2(x - 1, y - 1)].x));
          }
      }
      I can send these values out directly and the data is as expected:
      Output1[threadID.xy] = (uint) (vals1[4]);
      Output2[threadID.xy] = (uint) (vals2[4]);
      Output3[threadID.xy] = (uint) (vals3[4]);
      Output4[threadID.xy] = (uint) (vals4[4]);
      However, if I do anything to that data it is destroyed.
      If I add a
      vals1[4] = vals1[4]/2;
      or a
      vals1[4] = vals[1]-vals[4];
      the data is gone and everything comes back 0.

      How does one go about converting a uint to a float, performing operations on it, and then converting back to a rounded uint?
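A CPU-side reference can help check what the shader should produce for one 3x3 neighbourhood. The following is a minimal C++ sketch, not HLSL and not the poster's code; the function name `stddev_rounded` is hypothetical, and it assumes the population standard deviation (dividing by 9) is the one wanted.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical CPU reference: population standard deviation of nine
// 16-bit samples, computed in float and rounded back to an unsigned integer.
std::uint32_t stddev_rounded(const std::uint16_t (&samples)[9]) {
    // Value conversion (not a bit reinterpretation): 7 becomes 7.0f.
    float sum = 0.0f;
    for (std::uint16_t s : samples) sum += static_cast<float>(s);
    const float mean = sum / 9.0f;

    // Sum of squared deviations from the mean.
    float squared = 0.0f;
    for (std::uint16_t s : samples) {
        const float d = static_cast<float>(s) - mean;
        squared += d * d;
    }

    // Assumption: population variance (divide by N = 9).
    const float sd = std::sqrt(squared / 9.0f);

    // Round to nearest before converting back; a plain cast would truncate.
    return static_cast<std::uint32_t>(std::lround(sd));
}
```

Comparing a few pixels of shader output against a reference like this narrows down whether the problem is in the arithmetic or in how the buffers are read and written.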
    • By fs1
      I have been trying to understand how ID3DInclude and its Open and Close methods work.
      I would like to add a custom path for the D3DCompile function to search for some of my includes.
      I have not found any working example. Could someone show me how to implement these functions? I would like D3DCompile to look in a custom C:\Folder path for some of the include files.
      Thanks
    • By stale
      I'm continuing to learn more about terrain rendering, and so far I've managed to load in a heightmap and render it as a tessellated wireframe (following Frank Luna's DX11 book). However, I'm getting some really weird behavior where a large section of the wireframe is being rendered with a yellow color, even though my pixel shader is hard coded to output white. 

      The parts of the mesh that are discolored change as well, as pictured below (the mesh is being clipped by the far plane).

      Here is my pixel shader. As mentioned, I simply hard code it to output white:
      float PS(DOUT pin) : SV_Target
      {
          return float4(1.0f, 1.0f, 1.0f, 1.0f);
      }
      I'm completely lost on what could be causing this, so any help in the right direction would be greatly appreciated. If I can help by providing more information please let me know.
    • By evelyn4you
      Hello,
      I am trying to implement voxel cone tracing in my game engine.
      I have read many publications about this, but some crucial portions are still not clear to me.
      As a first step I am trying to implement the easiest "poor man's" method:
      a. my test scene "Sponza Atrium" is voxelized completely in a static 128^3 voxel grid (a structured buffer contains the albedo)
      b. I don't care about "conservative rasterization" and don't use any sparse voxel access structure
      c. every voxel has the same color for every side (top, bottom, front, ...)
      d. one directional light injects light into the voxels (another structured buffer)
      I will try to say what I think is correct (please correct me).
      GI lighting of a given vertex in an ideal method
      A. we would shoot many (e.g. 1000) rays into the hemisphere that is oriented according to the normal of that vertex
      B. we would take into account every occluder (which is a very heavy workload) and sample the color from the hit point
      C. according to the angle between the ray and the vertex normal we would weight the color (by the cosine), sum up all samples, and divide by the number of rays
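Steps A to C above amount to a cosine-weighted average over the ray results. A minimal C++ sketch, assuming the ray directions are unit vectors and the hit colors have already been found (step B is not modelled); the types `Vec3` and `RaySample` and the function name are hypothetical.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Hypothetical result of one ray: its unit direction and the color
// found at its hit point.
struct RaySample {
    Vec3 direction;
    Vec3 color;
};

// Step C: weight each color by the cosine of the angle between the ray
// and the normal, sum up, and divide by the number of rays.
Vec3 cosine_weighted_average(const Vec3& normal,
                             const std::vector<RaySample>& samples) {
    Vec3 sum{0.0f, 0.0f, 0.0f};
    if (samples.empty()) return sum;

    for (const RaySample& s : samples) {
        float weight = dot(normal, s.direction);  // cosine for unit vectors
        if (weight < 0.0f) weight = 0.0f;         // ray points below the surface
        sum.x += weight * s.color.x;
        sum.y += weight * s.color.y;
        sum.z += weight * s.color.z;
    }

    const float inv = 1.0f / static_cast<float>(samples.size());
    return Vec3{sum.x * inv, sum.y * inv, sum.z * inv};
}
```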
      Voxel GI lighting
      In principle we want to do the same thing with our voxel structure.
      Even if we knew where the correct hit points for the vertex are, we would still have the task of calculating the weighted sum of many voxels.
      Saving time for the weighted summing up of the colors of each voxel
      To save time on the weighted summing up of the colors of each voxel we build bricks or clusters.
      Every 8 neighbouring voxels make a "cluster voxel" of level 1 (this is done recursively for many levels).
      The color of a side of a "cluster voxel" is the average of the colors of the four contained voxel sides with the same orientation.

      After having done this we can sample the far away parts just by sampling the corresponding "cluster voxel" at the corresponding level and get the summed up color.
      Actually this process is done by mipmapping a texture that contains the colors of the voxels, which also places the colors of neighbouring voxels close together in the texture.
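Building one such level can be sketched as a 2x2x2 box filter over a dense grid. This is a simplified C++ illustration: it assumes each voxel stores a single color for all sides (as in item c above), so it averages all 8 child voxels rather than the four faces per side. The `VoxelGrid` type and `downsample` function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Color { float r, g, b; };

// Hypothetical dense cubic grid with n voxels per axis, stored x-fastest.
struct VoxelGrid {
    std::size_t n;
    std::vector<Color> voxels;

    explicit VoxelGrid(std::size_t size)
        : n(size), voxels(size * size * size, Color{0.0f, 0.0f, 0.0f}) {}

    Color& at(std::size_t x, std::size_t y, std::size_t z) {
        return voxels[(z * n + y) * n + x];
    }
    const Color& at(std::size_t x, std::size_t y, std::size_t z) const {
        return voxels[(z * n + y) * n + x];
    }
};

// One mip step: every 2x2x2 block of fine voxels becomes one coarse
// "cluster voxel" holding the average color of its 8 children.
VoxelGrid downsample(const VoxelGrid& fine) {
    assert(fine.n >= 2 && fine.n % 2 == 0);
    VoxelGrid coarse(fine.n / 2);
    for (std::size_t z = 0; z < coarse.n; ++z)
        for (std::size_t y = 0; y < coarse.n; ++y)
            for (std::size_t x = 0; x < coarse.n; ++x) {
                Color sum{0.0f, 0.0f, 0.0f};
                for (std::size_t dz = 0; dz < 2; ++dz)
                    for (std::size_t dy = 0; dy < 2; ++dy)
                        for (std::size_t dx = 0; dx < 2; ++dx) {
                            const Color& c = fine.at(2 * x + dx, 2 * y + dy, 2 * z + dz);
                            sum.r += c.r;
                            sum.g += c.g;
                            sum.b += c.b;
                        }
                coarse.at(x, y, z) = Color{sum.r / 8.0f, sum.g / 8.0f, sum.b / 8.0f};
            }
    return coarse;
}
```

Calling `downsample` repeatedly on its own output yields the chain of levels; for a 128^3 grid that is seven steps down to a single voxel.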
      Cone tracing, how to?
      Here my understanding is confused. How is the voxel structure traced efficiently?
      I simply cannot understand how the occlusion problem is solved quickly, so that we know which single voxel or "cluster voxel" of which level we have to sample.
      Suppose I am in a dark room that is filled with many boxes of different sizes and I have a pocket lamp, e.g. with a pyramid-shaped light cone:
      - I would see some single voxels near or far
      - I would also see many different boxes ("clustered voxels") of different sizes which are partly occluded
      How do I make a weighted sum of this lit area?
      e.g. if I want to sample a "clustered voxel" of level 4 I have to take into account what percentage of the area of this "clustered voxel" is occluded.
      Please be patient with me, I am really trying to understand, but maybe I need some more explanation than others.
      Best regards, evelyn
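One common way to handle the occlusion question is to store an opacity next to each color and composite samples front to back along the cone, so that nearer samples progressively hide farther ones instead of any exact visibility test. The following C++ sketch illustrates that idea under stated assumptions: the grid stores a pre-filtered opacity per level (the post above only stores albedo), and `VoxelSampler` is a hypothetical callback standing in for the mipmapped texture fetch.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>

struct Rgba { float r, g, b, a; };

// Hypothetical sampler: pre-filtered color and opacity at a given distance
// along the cone axis, fetched from the given (fractional) mip level.
using VoxelSampler = std::function<Rgba(float distance, float mipLevel)>;

Rgba trace_cone(const VoxelSampler& sample, float halfAngleRadians,
                float voxelSize, float maxDistance) {
    Rgba acc{0.0f, 0.0f, 0.0f, 0.0f};
    float t = voxelSize;  // start one voxel out to avoid sampling the origin voxel

    // Stop once the cone is (almost) fully occluded or leaves the range.
    while (t < maxDistance && acc.a < 0.99f) {
        // The cone widens with distance; pick the level whose voxel
        // size matches the current cone diameter.
        const float diameter =
            std::max(voxelSize, 2.0f * t * std::tan(halfAngleRadians));
        const float mip = std::log2(diameter / voxelSize);

        const Rgba s = sample(t, mip);

        // Front-to-back compositing: what is already accumulated
        // occludes the new sample by the factor (1 - acc.a).
        const float weight = (1.0f - acc.a) * s.a;
        acc.r += weight * s.r;
        acc.g += weight * s.g;
        acc.b += weight * s.b;
        acc.a += weight;

        t += 0.5f * diameter;  // step size grows with the cone
    }
    return acc;
}
```

In this scheme the percentage of a "clustered voxel" that is occluded is never computed explicitly; the averaged opacity of each level approximates it.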
       
       
DX11 Concurrent fragments writes


I have a rather low-level GPU question.

Is it possible, that two or more fragments (perhaps from different primitives) with the SAME xy target (screen) coordinates are ever being executed concurrently by multiple GPU threads?

It sounds weird, so let me rephrase it just to be sure.

Say I submit 2 triangles that will project to the very same target xy pixels (they might or might not have different z (depth)).
1. For each of the 6 vertices a vertex shader will be invoked and all these 6 vertex shaders run at once (there's plenty of groups and units on the GPU, right?)
2. There is no geometry shader (irrelevant). Both triangles will project to the same pixels of the target. Rasteriser rasterises and pixel shaders get invoked...
3. Is there a possibility that two pixel shaders (one for a fragment on triangle 0 and one for another fragment on triangle 1), which want to shade (and possibly write or scatter) at the very same target (screen) location, get executed really concurrently (obviously on different thread groups)???

I suppose yes. I am obviously looking for the worst-case scenario. Am I right? Early-Z, presence of discards and explicit depth-writes and similar peculiarities probably have a say in this.

The next question (DX11/GL4) is whether it is necessary to have ANY output target bound to the output merger, or whether it is enough to bind 0 RTVs and 1 UAV (OMSetRenderTargetsAndUnorderedAccessViews()). I'm not talking about compute shaders.

Part 1: Theoretically, yes, they can run concurrently. The D3D/GL specs are intentionally not too specific about how this works under the hood, just that the end result needs to be deterministic. There *may* be specific hardware implementations that kill or rearrange operations if this happens in practice, but there are no blanket 'this is how it must work' restrictions I am aware of. When in doubt, use atomics.

Part 2: Good question, I think you may be okay. It's entirely possible to do depth-only rendering by not binding any RTVs (just a DSV) and I would think the same would apply here.
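The "use atomics" advice from Part 1 can be illustrated with a CPU analogy. In this C++ sketch each thread stands in for shader invocations that target the same pixel, and an atomic read-modify-write keeps every contribution; the function name is hypothetical, and on the GPU side the comparable tool would be an HLSL interlocked intrinsic such as InterlockedAdd on a UAV rather than std::atomic.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// CPU analogy only: several threads add to one shared "pixel" value.
// fetch_add is an indivisible read-modify-write, so no increment is lost
// even when the threads really do run at the same time.
std::uint32_t accumulate_with_atomics(unsigned numThreads,
                                      unsigned writesPerThread) {
    std::atomic<std::uint32_t> pixel{0};

    std::vector<std::thread> workers;
    workers.reserve(numThreads);
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&pixel, writesPerThread] {
            for (unsigned i = 0; i < writesPerThread; ++i) {
                pixel.fetch_add(1u, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) w.join();

    return pixel.load();
}
```

With a plain `std::uint32_t` and `pixel = pixel + 1` the same loop would be a data race, and the final count could come up short; that is the failure mode the atomics guard against.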

This is implementation specific (and I do know of cases where this is and is not true), but yes, on most high end graphics hardware this will happen. Post-shader, there is the requirement that fragments must appear to be rendered as though they were in triangle order.

Thank you very much, guys. I will then have to assume (despite your last sentence, Crowley99) that it will happen, and I will use atomics. I'll try to post an update here with performance and results if I get to try the 0 RTV + 0 DSV + n UAV scenario.
