Does the GPU guarantee order of execution?

Started by Volgogradetzzz February 09, 2015 12:31 PM

6 comments, last by Wyrframe 9 years, 2 months ago

Volgogradetzzz

1,101

Author

February 09, 2015 12:31 PM

Greetings.

In my application I render indexed geometry. For simplicity, let's talk about two triangles, A and B. In my index buffer I put the three indices of triangle A first, followed by the three indices of triangle B. I turned off depth testing, and in my application B always ends up on top of A.

As far as I know, the GPU is extremely parallel in execution. I imagine that the vertex shader runs for all six vertices at the same time. Next comes the pixel shader. I also think that pixels are processed in parallel, i.e. the same pixel can be processed at the same time for A and B. Since I turned off the depth/stencil test, the pixel shader is executed twice for a single pixel on the screen. But why does B always "win"? If both pixels are processed at the same time, there could be situations where the pixel shader for A takes longer to execute (but I have never seen this). Is there a rule? Is there a guarantee?

Simply about complex

alh420

5,995

February 09, 2015 12:46 PM

The GPU will make sure the pixel is drawn _as_if_ each triangle in the index buffer is drawn in the order they are specified.

How it actually happens is up to the hardware, but the end result is the same.

Volgogradetzzz

1,101

Author

February 09, 2015 01:05 PM

Thank you. Does this apply to all pipeline stages? If I'm using a UAV, can I be sure that I write to it in the same order as the triangles are specified?

samoth

9,833

February 09, 2015 02:44 PM

Yes, although there is a "but...".

The GPU generally makes no promises and gives no guarantees whatsoever, and indeed works much differently from what one would "intuitively" expect.

However, the graphics or compute API that you use (such as OpenGL, CUDA, Direct3D) will usually give you one guarantee or another, and most of the time it does not matter in which order operations happen anyway.

Unless, of course, it does matter... that's when you need things like barrier (CL) or memoryBarrier (GLSL), glFenceSync at a higher level, or functions like glTextureBarrier.

Now, when does it matter in which order things are processed?

As a rule of thumb, it usually doesn't matter as long as you stick with the more "traditional" render pipeline:

It doesn't matter whether you process vertex 5, 34, or 732 first, they are not dependent on each other. You wouldn't know a difference and you don't care.
It matters that all vertices of a primitive (such as a triangle) have been processed before the geometry shader is invoked. The implementation ensures this is the case, simply by processing all the vertices, and then invoking the geometry shader (you need not care).
It matters that all vertex/geo/tessellation stuff (... belonging to one draw call) is done before the fragment shader is run. Again, this is trivially assured by how the pipeline works.

Triangles are rasterized to fragments with some method unknown to you, and the fragments are then processed in parallel in groups of 2x2 or larger (this is necessary for partial derivatives / mip calculation). Some fragments may be shaded although they are not part of a triangle at all; they are still shaded, and their results are discarded. Some may not pass a test (depth, stencil, whatever) and be discarded. Some fragments may be shaded twice (think fragments on the diagonal of a fullscreen quad, which is really just two triangles from the point of view of the hardware). Some will be weighted using some known or unknown or tuneable function (think multisampling).
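To make the 2x2 point concrete, here is a toy CPU-side sketch in Python (not GPU code). It assumes derivatives are plain differences between neighbouring lanes of one quad, which only approximates what real hardware does. The names `shade_quad`, `covered` and `uv` are made up for illustration.

```python
def shade_quad(x0, y0, covered, uv):
    """Shade one 2x2 quad whose top-left pixel is (x0, y0).

    covered: set of (dx, dy) lane offsets the triangle actually covers.
    uv:      hypothetical function mapping a pixel position to a texture coordinate.
    """
    # All four lanes are evaluated, including "helper" lanes outside the triangle.
    u = {(dx, dy): uv(x0 + dx, y0 + dy) for dy in (0, 1) for dx in (0, 1)}

    # Derivatives are differences between neighbouring lanes of the quad.
    ddx = u[(1, 0)] - u[(0, 0)]
    ddy = u[(0, 1)] - u[(0, 0)]

    # Only covered lanes produce output; helper lanes are thrown away.
    written = sorted((x0 + dx, y0 + dy) for (dx, dy) in covered)
    return ddx, ddy, written


# Only the top-left pixel is inside the triangle, yet its derivatives
# still need the three uncovered neighbours to be evaluated.
ddx, ddy, written = shade_quad(4, 6, {(0, 0)}, lambda x, y: 0.125 * x + 0.5 * y)
```

In this model one covered pixel costs four shader evaluations, which is why fragments outside the triangle can end up being shaded.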

Usually, rather than just 2x2, something like 64 or so fragments will be processed in parallel in a shader core running the same identical instructions at the same time (with several thousand queued, swapped in and out on demand to cover for texture/memory latency), and a few dozen or hundred execution units will run independently of each other.

Whatever! Not your problem! It is guaranteed (by the API contract, so it's ultimately the driver's problem) that what comes out is the same as if everything had happened exactly in the order that you specified. This is still relatively easy for the implementation to guarantee, since while you are allowed to read pretty much everything, you can only ever write to a single exactly specified location (in other words, you have gather functionality, but not scatter). So all the implementation really needs to do is not mess up its own order of rasterization and blending.

So much for the easy part. Now there are atomic counters and shader load/store, which allow you to do... scatter -- write to more or less arbitrary locations, concurrently. This is where it gets ugly.
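The as-if rule can be sketched with a small CPU-side toy model in Python. This is an illustration under assumptions, not how any particular GPU or driver works: shading is assumed to finish in a random order, and a hypothetical output stage holds results back until it is their turn for that pixel. The function `render` and its data layout are invented for the example.

```python
import random
from collections import defaultdict


def render(fragments, seed):
    """fragments: list of (primitive_id, pixel, color) in submission order."""
    rng = random.Random(seed)

    # Submission order of primitives, tracked separately for every pixel.
    expected = defaultdict(list)
    for prim, pixel, _ in fragments:
        expected[pixel].append(prim)

    # Shading completes in an arbitrary order (stand-in for parallel execution).
    finished = list(fragments)
    rng.shuffle(finished)

    pending = defaultdict(dict)  # pixel -> {primitive_id: color} not yet committed
    target = {}                  # the "render target"
    for prim, pixel, color in finished:
        pending[pixel][prim] = color
        queue = expected[pixel]
        # Commit every result whose turn has come, in submission order.
        while queue and queue[0] in pending[pixel]:
            target[pixel] = pending[pixel].pop(queue.pop(0))
    return target


# Triangle A (id 0) and triangle B (id 1) both cover pixel (5, 5).
fragments = [(0, (5, 5), "A"), (1, (5, 5), "B")]
results = {render(fragments, seed)[(5, 5)] for seed in range(100)}
```

Whatever order the two fragments finish in, `results` only ever contains "B", because the commit step follows submission order and not completion order.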

If you use shader load/store, you must take extra care. Writing to haphazard variables or memory locations not knowing which one of your fragments will be shaded first can, and will, lead to surprising results. It doesn't make a difference whether fragment 43772 is shaded before fragment 43775 if each one can only ever write to its own output, which is under the control of the driver. But it matters a lot when they both write a value to memory location 123456 or if they both modify a counter, and this happens in a different order than you had expected.
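A toy Python model of that hazard, again CPU-side and purely illustrative. It assumes completion order is random, and `scatter` is a made-up helper. Every fragment writes its id to the same address and also takes a slot from a shared counter.

```python
import random


def scatter(fragment_ids, seed):
    rng = random.Random(seed)
    order = list(fragment_ids)
    rng.shuffle(order)              # arbitrary completion order

    memory = {}
    counter = 0
    slots = {}
    for frag in order:
        memory[123456] = frag       # everyone writes the same location: last one wins
        slots[frag] = counter       # atomic-counter style: fetch the value, then increment
        counter += 1
    return memory[123456], counter, slots


winners = {scatter([43772, 43775], seed)[0] for seed in range(64)}
totals = {scatter([43772, 43775], seed)[1] for seed in range(64)}
```

Across seeds, `winners` ends up containing both fragment ids, while `totals` is always just `{2}`: the counter's final value is reliable, but which fragment got which slot, and whose write survived, is not.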

Volgogradetzzz

1,101

Author

February 09, 2015 02:59 PM

Thank you. I have a headache now.

MJP

20,295

February 10, 2015 07:45 AM

Let me try to give you a TL;DR version from a DX11 point of view:

When you draw triangles, the value output from the pixel shader (using SV_Target) will be written to your render target in the order of triangle submission.
The actual pixel shader threads themselves will generally not execute in triangle order. So if you use a UAV to make arbitrary memory writes, those writes probably won't be ordered. Same goes for compute shader threads within a single dispatch.
If you split things into separate Draw or Dispatch calls, the driver will be forced to sync and flush such that the writes from dispatch A get written to memory before dispatch B. This allows dispatch B to use the results from dispatch A.
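The difference between one dispatch and two can be sketched in plain Python. This is a CPU-side toy model under stated assumptions: threads inside a dispatch run in a random order, and returning from the hypothetical `dispatch` helper stands in for the sync and flush between calls. None of these names are D3D11 API.

```python
import random


def dispatch(kernel, n, rng):
    """Run kernel(tid) for n threads in an arbitrary order, then return."""
    order = list(range(n))
    rng.shuffle(order)
    for tid in order:
        kernel(tid)


def two_dispatches(n, seed):
    rng = random.Random(seed)
    a, b = [None] * n, [None] * n

    def write_a(tid):
        a[tid] = tid * tid

    def read_a(tid):
        b[tid] = a[tid] + a[(tid + 1) % n]

    dispatch(write_a, n, rng)   # dispatch A: all writes land before it returns
    dispatch(read_a, n, rng)    # dispatch B: can safely read A's results
    return b


def one_dispatch(n, seed):
    rng = random.Random(seed)
    a, b = [None] * n, [None] * n

    def fused(tid):
        a[tid] = tid * tid
        b[tid] = a[(tid + 1) % n]   # the neighbour may not have run yet

    dispatch(fused, n, rng)
    return b
```

`two_dispatches(8, seed)` returns the same list for every seed, whereas `one_dispatch(8, seed)` always contains at least one `None`, because whichever thread happens to run first reads a neighbour that has not been written yet.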

In case you're curious, the way that most desktop GPUs enforce render target write ordering is by having special hardware in the ROPs, which handle memory read/write operations for render targets.

Volgogradetzzz

1,101

Author

February 10, 2015 08:39 AM

Thank you.

Bummel

1,889

February 10, 2015 10:45 AM

Stream Out from the geometry shader is ordered, as far as I remember.

Wyrframe

2,489

February 10, 2015 04:28 PM

Relevant:

http://renderingpipeline.com/2012/03/gpu-rasterizer-pattern/

RIP GameDev.net: launched 2 unusably-broken forum engines in as many years, and now has ceased operating as a forum at all, happy to remain naught but an advertising platform with an attached social media presence, headed by a staff who by their own admission have no idea what their userbase wants or expects. Here's to the good times; shame they exist in the past.
