Handling depth sorting key and hardware instancing


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

22 replies to this topic

#1 lipsryme   Members   -  Reputation: 1025


Posted 27 March 2013 - 09:31 AM

Is it even possible?

I'm creating draw calls as pairs of <sort key, ptr to draw call data>, which I then sort by the key that stores my depth (a 32-bit unsigned integer). Now, if I create a single draw call for, say, 10000 instances of an object, how can I sort those 10000 instances by depth when there is only one draw call to attach a key to?
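
For reference, the setup described above might look roughly like the following. This is only a sketch: `DrawCall`, `DrawItem`, `depthToKey` and `sortDrawItems` are hypothetical names, and packing the depth by reinterpreting a non-negative float's bits is an assumption about how the 32-bit key is built.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical draw-call record; the fields are placeholders for illustration.
struct DrawCall {
    std::uint32_t meshId;
    std::uint32_t instanceCount;
};

using SortKey  = std::uint32_t;                        // depth packed into 32 bits
using DrawItem = std::pair<SortKey, const DrawCall*>;  // <sort key, ptr to draw call data>

// Assumption: depth is a non-negative view-space distance. For non-negative
// IEEE-754 floats, the raw bit pattern increases monotonically with the value,
// so it can be compared as an unsigned integer.
inline SortKey depthToKey(float viewDepth) {
    assert(viewDepth >= 0.0f);
    static_assert(sizeof(SortKey) == sizeof(float), "float must be 32 bits");
    SortKey bits;
    std::memcpy(&bits, &viewDepth, sizeof(bits));
    return bits;
}

// Sorts draw items front to back (ascending key). The draw-call data is never
// touched, only the small <key, pointer> pairs are moved.
inline void sortDrawItems(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.first < b.first; });
}
```

Note that an instanced draw call still gets exactly one key here, which is the crux of the question: the key can order the whole batch against other draw calls, but not the instances inside it.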


Edited by lipsryme, 27 March 2013 - 09:31 AM.



#2 Krohm   Crossbones+   -  Reputation: 3129


Posted 27 March 2013 - 10:06 AM

It does not work that way. You have no guarantee on the order of execution (much less on the order of completion) inside a single draw-call.

It's really simple. Multiple execution units --> race conditions. You see those GPU blocks on every article each time a new GPU is released.

The only decent way to do order-independent transparency is using D3D11 linked lists, in my opinion.

 

edit: wrong!


Edited by Krohm, 29 March 2013 - 03:39 AM.


#3 Jason Z   Crossbones+   -  Reputation: 5086


Posted 27 March 2013 - 11:14 AM

What are you trying to accomplish with this?  I have to agree with Krohm, there is no way to sort the order that your instances are going to be rendered, unless you are directly using an explicit sorting mechanism (i.e. depth sorting with a depth buffer!).  Depending on what you are trying to do, there might be some alternative suggestions that we could make...



#4 lipsryme   Members   -  Reputation: 1025


Posted 27 March 2013 - 11:19 AM

Well, I'm trying to sort my draw calls. But in that case it seems I will have to give up on sorting them. I just find it interesting that in the Frostbite 2.0 paper they say that they simply treat everything as instanced. So do they also just give up on sorting opaques front to back?

#5 Jason Z   Crossbones+   -  Reputation: 5086


Posted 27 March 2013 - 11:23 AM

Don't they use a variant of deferred rendering?  If so, then it is very likely that they do a depth pre-pass which would make it not so important to have depth sorting.  I haven't read through the paper yet though - do you have a link to it?



#6 lipsryme   Members   -  Reputation: 1025


Posted 27 March 2013 - 01:30 PM

I'm having a similar problem now with opaque / transparent objects. In a deferred renderer I'd render the transparent ones separately in a forward pass after the opaque + lighting passes, correct? But what happens if there are 200 instances of the same plane, and one of them is changed (by the user) to be transparent?

Before, I had one instance group of 200 objects. The only solution, I guess, would be to remove this object from its previous instance group and add it to a new one? But then I'd have to check every object at runtime to see whether it has changed from opaque to transparent. This sounds very complicated...
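
One possible way around the change tracking, sketched under assumptions: instead of moving objects between groups when their state changes, rebuild the groups from scratch every frame, bucketing each object by mesh and by its current opaque/transparent flag. `SceneObject`, `InstanceGroups`, `buildInstanceGroups` and `kInvalidMeshId` are hypothetical names, not anything from an existing engine.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical sentinel for "no mesh assigned".
constexpr std::uint32_t kInvalidMeshId = 0xFFFFFFFFu;

// Hypothetical scene object: only the fields needed for grouping.
struct SceneObject {
    std::uint32_t objectId;
    std::uint32_t meshId;
    bool transparent;  // may be toggled by the user at runtime
};

// Object ids bucketed per mesh, with one set of buckets per pass.
struct InstanceGroups {
    std::map<std::uint32_t, std::vector<std::uint32_t>> opaque;
    std::map<std::uint32_t, std::vector<std::uint32_t>> transparent;
};

// Rebuilds all groups from the objects' current state. An object that switched
// from opaque to transparent simply lands in the other bucket on the next
// rebuild, so no per-object "has it changed?" check is needed.
inline InstanceGroups buildInstanceGroups(const std::vector<SceneObject>& objects) {
    InstanceGroups groups;
    for (const SceneObject& obj : objects) {
        assert(obj.meshId != kInvalidMeshId);
        auto& buckets = obj.transparent ? groups.transparent : groups.opaque;
        buckets[obj.meshId].push_back(obj.objectId);
    }
    return groups;
}
```

The trade-off is a linear pass over the objects every frame, which is often done anyway for culling; whether that is acceptable depends on the scene size.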



#7 MJP   Moderators   -  Reputation: 11455


Posted 27 March 2013 - 02:46 PM

Krohm wrote:
> It does not work that way. You have no guarantee on the order of execution (much less on the order of completion) inside a single draw-call.
>
> It's really simple. Multiple execution units --> race conditions. You see those GPU blocks on every article each time a new GPU is released.
>
> The only decent way to do order-independent transparency is using D3D11 linked lists, in my opinion.


The order that a primitive is rasterized and written to a render target is the same as the order in which you submit those primitives. This is part of the DX spec, and is guaranteed by the hardware. In fact the hardware has to jump through a lot of hoops to maintain this guarantee while still making use of multiple hardware units. This means that if you were able to perfectly sort all primitives in a mesh by depth, you would get perfect transparency. The same goes for multiple instances in a single draw call. The only case that's totally impossible to handle without OIT is the case of intersecting primitives.
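
To make the instancing case concrete, here is a minimal sketch of sorting per-instance data back to front on the CPU before it is uploaded to the instance buffer, so that the instances in a single draw call are blended in that order. `Float3`, `InstanceData` and `sortBackToFront` are hypothetical names, and using the distance along the camera's forward axis as the depth measure is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Float3 { float x, y, z; };

inline float dot(const Float3& a, const Float3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Hypothetical per-instance record, as it would be written to the instance buffer.
struct InstanceData {
    Float3 position;  // world-space position of the instance
    float  opacity;
};

// Orders instances farthest-first along the camera's forward axis. Because
// results are written in submission order, uploading the instances in this
// order blends them back to front within one draw call.
inline void sortBackToFront(std::vector<InstanceData>& instances,
                            const Float3& cameraPos,
                            const Float3& cameraForward) {
    assert(dot(cameraForward, cameraForward) > 0.0f);  // forward must be non-zero
    auto depth = [&](const InstanceData& inst) {
        const Float3 toInstance{inst.position.x - cameraPos.x,
                                inst.position.y - cameraPos.y,
                                inst.position.z - cameraPos.z};
        return dot(toInstance, cameraForward);
    };
    std::sort(instances.begin(), instances.end(),
              [&](const InstanceData& a, const InstanceData& b) {
                  return depth(a) > depth(b);
              });
}
```

This sorts whole instances only; triangles within each instance still follow the mesh's index order, and intersecting instances remain unsolved, as noted above.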


Edited by MJP, 27 March 2013 - 02:46 PM.


#8 Jason Z   Crossbones+   -  Reputation: 5086


Posted 27 March 2013 - 07:29 PM

MJP wrote:
> The order that a primitive is rasterized and written to a render target is the same as the order in which you submit those primitives. This is part of the DX spec, and is guaranteed by the hardware. In fact the hardware has to jump through a lot of hoops to maintain this guarantee while still making use of multiple hardware units. This means that if you were able to perfectly sort all primitives in a mesh by depth, you would get perfect transparency. The same goes for multiple instances in a single draw call. The only case that's totally impossible to handle without OIT is the case of intersecting primitives.

Are you sure about this behavior?  How can this be assured when multiple primitives are being rasterized in parallel?  There is also some gray area regarding generated primitives too (via tessellation or the geometry shader) as they can be generated in parallel instances of the shaders...

 

I have always heard that the order is roughly equivalent to the order they are submitted in, but that they are explicitly not guaranteed to be processed in exact order.



#9 JohnnyCode   Members   -  Reputation: 245


Posted 27 March 2013 - 10:14 PM

Jason Z wrote:
> I have always heard that the order is roughly equivalent to the order they are submitted in, but that they are explicitly not guaranteed to be processed in exact order.

It is guaranteed at least across successive passes (instructions encapsulated in Pass::Begin / Pass::End). I have also tested multiple DrawIndexedPrimitive calls in one pass (to save some overhead), and as far as I can tell they executed in order (I was doing the two back/front passes of stencil shadows). But as you said, I am not sure of this, so I will probably implement it in two passes to avoid breaking my game on some GPUs or drivers. I can't test or profile every GPU, I just don't trust them, and I want to sleep well. Interesting topic.



#10 MJP   Moderators   -  Reputation: 11455


Posted 28 March 2013 - 12:15 AM

 

Jason Z wrote:
> Are you sure about this behavior?  How can this be assured when multiple primitives are being rasterized in parallel?  There is also some gray area regarding generated primitives too (via tessellation or the geometry shader) as they can be generated in parallel instances of the shaders...
>
> I have always heard that the order is roughly equivalent to the order they are submitted in, but that they are explicitly not guaranteed to be processed in exact order.


Definitely. You have no guarantee about the order in which vertices/primitives/pixels are processed in the shader units, but the ROPs guarantee that the final results written to the render target match the triangle submission order (which is often done by buffering and re-ordering pending writes from pixel shaders). This is even true for geometry shaders, which is a big part of what makes them so slow.


Edited by MJP, 28 March 2013 - 12:15 AM.


#11 Tournicoti   Prime Members   -  Reputation: 683


Posted 28 March 2013 - 01:08 AM

Hello

 

(About opaque geometry)

 

Depending on your scene, sorting opaque geometry to avoid pixel overdraw does not always improve performance. Worse, sorting and then drawing can be slower than drawing directly without sorting. And if you use a deferred rendering technique, the lighting and shadow calculations are not affected by this; only the G-buffer generation is.

(I tried exactly what you describe, and realized afterwards that not sorting opaque geometry and letting the z-buffer do its job was the better choice in my case!)

FWIW :)


Edited by Tournicoti, 28 March 2013 - 01:07 PM.


#12 Hodgman   Moderators   -  Reputation: 30441


Posted 28 March 2013 - 04:18 AM

Jason Z wrote:
> Are you sure about this behavior?  How can this be assured when multiple primitives are being rasterized in parallel?

If this behaviour wasn't true, then whenever you rendered a convex shape with alpha-blending, you'd get random results depending on how the race conditions between overlapping surfaces panned out each frame. This doesn't happen though -- the triangles are blended in the order that they appear in your index buffer, without any race conditions or corruption in the overlapping areas.

#13 Jason Z   Crossbones+   -  Reputation: 5086


Posted 28 March 2013 - 04:40 AM

 

 

MJP wrote:
> Definitely. You have no guarantee about the order in which vertices/primitives/pixels are processed in the shader units, but the ROPs guarantee that the final results written to the render target match the triangle submission order (which is often done by buffering and re-ordering pending writes from pixel shaders). This is even true for geometry shaders, which is a big part of what makes them so slow.

That is really good to know, and I was completely unaware of it - thanks for letting me know :)  That seems like it would be a big pain in the a$$ for the GPU manufacturer/driver to implement.

 

Hodgman wrote:
> If this behaviour wasn't true, then whenever you rendered a convex shape with alpha-blending, you'd get random results depending on how the race conditions between overlapping surfaces panned out each frame. This doesn't happen though -- the triangles are blended in the order that they appear in your index buffer, without any race conditions or corruption in the overlapping areas.

But the convex shape itself is not sorted - the depth sort order is based on the view direction, which is not the same every frame.  Since the buffer contents remain the same and the viewpoint changes every frame, this can't be a valid argument about the ROP processing order, since sometimes you would be in the reverse order.  Or am I thinking about that wrong?



#14 Hodgman   Moderators   -  Reputation: 30441


Posted 28 March 2013 - 06:09 AM

Jason Z wrote:
> But the convex shape itself is not sorted - the depth sort order is based on the view direction, which is not the same every frame.  Since the buffer contents remains the same and the viewpoint changes every frame, then this can't be a valid argument about the ROP processing order since sometimes you would be in the reverse order.  Or am I thinking about that wrong?

The view direction doesn't have to change each frame. Pick any one view direction and render the shape from that direction (and optionally vary any other irrelevant details that might influence race conditions, like processing load, other surrounding operations, phase of the moon, etc). You'll get the same result every time regardless of situational details that would influence race conditions. You always see the same triangles drawn over the top of other triangles, with no pixel/block artefacts from threading bugs within triangles. It will appear as if the GPU has rendered the triangles one at a time in the order specified.

 

In "foliage rendering in pure" they make use of this property in terrain grass rendering, by having a "tile" of 3D grass planes, with 8 index buffers, which each represent the planes as sorted correctly for 8 different viewing directions. When rendering, they pick the index buffer that has the most correct sorting order for the current view direction, which gives them almost correct back-to-front triangle sorting within each batch of grass.
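
As a rough illustration of the selection step, the sketch below picks one of eight pre-sorted index buffers from the horizontal view direction. It assumes the eight buffers were sorted for directions spaced 45 degrees apart around the vertical axis, with buffer 0 corresponding to +X; the presentation may well have chosen its directions differently, and `pickSortedIndexBuffer` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

constexpr int kDirectionCount = 8;  // one pre-sorted index buffer per direction

// Returns the index (0..7) of the 45-degree sector whose centre is closest to
// the horizontal view direction (viewX, viewZ). Sector centres sit at
// 0, 45, 90, ... degrees, measured from +X towards +Z.
inline int pickSortedIndexBuffer(float viewX, float viewZ) {
    assert(viewX != 0.0f || viewZ != 0.0f);  // direction must be non-zero

    const float kTwoPi = 6.28318530718f;
    float angle = std::atan2(viewZ, viewX);  // range [-pi, pi]
    if (angle < 0.0f) {
        angle += kTwoPi;                     // remap to [0, 2*pi]
    }

    const float sectorSize = kTwoPi / kDirectionCount;
    // Adding 0.5 before flooring rounds to the nearest sector centre.
    const int index = static_cast<int>(std::floor(angle / sectorSize + 0.5f));
    return index % kDirectionCount;          // wrap the last half-sector to 0
}
```

At draw time the returned index would select which index buffer to bind for the grass tile; the vertex data is shared between all eight.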


Edited by Hodgman, 28 March 2013 - 06:12 AM.


#15 MJP   Moderators   -  Reputation: 11455


Posted 28 March 2013 - 12:09 PM

For certain special cases of meshes you can actually pre-sort them to always render in the correct order regardless of viewing direction. For instance, say you have a transparent sphere that you wanted to be "double-sided" so that you can see both the front and the back of the sphere. A common way to do this is to duplicate all of the faces and flip the winding order so that they show up when back-facing. If you duplicate and append them to the end of the index buffer you get the wrong sorting order, since the front will render before the back. But if you append it to the beginning of the index buffer, the back faces will render first and the front faces will render second giving you the correct blending order.
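
A minimal sketch of that index-buffer construction, assuming a triangle list and back-face culling left enabled. `makeDoubleSidedIndices` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a "double-sided" index buffer for a triangle list: flipped-winding
// copies of every triangle come first, followed by the original triangles.
// With back-face culling on, the flipped copies are the ones that survive on
// the far side of the shape, so the far side is blended before the near side.
inline std::vector<std::uint32_t> makeDoubleSidedIndices(
        const std::vector<std::uint32_t>& indices) {
    assert(indices.size() % 3 == 0);  // must be a whole number of triangles

    std::vector<std::uint32_t> result;
    result.reserve(indices.size() * 2);

    // Flipped copies: swapping the last two indices reverses the winding.
    for (std::size_t i = 0; i < indices.size(); i += 3) {
        result.push_back(indices[i]);
        result.push_back(indices[i + 2]);
        result.push_back(indices[i + 1]);
    }

    // Original triangles, unchanged, after the flipped ones.
    result.insert(result.end(), indices.begin(), indices.end());
    return result;
}
```

The vertex buffer is untouched, so the flipped copies reuse the original normals; whether that matters depends on how the shader lights back faces.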



#16 Jason Z   Crossbones+   -  Reputation: 5086


Posted 28 March 2013 - 01:34 PM

Hodgman wrote:
> The view direction doesn't have to change each frame. Pick any one view direction and render the shape from that direction (and optionally vary any other irrelevant details that might influence race conditions, like processing load, other surrounding operations, phase of the moon, etc). You'll get the same result every time regardless of situational details that would influence race conditions. You always see the same triangles drawn over the top of other triangles, with no pixel/block artefacts from threading bugs within triangles. It will appear as if the GPU has rendered the triangles one at a time in the order specified.
>
> In "foliage rendering in pure" they make use of this property in terrain grass rendering, by having a "tile" of 3D grass planes, with 8 index buffers, which each represent the planes as sorted correctly for 8 different viewing directions. When rendering, they pick the index buffer that has the most correct sorting order for the current view direction, which gives them almost correct back-to-front triangle sorting within each batch of grass.

I'm not disputing that transparent convex shapes render properly regardless of view direction.  What I don't agree with is the statement that the order of the primitives has anything to do with it.  Unless you do something like what MJP mentioned about duplicating the geometry, then I don't think the order of the primitives makes any difference - front faces will be rendered, and back faces will be culled.  Even if the pixels were processed in a random fashion, it wouldn't matter to the output rendering.

 

For non-convex geometry, I think it is a different story - which is why I think the foliage method that you mentioned has different index buffers for different view directions.  This is the main assertion that I am trying to make - that convex geometry will render properly either way, but non-convex geometry does need to have properly sorted primitives.

 

 

MJP wrote:
> For certain special cases of meshes you can actually pre-sort them to always render in the correct order regardless of viewing direction. For instance, say you have a transparent sphere that you wanted to be "double-sided" so that you can see both the front and the back of the sphere. A common way to do this is to duplicate all of the faces and flip the winding order so that they show up when back-facing. If you duplicate and append them to the end of the index buffer you get the wrong sorting order, since the front will render before the back. But if you append it to the beginning of the index buffer, the back faces will render first and the front faces will render second giving you the correct blending order.

That is a great way to indicate how this works - and it solidifies the concept in my mind.  Thanks for the example and the clarification!



#17 Hodgman   Moderators   -  Reputation: 30441


Posted 28 March 2013 - 08:06 PM

Ahh, shoot. The whole time I meant 'concave', so it would have overlapping triangles....

#18 Krohm   Crossbones+   -  Reputation: 3129


Posted 29 March 2013 - 03:34 AM

Thank you guys, for pointing this out! This simplifies a lot of things.

Where is this documented? I'm reading some D3D11 documentation but I'm afraid I'm missing the place where it is noted. 

Now that you write this, I think I recall a program which actually sorted all its particles.



#19 Jason Z   Crossbones+   -  Reputation: 5086


Posted 29 March 2013 - 04:45 AM

Krohm wrote:
> Thank you guys, for pointing this out! This simplifies a lot of things.
>
> Where is this documented? I'm reading some D3D11 documentation but I'm afraid I'm missing the place where it is noted.
>
> Now that you write this, I think I recall a program which actually sorted all its particles.

Transparent, non-connected geometry at different depths will still need to be sorted. Only convex geometry (i.e. one contiguous object) won't have an issue; separate particles are a different story, since their transparent pixels would likely be blended in different orders depending on the view direction.

 

Sorry to have misled you...   :(



#20 Krohm   Crossbones+   -  Reputation: 3129


Posted 29 March 2013 - 09:18 AM

It would work anyway I guess! It's index order only. 

Hodgman wrote:
> In "foliage rendering in pure" they make use of this property in terrain grass rendering, by having a "tile" of 3D grass planes, with 8 index buffers, which each represent the planes as sorted correctly for 8 different viewing directions. When rendering, they pick the index buffer that has the most correct sorting order for the current view direction, which gives them almost correct back-to-front triangle sorting within each batch of grass.





