Sign in to follow this  

Sorting Objects

This topic is 2332 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

It seems like it is only practical to sort objects based on a couple properties. For example, say I am given a list of visible objects L.

I can choose how to sort or divide them. Say I divide them into three sub lists:

Sublist S1 = Opaque
Sublist S2 = AlphaTested
Sublist S3 = Transparent

I can then sort the first two in front-to back order, and the last in back-to-front order.

However, when I go to draw S1, there are additional properties that could be sorted (shader programs--e.g., skinned or not, normal mapped or not--textures).

Is this just a fact of life, and you just choose to sort based on the property that gives the best bang for your buck, or am I missing something?

Share this post


Link to post
Share on other sites
Usually you would treat opaque and transparent objects differently so it makes sense to have them in separate lists. As for sorting, you could use a sort key/hash which would allow you to sort by a number of properties (distance, shader/material, etc).

Share this post


Link to post
Share on other sites
You can find more information about the key/hash approach here: [url="http://realtimecollisiondetection.net/blog/?p=86"]Order your graphics draw calls around![/url].

Share this post


Link to post
Share on other sites
[quote name='Quat' timestamp='1311196461' post='4838139']
It seems like it is only practical to sort objects based on a couple properties. For example, say I am given a list of visible objects L.

I can choose how to sort or divide them. Say I divide them into three sub lists:

Sublist S1 = Opaque
Sublist S2 = AlphaTested
Sublist S3 = Transparent

I can then sort the first two in front-to back order, and the last in back-to-front order.

However, when I go to draw S1, there are additional properties that could be sorted (shader programs--e.g., skinned or not, normal mapped or not--textures).

Is this just a fact of life, and you just choose to sort based on the property that gives the best bang for your buck, or am I missing something?
[/quote]

It is only a fact based on how detailed you want to get. Now, sorting S1 front to back is pretty much a waste of time since it is opaque and you probably have a z-buffer (why wouldn't you anymore?), there is no sorting required. You "may" simply sort on shader/texture for this stage and if shader costs are high, do a z-prepass so shaders are only computed for visible pixels.

Alpha tested and translucent items get considerably more entertaining. I don't know that folks go to the levels I'm about to detail anymore but given that I did this in software rasterization at one point, sorting was MUCH more important. You can get back "some" of your shader/texture sorting abilities by doing an overlap sort in 3 dimensions. It's something like a culling exercise done in world and screen space.

Basically what you are doing is similar to the broad phase in collision detection. You want to sort the triangles into "islands" where only a subset of triangles/objects are overlapping and require any form of advanced work. Basically take the bounds of all the submeshes (in my system this means "per material" and in this case means with sorting enabled, implied translucent or alpha tested) and build groups of objects which overlap each other. This tells you which submeshes have triangles which can actually interpenetrate and need to be sorted and or even subdivided if you want absolute perfection. (Down side, a huge explosion particle system makes this VERY expensive, when don't huge things have problems though. :))

Now sort the bounds projected to screen space on the x. You setup lists of objects which overlap on the x plane. Repeat for the y plane in screen space. What you have at this point is an interaction graph, well the start of one, which tells you which pieces can be rendered safely with or without merged material soups. Any object which is not in any of the overlap groups simply renders with it's own polys sorted against each other since it can have no interaction with anything else. Anything in the x/y groups but not in the world overlap groups can sort their tri's into a single draw call much like the prior but they have to be object level depth sorted (i.e. you don't need to merge the tri's of the various objects) with other objects and rendered back to front, but this needs to include the islands of world space overlapping objects as part of the order.

Basically after doing all the object bounds sorting, you end up with:
1. Objects which can be drawn completely independently and you can resort those for shader/texture etc freely.
2. Objects which overlap only in terms of painters algorithm which means you just sort front to back and do whatever is required in #1 to each object. I.e. sort the individual triangles for non-convex shapes or render back face/front face to avoid sorting etc.
3. The nightmare of poly soup is the remaining set of objects which physically overlap in world space. This is the bit you mention and no, can't think of a way to reduce draw calls if you want perfect sorting. Depth peeling and related items may be the only proper solution for these or cpu side tri-splitting etc.

Without diagrams this is difficult to describe and I'm sure my brief overview was probably confusing. To get a better explanation google up broad phase collision detection specifically involving sort and sweep variations. That covers how to maintain a quick "island" system. The overlaps in 2D space should all be pretty obvious, basically if the object doesn't overlap in both x and y, then it is independent and has no need of being sorted against other things.

Share this post


Link to post
Share on other sites
[quote name='AllEightUp' timestamp='1311208451' post='4838201']
[quote name='Quat' timestamp='1311196461' post='4838139']
It seems like it is only practical to sort objects based on a couple properties. For example, say I am given a list of visible objects L.

I can choose how to sort or divide them. Say I divide them into three sub lists:

Sublist S1 = Opaque
Sublist S2 = AlphaTested
Sublist S3 = Transparent

I can then sort the first two in front-to back order, and the last in back-to-front order.

However, when I go to draw S1, there are additional properties that could be sorted (shader programs--e.g., skinned or not, normal mapped or not--textures).

Is this just a fact of life, and you just choose to sort based on the property that gives the best bang for your buck, or am I missing something?
[/quote]
Basically after doing all the object bounds sorting, you end up with:
1. Objects which can be drawn completely independently and you can resort those for shader/texture etc freely.
2. Objects which overlap only in terms of painters algorithm which means you just sort front to back and do whatever is required in #1 to each object. I.e. sort the individual triangles for non-convex shapes or render back face/front face to avoid sorting etc.
3. The nightmare of poly soup is the remaining set of objects which physically overlap in world space. This is the bit you mention and no, can't think of a way to reduce draw calls if you want perfect sorting. Depth peeling and related items may be the only proper solution for these or cpu side tri-splitting etc.
[/quote]

I should mention that this is easily optimized and only sounds complicated when you don't really understand it. Start with the 2D case first and then add the 3D islands, it's not nearly as difficult as it sounds. (And rather CPU friendly with good GPU performance results.)

Share this post


Link to post
Share on other sites
[quote name='AllEightUp' timestamp='1311208451' post='4838201']
Now, sorting S1 front to back is pretty much a waste of time since it is opaque and you probably have a z-buffer (why wouldn't you anymore?), there is no sorting required.[/quote]Front-to-back sorting of opaques is designed to take advantage of the hierarchical z-buffer, which can perform a coarse Z-test prior to executing the pixel shader. If you've got expensive pixel shaders, then this might help reudce your shading costs.

However, a common alternative is the z-pre-pass, where you render all your opaques first, only writing to the depth buffer (which is cheap), and then render them all again, this time actually using the real pixel shaders. This gives you the benefits of the hierarchical z-buffer, without having to use a front-to-back ordering.

Also, in a deferred pipeline, your objects themselves don't typically have expensive pixel shaders (the lights do), so front-to-back sorting probably won't help you that much.

So, with opaques, sorting by depth is designed to reduce GPU-side pixel shader costs, whereas sorting by state ([i]e.g. by material[/i]) is designed to reduce CPU-side submission costs.
As with any optimisation, the best choice can only be made after measuring the characteristics of your particular situation.

Share this post


Link to post
Share on other sites
[quote name='AllEightUp' timestamp='1311208451' post='4838201']Now, sorting S1 front to back is pretty much a waste of time since it is opaque and you probably have a z-buffer (why wouldn't you anymore?), there is no sorting required.[/quote]

If you draw back to front then the z-buffer is useless. Everything will pass the z test if the geometry you are drawing is always in front of everything else. So sorting front to back is what actually allows you to take advantage of the z buffer.

Am I right?

Share this post


Link to post
Share on other sites
[quote name='Hodgman' timestamp='1311212101' post='4838230']
[quote name='AllEightUp' timestamp='1311208451' post='4838201']
Now, sorting S1 front to back is pretty much a waste of time since it is opaque and you probably have a z-buffer (why wouldn't you anymore?), there is no sorting required.[/quote]Front-to-back sorting of opaques is designed to take advantage of the hierarchical z-buffer, which can perform a coarse Z-test prior to executing the pixel shader. If you've got expensive pixel shaders, then this might help reudce your shading costs.

However, a common alternative is the z-pre-pass, where you render all your opaques first, only writing to the depth buffer (which is cheap), and then render them all again, this time actually using the real pixel shaders. This gives you the benefits of the hierarchical z-buffer, without having to use a front-to-back ordering.

Also, in a deferred pipeline, your objects themselves don't typically have expensive pixel shaders (the lights do), so front-to-back sorting probably won't help you that much.

So, with opaques, sorting by depth is designed to reduce GPU-side pixel shader costs, whereas sorting by state ([i]e.g. by material[/i]) is designed to reduce CPU-side submission costs.
As with any optimisation, the best choice can only be made after measuring the characteristics of your particular situation.
[/quote]

I tend to ignore the hierarchical z buffers lately as the z-prepass trumps it in both performance and added abilities. The cpu cost and reordering of the batches just doesn't seem to give any benefit on recent cards in my testing. I do leave it as an option but just never turn it on anymore. This may be a side effect of my pipeline so your milage may vary.

Share this post


Link to post
Share on other sites
[quote name='Olhovsky' timestamp='1311217445' post='4838265']
[quote name='AllEightUp' timestamp='1311208451' post='4838201']Now, sorting S1 front to back is pretty much a waste of time since it is opaque and you probably have a z-buffer (why wouldn't you anymore?), there is no sorting required.[/quote]

If you draw back to front then the z-buffer is useless. Everything will pass the z test if the geometry you are drawing is always in front of everything else. So sorting front to back is what actually allows you to take advantage of the z buffer.

Am I right?
[/quote]


Erm, my other left..... Yup, front to back ordering, not the other way around, would be the proper direction for opaque objects to get hierarchical z. Dislexia ftw.

Share this post


Link to post
Share on other sites
[quote name='AllEightUp' timestamp='1311255734' post='4838439']
[quote name='Hodgman' timestamp='1311212101' post='4838230']
Front-to-back sorting of opaques is designed to take advantage of the hierarchical z-buffer.
However, a common alternative [to front-to-back sorting] is the z-pre-pass, [which] gives you the benefits of the hierarchical z-buffer, without having to use a front-to-back ordering.[/quote]I tend to ignore the hierarchical z buffers lately as the z-prepass trumps it in both performance and added abilities.[/quote]Instead of sorting front-to-back to take advantage of the HZB, you can use a ZPP to take advantage of the HZB...
...but instead of either sorting or using a ZPP to exploit the HZB, you... use use a ZPP?

Share this post


Link to post
Share on other sites
[quote name='Hodgman' timestamp='1311322634' post='4838808']
[quote name='AllEightUp' timestamp='1311255734' post='4838439']
[quote name='Hodgman' timestamp='1311212101' post='4838230']
Front-to-back sorting of opaques is designed to take advantage of the hierarchical z-buffer.
However, a common alternative [to front-to-back sorting] is the z-pre-pass, [which] gives you the benefits of the hierarchical z-buffer, without having to use a front-to-back ordering.[/quote]I tend to ignore the hierarchical z buffers lately as the z-prepass trumps it in both performance and added abilities.[/quote]Instead of sorting front-to-back to take advantage of the HZB, you can use a ZPP to take advantage of the HZB...
...but instead of either sorting or using a ZPP to exploit the HZB, you... use use a ZPP?
[/quote]

I should have probably stated this as more in terms that given my scenes are semi-ordered due to the culling system, a better sort may reduce a bit of overdraw but the CPU cost doesn't ballance out compared to the GPU gain. Again, this is a by product of my pipeline which is optimized for a specific type of scene in a style of game which is not likely going to be a genericly applicable result. But, still, given how fast you can do the ZPP anymore, leaving it with even a lot of overdraw may still be a win given a more simplistic pipeline on the CPU side. (NOTE: I do notice that ZPP on NVidia cards is notably slower than AMD, probably due to having fewer processors, I may find the sort is worth it when I do better testing on my NVidia cards.)

Share this post


Link to post
Share on other sites
[quote name='AllEightUp' timestamp='1311338690' post='4838876']
I should have probably stated this as more in terms that given my scenes are semi-ordered due to the culling system, a better sort may reduce a bit of overdraw but the CPU cost doesn't ...[/quote]
In that case, one would do a single run of bubble sort per frame (yes, bubble sort, no joke). For "almost ordered", bubble sort works surprisingly good, and it's dead simple. A single pass per frame will eventually optimize your draw order over several frames at a fixed (basically zero) cost. There is no possible worst case like with any other sort, because you do exactly one run.

Share this post


Link to post
Share on other sites
[quote name='samoth' timestamp='1311343662' post='4838907']
[quote name='AllEightUp' timestamp='1311338690' post='4838876']
I should have probably stated this as more in terms that given my scenes are semi-ordered due to the culling system, a better sort may reduce a bit of overdraw but the CPU cost doesn't ...[/quote]
In that case, one would do a single run of bubble sort per frame (yes, bubble sort, no joke). For "almost ordered", bubble sort works surprisingly good, and it's dead simple. A single pass per frame will eventually optimize your draw order over several frames at a fixed (basically zero) cost. There is no possible worst case like with any other sort, because you do exactly one run.
[/quote]

That's actually what the culling system is doing, one pass of bubble sort per frame just to keep rough ordering. I do this due to the fact that the camera remains static for fairly long periods and as such, I can glob up lots of geometry for a quick cull test and only really have to deal with dynamic items for culling. It cuts down my frustum vs aabb/sphere tests by about 90% which while not a real performance problem it just free's up enough CPU to do some other things on low end boxes.

Share this post


Link to post
Share on other sites
[quote name='Shael' timestamp='1311206755' post='4838198']
Usually you would treat opaque and transparent objects differently so it makes sense to have them in separate lists. As for sorting, you could use a sort key/hash which would allow you to sort by a number of properties (distance, shader/material, etc).
[/quote]

I am having some difficulty regarding the key sorted approach mentioned. I am trying to tie up rendering commands (set shader params, set render targets, etc) + draw calls (set pass, draw primitive) into a command queue, and sort that queue. This apparently means I do not want the commands and draw calls to go out of order, but only sort the draw calls based on certain properties. I thought this would be a convenient approach as I would not need to maintain a separate list of sorted primitives for each scene-cull, however after much considering as to how to go about creating the key for such a sparse-sort order I am thinking if having separate buckets is a better approach (less elements to sort, no complex key needed).

For Eg: The queue may look like:
* Set Render Target
* Set View Port
* Set Pass P1
* Set Shader Param T1
* Draw O1
* Set Shader Param T2
* Draw O2
* Set Pass P2
* ...

The problem is to keep the grouped draw calls and set calls in order, which requires me to use a grouping number in the key, something like:
[ layer | command | material id | distance | group number ]

I will have to think up something better however as the key is becoming very redundant.

Share this post


Link to post
Share on other sites
[quote name='obhi' timestamp='1311525748' post='4839648']
This apparently means I do not want the commands and draw calls to go out of order, but only sort the draw calls based on certain properties. I thought this would be a convenient approach as I would not need to maintain a separate list of sorted primitives for each scene-cull, however after much considering as to how to go about creating the key for such a sparse-sort order I am thinking if having separate buckets is a better approach (less elements to sort, no complex key needed).

The problem is to keep the grouped draw calls and set calls in order, which requires me to use a grouping number in the key, something like:
[ layer | command | material id | distance | group number ]

I will have to think up something better however as the key is becoming very redundant.
[/quote]

Theres no difference between the grouping number that you just mentioned and the "complex key".

In an offline step, or even just when you construct a [material | draw command] batch, you can perform a generic hash of the contents of those members and that becomes your "material hash"

Have your distance calculated as an integer from either the nearplane or farplane based on whether you want front-to-back or back-to-front sorting for the element, and this is the "distance hash"

Bit pack the hashes together with the layer number, and you have your sorting key. Put the bits for the distance hash higher than the bits for the material hash to prioritise depth sorting to material sorting.

The buckets arent necissarily a bad idea either though, as they are effectively pre-sorting larger chunks of your rendering (unless you end up with your buckets being too many, and too small in size)

In my case i use a 32-bit hash which yeilds a blindingly quick radix sort
material sorted (opaques): [1bit layer = 0, 15bit MaterialHash, 16bit DepthHash]
depth sorted (everything else): [1bit layer = 1, 16bit DepthHash, 15bit MaterialHash]

Share this post


Link to post
Share on other sites
Regarding the key/hash approach here: [url="http://realtimecollisiondetection.net/blog/?p=86"]Order your graphics draw calls around![/url].

In particular the handling of materials. So let's just say you have 32 bits for your material ID. Then objects will be sorted by material and you can render many objects that share the same material without state changes. However a material may consists of several properties. For example:

1. Render state (for example, is the geometry two sided or not)
2. Shader programs
3. Input layout
4. Textures/constants

Say you had two materials out of N materials, but the two only differed by render state. It would be nice if the objects corresponding to these two materials were "next" to each other in the sorted list, as it would minimize state changes when going from one material to the next. But I don't think it is possible to do this without extra sorts or extra data (maintaining a large bitflag list with every material permutation to ensure they end up sorted like this).

Share this post


Link to post
Share on other sites
There's no reason why the sort key can't be made up using the addresses of a render state, shader, input layout, and texture. Quoting Hodgman from the [url="http://www.gamedev.net/topic/604899-frostbite-rendering-architecture-question/page__st__40"]Frostbite[/url] thread he shows a simple example:

[quote]

For sorting opaque objects, I create a hash from the address of the most expensive state-groups -- usually the material and shader ones. e.g. something like:[code]u32 shader = (u32)pShaderState;
u32 material = (u32)pMaterialState;
u32 hash = ((shader^(shader<<16))&0xFFFF0000) | ((material^(material>>16))&0xFFFF);
[/code]
[/quote]

Share this post


Link to post
Share on other sites

This topic is 2332 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this