Sorting draw calls when stages are executed multiple times

Started by
4 comments, last by L. Spiro 11 years ago

Many game engines use a draw call sorting approach to create render queues (like this, Bitsquid also uses a similar system).

This works well when each stage (gbuffer pass, lighting) is only executed one time since you just have to set some bits in the sort key that identify the pass.

The problem appears when you have passes that might be executed multiple times like shadow mapping (and real time reflection maps) because it needs to be executed for each light on screen that casts shadows. There is a limited number of bits that identity each stage so you (probably) cannot simply assign each light a unique stage number.

I currently don't include stage bits in the sort key instead I have multiple "buckets" each bucket associated with a stage and render items are put in the buckets. Then the stages are executed in the defined order by rendering the render items inside each bucket, this way I can render the shadow map bucket multiple times.

In the end of the article I linked in the beginning of the topic the author wrote that this multiple buckets approach where draw calls are sorted inside the buckets was used in the PS2-era and nowadays draw calls from multiple stages are sorted together.

What am I missing?

Advertisement

I also use the multiple buckets approach. Sorting 5 lists is faster than sorting a single list that's 5 times bigger.
AFAIK, you can't use a single list (with a single stage identifier) for repeated stages like shadow mapping, because each light/shadow-stage will have different culling results. You also probably don't want to share items between the shadow list and the g-buffer/forward list, because objects that are 3-draw calls in the latter (3 materials) can be just one draw call in the former (z-only pass disregards material).

In that article, he's talking about what they happened to do on their PS2 games, not necessarily saying that one approach is PS2 era and one is PS3 era. Also, IIRC, GoW didn't use deferred shading, and made little use of shadow mapping - they merge many lights together into a single light for forward shading, which seems incompatible with having shadow maps per light.

I figured I'd jump in on this thread as I have a similar question.

I'm working on a data-driven renderer (similar to Horde) where you can define a rendering pipeline via a script.

Given:

a) a set of visible objects determined from the culling phase

and

b) a pipeline can have multiple stages each possibly requiring the objects sorted differently (back-to-front, front-to-back, material/state changes, etc)

What is the best way to perform sorting?

To me it'd seem that each stage with a sort option specified would have to loop through each object/drawcall/renderitem and generate a new sort key before sorting. That or each object would need to be filled with all necessary information so that custom sort functions could be written and choose what info to sort on instead of just doing the simple default integer sort using a key value.

Maybe I'm going about this completely wrong. Some insight would be great!

Firstly I define a render single queue object (CRenderQueue) which does an index sort on the objects inside it (not sorting the render-queue items but the indices to each item in the queue). These are maintained across frames to take advantages of temporal coherence (which is further exploited by using an insertion sort).

2 of these combine to make a CRenderQueueSet object in which one render queue is for opaque objects and the other for alpha-enabled objects.

In a basic scene with shadow mapping there are 3 render-pass types: Ambient/non-shadow-casting lights, shadow-map generation, and applying shadowed lights to objects (one extra pass per shadow).

The basic rule of thumb is that sorting needs to be done once per cull phase.

For the ambient and shadow-application phases the same objects will be drawn, so culling only happens once between the two stages, and thus sorting only happens once.

Then for each shadow you apply to objects, different objects will be culled on a per-light basis, so sorting also happens once for each light.

After culling (for any phase), the remaining objects are passed a CRenderQueueSet and so they can themselves to either the opaque render queue or the alpha-enabled render queue.

When submitting data to a render queue the objects need to know how to create a render-queue item, which requires that they know what type of render is being made (normal render or shadow-map-creation render, etc.)

This information is passed when the model is asked to submit itself to the render queue and it uses it in a simple switch case which selects the appropriate shader for that type of render and supplies that, along with other necessary information, to the render-queue item for the render queue(s).

A keen observer will then note that if they are only added to the render queue and sorted once, their shaders will change between the ambient pass and the shadow-map-application pass(es). Since shaders are very important for sorting, shouldn’t they be resorted?

This is true, but they will all change in the same way. That is, all shaders that had hemisphere lighting enabled and neither lights nor shadow-mapping enabled for the ambient pass will be permutated into the same shader when all of these are disabled or enabled for the later shadow-map-application pass(es). In other words, they all changed their shaders, but have remained implicitly sorted anyway.

In all cases, each render-queue remains across frames. Its clear function simply sets m_ui32Total to 0UL. The sorted indices stay sorted.

After adding back all the objects for the next frame, the objects will likely still be rendered in the same order as last frame and the insertion sort will quickly verify that.

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

How do you generate the indices that point to each render item? I'm assuming you would need to have some way to uniquely identify render items?

If we look at a Light Pre-Pass rendering pipeline, the first stage would require the queue to be sorted by depth and the material stage sorted by material/shader/state BUT both stages utilize the same cull results (unless my understanding is incorrect). How is this handled? We have two different sort orders that use the same set of culled objects.

How do you generate the indices that point to each render item? I'm assuming you would need to have some way to uniquely identify render items?

The objects just append each render-queue item to the end of the list, so the unsorted array of indices is 0, 1, 2, 3, 4, …, N.
Of course you have to keep an array of indices as well.
After sorting 5 objects, the objects themselves have not moved, but the secondary array of indices is now 4, 2, 5, 0, 1.

The only trick about it is how to maintain temporal coherence.
The second copy of indices are not modified when the objects are removed from the list and re-added for the next frame.
Rule #1: If the total number of items in the list is the same as before, nothing about the secondary index list changes. Just do another insertion sort and it will generally already be sorted since objects will always add themselves to the render queue in the same order.
Rule #2: If the list has more items than before, just add those indices to the end of the resized index array. If we added 2 objects to our example, the new array would be:
4, 2, 5, 0, 1, 6, 7. Then perform the insertion sort.
Rule #3: If the list shrinks, reset the indices from 0 to N. So if the next frame had one fewer items, we would have:
0, 1, 2, 3, 4, 5. Then perform the insertion sort.

If we look at a Light Pre-Pass rendering pipeline, the first stage would require the queue to be sorted by depth and the material stage sorted by material/shader/state BUT both stages utilize the same cull results (unless my understanding is incorrect). How is this handled? We have two different sort orders that use the same set of culled objects.

Sorting once per culling is just a general rule of thumb. For Z-prepass there are the same list of objects but it really matters how they are sorted, so a second render queue, sorted only by depth, would be necessary. But not a second list of objects, of course.

If you are trying to generalize this you can make a series of nodes to represent stages of the rendering process and a list of named render queues for each node that later nodes can access. Just as before, where I use an enumeration for render types, the nodes would create render queues associated with these same render types so that objects know what to put into the render-queue items for proper sorting.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

This topic is closed to new replies.

Advertisement