Sign in to follow this  
Xycaleth

OpenGL Temporal coherence and render queue sorting

Recommended Posts

I've seen this mentioned a lot in the past (and again more recently by L. Spiro in this thread) where you take advantage of temporal coherence between frames so that you don't have to perform a full re-sort (I'm not sure what the best terminology for this is) on the queue. The idea being that between frames, objects will be more or less in the same order as such a small time has passed.

 

But how is this achieved? If you use your already sorted list, how do you know which (renderable) objects in the list should be removed (e.g. because they were culled)?

 

My first idea would be something like this:

  1. Iterate through sorted list, check if culled. If not, keep it. If it is, remove it.
  2. Traverse through spatial tree (or however your scene is partitioned) and add objects which need rendering - have to check list for duplicates though!
  3. Then we can sort the queue.

But step 2 doesn't seem very efficient if you have to iterate the list a load of times to check if the object already exists.

 

As I write this, I realise you could add two extra fields to each of the objects to say in which frame it was last rendered, and when in which frame it was last removed. Then checking for existence in the list can be a simple lastRenderedFrame == currentFrame - 1 && lastRemovedFrame < currentFrame. You'd also need to update the lastRenderFrame field when you add it to the queue (step 2), and update the lastRemovedFrame when you removed from the queue (step 1).

 

Am I on the right sort of lines?

Share this post


Link to post
Share on other sites

How do you deal with sorting of transparent polygons that belong to the same sub-mesh?

 

Also, why do you have to reset the sort order to 0, 1, 2, 3, ... N when the new render queue is shorter? Wouldn't it be faster to just remove the no-longer present sub-meshes and leave the existing sub-meshes sorted?

Edited by tonemgub

Share this post


Link to post
Share on other sites

Wouldn't it be faster to just remove the no-longer present sub-meshes and leave the existing sub-meshes sorted?

Try to imagine what steps need to be taken to remove existing indices or to build a new array without these indices. Now compare that to creating an array from 1, 2, ... N.

Edited by Mussi

Share this post


Link to post
Share on other sites

How do you deal with sorting of transparent polygons that belong to the same sub-mesh?

You don’t with this algorithm.
In practice that level of accuracy is never needed.  It’s more important to draw triangles in the same order rather than in a specific order.  As long as they don’t jitter the player won’t notice that the blending is slightly wrong just because they were drawn in the wrong order.
 

Also, why do you have to reset the sort order

See above (Mussi’s reply).
If you have 1,000 objects and then the next frame you have 1 object, the list is still sorted as long as the list is just index 0. Now, think about how to detect that. A generalized algorithm will only help in a few situations if you just happen to get lucky enough that all the removed indices are ones you didn’t need, and in every other case time spent trying to detect that is just time wasted.
Restarting the index list is a bit of a cost but it’s sorted again after only 1 frame, so it’s the best general case.
 
 
L. Spiro Edited by L. Spiro

Share this post


Link to post
Share on other sites

Try to imagine what steps need to be taken to remove existing indices or to build a new array without these indices. Now compare that to creating an array from 1, 2, ... N.

You can do both in a single iteration over the array. To remove an index, you find it's position in the array by comparing each array value with the index, then in the same iteration, move all subsequent elements back by 1 for each removed element. So you just do array[i] = array[i + number_of_removed_indexes_found_so_far] instead of array[i] = i (number_of_removed_indexes_found_so_far will initially be 0).

Assuming that that the number of indexes removed each frame is small in comparison with the total number of indexes, this might be faster than re-ordering all indexes and breaking the coherence?

 

Even if you don't want to keep the remaining indexes sorted, array[i] = i seems like a waste of CPU cycles for some reason. Wouldn't it be faster to just keep something like a "mustSortAll" flag - set it to TRUE when indexes must be removed, and the next frame, if mustSortAll is TRUE, you use the i = 0...N indexes (the iterator values) directly in the sort, instead of setting and then reading back the vector values (which are going to be the same as the iterator values anyway)? smile.png

 

 

 


If you have 1,000 objects and then the next frame you have 1 object, the list is still sorted as long as the list is just index 0. Now, think about how to detect that. A generalized algorithm will only help in a few situations if you just happen to get lucky enough that all the removed indices are ones you didn’t need, and in every other case time spent trying to detect that is just time wasted.
Restarting the index list is a bit of a cost but it’s sorted again after only 1 frame, so it’s the best general case.

Good point. Then how about a middle-ground algorithm: If the number of removed items is relatively small, you just keep the remaining indexes sorted. Otherwise, you re-sort everything next frame (using my flag idea above).

Now all we need is the formula for the "relatively small" threshold - it could probably be determined by testing, though I think it's going to be a logarithmic value. (I'm not good with big O calculations).

Edited by tonemgub

Share this post


Link to post
Share on other sites

I loved whole L. Spiro's post, but I have something to correct

Sorting 2 smaller queues is faster than 1 big one.

This is a half truth.
Sorting can take:

  • Best Case: O(N)
  • Avg. Case: O( N log( N ) )
  • Worst Case: O( N^2 )

1. In best case, N/2 + N/2 = N; so in theory it doesn't matter whether it's split or not. But there is the advantage that two containers can be sort in separate threads. So it's a win.

2. In the average case, 2 * (N/2 log(N/2)) > N log(N); having one large container should be faster than sorting two smaller ones (though there remains to be seen whether threading can negate the effect up to certain N)

3. In the worst case, 2 * (N/2)^2 < N^2; which means it's much better to sort two smaller containers than a large one.

 

In the end you'll have to profile as it is not a golden rule.

Spiro's suggestion of using temporal coherence assumes that most of the time you can get O(N) sorting using insertion sort; thus most likely having two smaller containers should be better (if you perform threading).

 

Update: Stupid algebra mistake. See lunkhound's post. Avg case is better when dividing and conquering.

Edited by Matias Goldberg

Share this post


Link to post
Share on other sites

I loved whole L. Spiro's post, but I have something to correct

Sorting 2 smaller queues is faster than 1 big one.

This is a half truth.
Sorting can take:

  • Best Case: O(N)
  • Avg. Case: O( N log( N ) )
  • Worst Case: O( N^2 )

1. In best case, N/2 + N/2 = N; so in theory it doesn't matter whether it's split or not. But there is the advantage that two containers can be sort in separate threads. So it's a win.

2. In the average case, 2 * (N/2 log(N/2)) > N log(N); having one large container should be faster than sorting two smaller ones (though there remains to be seen whether threading can negate the effect up to certain N)

3. In the worst case, 2 * (N/2)^2 < N^2; which means it's much better to sort two smaller containers than a large one.

 

In the end you'll have to profile as it is not a golden rule.

Spiro's suggestion of using temporal coherence assumes that most of the time you can get O(N) sorting using insertion sort; thus most likely having two smaller containers should be better (if you perform threading).

 

While I love your posts in general, this "correction" doesn't seem right to me.  Specifically #2.  With O( N log N ) run time -- worse than linear -- divide and conquer is beneficial when possible.

 

Plugging actual numbers into your inequality in 2 (N=1024, using base 2 log):

  left side expansion:  2*(1024/2) log (1024/2) = 1024 * 9

  right side expansion:  1024 log 1024 = 1024 * 10

The left side is LESS, contrary to your inequality.

 

Sorting 2 half-sized arrays is sometimes faster but never slower than a single full-size sort.

Perhaps you were thinking of searching with O( log N ) run time -- better than linear.  In that case, doing divide and conquer IS harmful.

Share this post


Link to post
Share on other sites

This is interesting, although I'm a little confused about how the scheme outlined by L Spiro works, specifically:


If the new total is the same as the last total, do nothing.  Leave indices as they are.

 

Is it not possible that the total could be the same between two frames, but that items in the render queue (after culling) are different? In this case the indices won't necessarily point to the same items as in the previous frame, and the queue would need to be re-sorted.

Share this post


Link to post
Share on other sites

Is it not possible that the total could be the same between two frames, but that items in the render queue (after culling) are different?

I thought of that too, but I think the algorithm is supposed to treat that in two separate steps (notice tehre's no "else" branch after the first "if"): first, the culled-out items are removed, which should register as "fewer items in the list" and causes a reset, then the culled-in items are added at the end of the list, and all items will get re-sorted next frame (or the other way around).

Edited by tonemgub

Share this post


Link to post
Share on other sites

If the same number of items end up in the list but they are different items, the logic above applies.

 

#1: Do not mess with the indices.  Leave as they are.

#2: The sorting pass will reorder them according to the new data.

 

Since the sort is performed every frame it automatically fixes any cases where objects are not in sorted order after filling the render-queue.  They will be sorted before rendering, and then that order will be there for the next frame.

 

 

L. Spiro

Share this post


Link to post
Share on other sites


#2: The sorting pass will reorder them according to the new data.

 

Ah, thanks for clearing that up. I was thinking that you were skipping the sort in this case.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this