# OpenGL Temporal coherence and render queue sorting

## Recommended Posts

I've seen this mentioned a lot in the past (and again more recently by L. Spiro in this thread) where you take advantage of temporal coherence between frames so that you don't have to perform a full re-sort (I'm not sure what the best terminology for this is) on the queue. The idea being that between frames, objects will be more or less in the same order as such a small time has passed.

But how is this achieved? If you use your already sorted list, how do you know which (renderable) objects in the list should be removed (e.g. because they were culled)?

My first idea would be something like this:

1. Iterate through sorted list, check if culled. If not, keep it. If it is, remove it.
2. Traverse through spatial tree (or however your scene is partitioned) and add objects which need rendering - have to check list for duplicates though!
3. Then we can sort the queue.

But step 2 doesn't seem very efficient if you have to iterate the list a load of times to check if the object already exists.

As I write this, I realise you could add two extra fields to each of the objects to say in which frame it was last rendered, and when in which frame it was last removed. Then checking for existence in the list can be a simple lastRenderedFrame == currentFrame - 1 && lastRemovedFrame < currentFrame. You'd also need to update the lastRenderFrame field when you add it to the queue (step 2), and update the lastRemovedFrame when you removed from the queue (step 1).

Am I on the right sort of lines?

##### Share on other sites

How do you deal with shadowcasters outside the cameras frustrum?

##### Share on other sites

Repeat the process from the light’s “camera” (view and frustum).  Store results in a render-queue assigned to that light.

L. Spiro

##### Share on other sites

How do you deal with sorting of transparent polygons that belong to the same sub-mesh?

Also, why do you have to reset the sort order to 0, 1, 2, 3, ... N when the new render queue is shorter? Wouldn't it be faster to just remove the no-longer present sub-meshes and leave the existing sub-meshes sorted?

Edited by tonemgub

##### Share on other sites

Thanks for the explanation L. Spiro! Makes more sense now.

##### Share on other sites

Wouldn't it be faster to just remove the no-longer present sub-meshes and leave the existing sub-meshes sorted?

Try to imagine what steps need to be taken to remove existing indices or to build a new array without these indices. Now compare that to creating an array from 1, 2, ... N.

Edited by Mussi

##### Share on other sites

How do you deal with sorting of transparent polygons that belong to the same sub-mesh?

You don’t with this algorithm.
In practice that level of accuracy is never needed.  It’s more important to draw triangles in the same order rather than in a specific order.  As long as they don’t jitter the player won’t notice that the blending is slightly wrong just because they were drawn in the wrong order.

Also, why do you have to reset the sort order

If you have 1,000 objects and then the next frame you have 1 object, the list is still sorted as long as the list is just index 0. Now, think about how to detect that. A generalized algorithm will only help in a few situations if you just happen to get lucky enough that all the removed indices are ones you didn’t need, and in every other case time spent trying to detect that is just time wasted.
Restarting the index list is a bit of a cost but it’s sorted again after only 1 frame, so it’s the best general case.

L. Spiro Edited by L. Spiro

##### Share on other sites

Try to imagine what steps need to be taken to remove existing indices or to build a new array without these indices. Now compare that to creating an array from 1, 2, ... N.

You can do both in a single iteration over the array. To remove an index, you find it's position in the array by comparing each array value with the index, then in the same iteration, move all subsequent elements back by 1 for each removed element. So you just do array[i] = array[i + number_of_removed_indexes_found_so_far] instead of array[i] = i (number_of_removed_indexes_found_so_far will initially be 0).

Assuming that that the number of indexes removed each frame is small in comparison with the total number of indexes, this might be faster than re-ordering all indexes and breaking the coherence?

Even if you don't want to keep the remaining indexes sorted, array[i] = i seems like a waste of CPU cycles for some reason. Wouldn't it be faster to just keep something like a "mustSortAll" flag - set it to TRUE when indexes must be removed, and the next frame, if mustSortAll is TRUE, you use the i = 0...N indexes (the iterator values) directly in the sort, instead of setting and then reading back the vector values (which are going to be the same as the iterator values anyway)?

If you have 1,000 objects and then the next frame you have 1 object, the list is still sorted as long as the list is just index 0. Now, think about how to detect that. A generalized algorithm will only help in a few situations if you just happen to get lucky enough that all the removed indices are ones you didn’t need, and in every other case time spent trying to detect that is just time wasted.
Restarting the index list is a bit of a cost but it’s sorted again after only 1 frame, so it’s the best general case.

Good point. Then how about a middle-ground algorithm: If the number of removed items is relatively small, you just keep the remaining indexes sorted. Otherwise, you re-sort everything next frame (using my flag idea above).

Now all we need is the formula for the "relatively small" threshold - it could probably be determined by testing, though I think it's going to be a logarithmic value. (I'm not good with big O calculations).

Edited by tonemgub

##### Share on other sites

I loved whole L. Spiro's post, but I have something to correct

Sorting 2 smaller queues is faster than 1 big one.

This is a half truth.
Sorting can take:

• Best Case: O(N)
• Avg. Case: O( N log( N ) )
• Worst Case: O( N^2 )

1. In best case, N/2 + N/2 = N; so in theory it doesn't matter whether it's split or not. But there is the advantage that two containers can be sort in separate threads. So it's a win.

2. In the average case, 2 * (N/2 log(N/2)) > N log(N); having one large container should be faster than sorting two smaller ones (though there remains to be seen whether threading can negate the effect up to certain N)

3. In the worst case, 2 * (N/2)^2 < N^2; which means it's much better to sort two smaller containers than a large one.

In the end you'll have to profile as it is not a golden rule.

Spiro's suggestion of using temporal coherence assumes that most of the time you can get O(N) sorting using insertion sort; thus most likely having two smaller containers should be better (if you perform threading).

Update: Stupid algebra mistake. See lunkhound's post. Avg case is better when dividing and conquering.

Edited by Matias Goldberg

##### Share on other sites

I loved whole L. Spiro's post, but I have something to correct

Sorting 2 smaller queues is faster than 1 big one.

This is a half truth.
Sorting can take:

• Best Case: O(N)
• Avg. Case: O( N log( N ) )
• Worst Case: O( N^2 )

1. In best case, N/2 + N/2 = N; so in theory it doesn't matter whether it's split or not. But there is the advantage that two containers can be sort in separate threads. So it's a win.

2. In the average case, 2 * (N/2 log(N/2)) > N log(N); having one large container should be faster than sorting two smaller ones (though there remains to be seen whether threading can negate the effect up to certain N)

3. In the worst case, 2 * (N/2)^2 < N^2; which means it's much better to sort two smaller containers than a large one.

In the end you'll have to profile as it is not a golden rule.

Spiro's suggestion of using temporal coherence assumes that most of the time you can get O(N) sorting using insertion sort; thus most likely having two smaller containers should be better (if you perform threading).

While I love your posts in general, this "correction" doesn't seem right to me.  Specifically #2.  With O( N log N ) run time -- worse than linear -- divide and conquer is beneficial when possible.

Plugging actual numbers into your inequality in 2 (N=1024, using base 2 log):

left side expansion:  2*(1024/2) log (1024/2) = 1024 * 9

right side expansion:  1024 log 1024 = 1024 * 10

The left side is LESS, contrary to your inequality.

Sorting 2 half-sized arrays is sometimes faster but never slower than a single full-size sort.

Perhaps you were thinking of searching with O( log N ) run time -- better than linear.  In that case, doing divide and conquer IS harmful.

##### Share on other sites

Ooops. My bad. You're right. I've updated my post.

##### Share on other sites

I especially like the part where it says:

The new culling

> Uses the computing power

What if I don't want my  computing power to be (ab)used? :)

##### Share on other sites

I especially like the part where it says:

The new culling
> Uses the computing power

What if I don't want my computing power to be used? :)
Sleep(1000);
for(int i=0; i!=1000; ++i) printf("", sqrt(i*42.0));

##### Share on other sites

This is interesting, although I'm a little confused about how the scheme outlined by L Spiro works, specifically:

If the new total is the same as the last total, do nothing.  Leave indices as they are.

Is it not possible that the total could be the same between two frames, but that items in the render queue (after culling) are different? In this case the indices won't necessarily point to the same items as in the previous frame, and the queue would need to be re-sorted.

##### Share on other sites

Is it not possible that the total could be the same between two frames, but that items in the render queue (after culling) are different?

I thought of that too, but I think the algorithm is supposed to treat that in two separate steps (notice tehre's no "else" branch after the first "if"): first, the culled-out items are removed, which should register as "fewer items in the list" and causes a reset, then the culled-in items are added at the end of the list, and all items will get re-sorted next frame (or the other way around).

Edited by tonemgub

##### Share on other sites

If the same number of items end up in the list but they are different items, the logic above applies.

#1: Do not mess with the indices.  Leave as they are.

#2: The sorting pass will reorder them according to the new data.

Since the sort is performed every frame it automatically fixes any cases where objects are not in sorted order after filling the render-queue.  They will be sorted before rendering, and then that order will be there for the next frame.

L. Spiro

##### Share on other sites

#2: The sorting pass will reorder them according to the new data.

Ah, thanks for clearing that up. I was thinking that you were skipping the sort in this case.

## Create an account

Register a new account

• ### Forum Statistics

• Total Topics
627704
• Total Posts
2978715
• ### Similar Content

• A friend of mine and I are making a 2D game engine as a learning experience and to hopefully build upon the experience in the long run.

-What I'm using:
C++;. Since im learning this language while in college and its one of the popular language to make games with why not.     Visual Studios; Im using a windows so yea.     SDL or GLFW; was thinking about SDL since i do some research on it where it is catching my interest but i hear SDL is a huge package compared to GLFW, so i may do GLFW to start with as learning since i may get overwhelmed with SDL.
-Questions
Knowing what we want in the engine what should our main focus be in terms of learning. File managements, with headers, functions ect. How can i properly manage files with out confusing myself and my friend when sharing code. Alternative to Visual studios: My friend has a mac and cant properly use Vis studios, is there another alternative to it?

• Both functions are available since 3.0, and I'm currently using glMapBuffer(), which works fine.
But, I was wondering if anyone has experienced advantage in using glMapBufferRange(), which allows to specify the range of the mapped buffer. Could this be only a safety measure or does it improve performance?
Note: I'm not asking about glBufferSubData()/glBufferData. Those two are irrelevant in this case.
• By xhcao
Before using void glBindImageTexture(    GLuint unit, GLuint texture, GLint level, GLboolean layered, GLint layer, GLenum access, GLenum format), does need to make sure that texture is completeness.
• By cebugdev
hi guys,
are there any books, link online or any other resources that discusses on how to build special effects such as magic, lightning, etc. in OpenGL? i mean, yeah most of them are using particles but im looking for resources specifically on how to manipulate the particles to look like an effect that can be use for games,. i did fire particle before, and I want to learn how to do the other 'magic' as well.
Like are there one book or link(cant find in google) that atleast featured how to make different particle effects in OpenGL (or DirectX)? If there is no one stop shop for it, maybe ill just look for some tips on how to make a particle engine that is flexible enough to enable me to design different effects/magic
let me know if you guys have recommendations.
• By dud3
How do we rotate the camera around x axis 360 degrees, without having the strange effect as in my video below?
Mine behaves exactly the same way spherical coordinates would, I'm using euler angles.
Tried googling, but couldn't find a proper answer, guessing I don't know what exactly to google for, googled 'rotate 360 around x axis', got no proper answers.

References:
Code: https://pastebin.com/Hcshj3FQ
The video shows the difference between blender and my rotation:

• 21
• 14
• 12
• 10
• 12