Temporal coherence and render queue sorting

Started by
18 comments, last by johnchapman 9 years, 6 months ago

I loved L. Spiro's whole post, but I have something to correct:

> Sorting 2 smaller queues is faster than 1 big one.

This is a half truth.
Sorting can take:

  • Best Case: O(N)
  • Avg. Case: O( N log( N ) )
  • Worst Case: O( N^2 )

1. In the best case, N/2 + N/2 = N, so in theory it doesn't matter whether the queue is split or not. But there is the advantage that the two containers can be sorted in separate threads. So it's a win.

2. In the average case, 2 * (N/2 log(N/2)) > N log(N); having one large container should be faster than sorting two smaller ones (though it remains to be seen whether threading can negate the effect up to a certain N).

3. In the worst case, 2 * (N/2)^2 < N^2; which means it's much better to sort two smaller containers than a large one.

In the end you'll have to profile, as this is not a golden rule.

Spiro's suggestion of using temporal coherence assumes that most of the time you can get O(N) sorting using insertion sort; thus having two smaller containers should most likely be better (if you use threading).

Update: Stupid algebra mistake. See lunkhound's post. Avg case is better when dividing and conquering.
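The temporal-coherence point is that a list which is already almost in order from the previous frame can be re-sorted in close to O(N) with insertion sort. A minimal sketch of that behavior (the `insertionSort` helper and its comparison counter are mine, for illustration, not code from the thread):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion sort that counts key comparisons. On input that is already
// sorted it performs exactly N-1 comparisons (O(N)); on reversed input
// it degrades to N*(N-1)/2 comparisons (O(N^2)).
static std::size_t insertionSort(std::vector<int>& v) {
    std::size_t comparisons = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        // Every loop-condition test that reaches the key comparison counts.
        while (j > 0 && (++comparisons, v[j - 1] > key)) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
    return comparisons;
}
```

With a render queue that barely changed since last frame, the comparison count stays near N, which is why keeping last frame's order pays off.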

While I love your posts in general, this "correction" doesn't seem right to me. Specifically #2. With O( N log N ) run time -- worse than linear -- divide and conquer is beneficial when possible.

Plugging actual numbers into your inequality in 2 (N=1024, using base 2 log):

left side expansion: 2*(1024/2) log (1024/2) = 1024 * 9

right side expansion: 1024 log 1024 = 1024 * 10

The left side is LESS, contrary to your inequality.

Sorting 2 half-sized arrays is sometimes faster but never slower than a single full-size sort.

Perhaps you were thinking of searching with O( log N ) run time -- better than linear. In that case, doing divide and conquer IS harmful.
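For what it's worth, the arithmetic above can be checked directly against the thread's cost model (the `wholeCost`/`splitCost` helpers are just my naming for N log2(N) and 2 * (N/2) * log2(N/2)):

```cpp
#include <cassert>
#include <cmath>

// Average-case comparison-count model from the thread, base-2 logs.
static double wholeCost(double n) { return n * std::log2(n); }                     // one queue of N
static double splitCost(double n) { return 2.0 * (n / 2.0) * std::log2(n / 2.0); } // two queues of N/2
```

For N = 1024 this gives 1024 * 9 = 9216 for the split and 1024 * 10 = 10240 for the single queue, matching the numbers above; the split side is the smaller of the two for any N > 2.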

Ooops. My bad. You're right. I've updated my post.

Cull objects through the spatial partitioning system (or even just brute-force if you are just testing things) using the camera’s frustum. You can populate the render-queue directly as you do this.

Or if you care about cache coherency, branch misses... the list goes on. Spatial partitioning is not useful except as a means of dividing up large (i.e. > 20k) object sets into smaller lists, which will still number objects in the thousands (or more). Once again, a link for those considering spatial partitioning... here's why you probably shouldn't unless you have a LOT MORE than 20k objects...

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

I especially like the part where it says:

The new culling

> Uses the computing power

What if I don't want my computing power to be (ab)used? :)

> I especially like the part where it says:
> The new culling
> Uses the computing power
> What if I don't want my computing power to be used? :)
Sleep(1000);
for(int i=0; i!=1000; ++i) printf("", sqrt(i*42.0));

This is interesting, although I'm a little confused about how the scheme outlined by L. Spiro works, specifically:

> If the new total is the same as the last total, do nothing. Leave indices as they are.

Is it not possible that the total could be the same between two frames, but that items in the render queue (after culling) are different? In this case the indices won't necessarily point to the same items as in the previous frame, and the queue would need to be re-sorted.


> Is it not possible that the total could be the same between two frames, but that items in the render queue (after culling) are different?

I thought of that too, but I think the algorithm is supposed to handle that in two separate steps (notice there's no "else" branch after the first "if"): first, the culled-out items are removed, which registers as "fewer items in the list" and causes a reset; then the culled-in items are added at the end of the list, and all items get re-sorted next frame (or the other way around).

If the same number of items end up in the list but they are different items, the logic above applies.

#1: Do not mess with the indices. Leave as they are.

#2: The sorting pass will reorder them according to the new data.

Since the sort is performed every frame it automatically fixes any cases where objects are not in sorted order after filling the render-queue. They will be sorted before rendering, and then that order will be there for the next frame.
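Putting the scheme together, the per-frame flow might look roughly like this. This is only a sketch of my reading of the posts above; `QueueEntry`, `RenderQueue`, and the key layout are placeholder names, not L. Spiro's actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder entry: a sort key (shader/material/depth bits packed by the
// engine) plus an index into this frame's renderable list.
struct QueueEntry {
    std::uint64_t key;
    std::uint32_t index;
};

struct RenderQueue {
    std::vector<QueueEntry> entries;
    std::size_t lastCount = 0;

    // Called once per frame, after culling has (re)filled `entries`.
    void sortForFrame() {
        if (entries.size() == lastCount) {
            // Same total as last frame: do nothing special, keep the
            // indices as they are (point #1 above).
        } else {
            lastCount = entries.size();
        }
        // The sort runs every frame (point #2 above), so "same count but
        // different items" is handled too. Insertion sort is close to O(N)
        // when last frame's order mostly still holds.
        for (std::size_t i = 1; i < entries.size(); ++i) {
            QueueEntry e = entries[i];
            std::size_t j = i;
            while (j > 0 && entries[j - 1].key > e.key) {
                entries[j] = entries[j - 1];
                --j;
            }
            entries[j] = e;
        }
    }
};
```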

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


> #2: The sorting pass will reorder them according to the new data.

Ah, thanks for clearing that up. I was thinking that you were skipping the sort in this case.
