Temporal coherence and render queue sorting

Started by
18 comments, last by johnchapman 9 years, 6 months ago

I loved L. Spiro's whole post, but I have something to correct:

> Sorting 2 smaller queues is faster than 1 big one.

This is a half truth.
Sorting can take:

  • Best Case: O(N)
  • Avg. Case: O( N log( N ) )
  • Worst Case: O( N^2 )

1. In the best case, N/2 + N/2 = N, so in theory it doesn't matter whether the queue is split or not. But there is the advantage that the two containers can be sorted in separate threads. So it's a win.

2. In the average case, 2 * (N/2 log(N/2)) > N log(N); having one large container should be faster than sorting two smaller ones (though it remains to be seen whether threading can negate the effect up to a certain N).

3. In the worst case, 2 * (N/2)^2 < N^2; which means it's much better to sort two smaller containers than a large one.

In the end you'll have to profile, as this is not a golden rule.

Spiro's suggestion of using temporal coherence assumes that most of the time you can get O(N) sorting using insertion sort; thus having two smaller containers should most likely be better (if you use threading).

Update: Stupid algebra mistake. See lunkhound's post. Avg case is better when dividing and conquering.
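The temporal-coherence point is that a list which is already almost in order from the previous frame can be re-sorted in close to O(N) with insertion sort. A minimal sketch of that behavior (the `insertionSort` helper and its comparison counter are mine, for illustration, not code from the thread):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion sort that counts key comparisons. On input that is already
// sorted it performs exactly N-1 comparisons (O(N)); on reversed input
// it degrades to N*(N-1)/2 comparisons (O(N^2)).
static std::size_t insertionSort(std::vector<int>& v) {
    std::size_t comparisons = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        // Every loop-condition test that reaches the key comparison counts.
        while (j > 0 && (++comparisons, v[j - 1] > key)) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
    return comparisons;
}
```

With a render queue that barely changed since last frame, the comparison count stays near N, which is why keeping last frame's order pays off.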

While I love your posts in general, this "correction" doesn't seem right to me. Specifically #2. With O( N log N ) run time -- worse than linear -- divide and conquer is beneficial when possible.

Plugging actual numbers into your inequality in 2 (N=1024, using base 2 log):

left side expansion: 2*(1024/2) log (1024/2) = 1024 * 9

right side expansion: 1024 log 1024 = 1024 * 10

The left side is LESS, contrary to your inequality.

Sorting 2 half-sized arrays is sometimes faster but never slower than a single full-size sort.

Perhaps you were thinking of searching with O( log N ) run time -- better than linear. In that case, doing divide and conquer IS harmful.
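For what it's worth, the arithmetic above can be checked directly against the thread's cost model (the `wholeCost`/`splitCost` helpers are just my naming for N log2(N) and 2 * (N/2) * log2(N/2)):

```cpp
#include <cassert>
#include <cmath>

// Average-case comparison-count model from the thread, base-2 logs.
static double wholeCost(double n) { return n * std::log2(n); }                     // one queue of N
static double splitCost(double n) { return 2.0 * (n / 2.0) * std::log2(n / 2.0); } // two queues of N/2
```

For N = 1024 this gives 1024 * 9 = 9216 for the split and 1024 * 10 = 10240 for the single queue, matching the numbers above; the split side is the smaller of the two for any N > 2.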

Ooops. My bad. You're right. I've updated my post.

Cull objects through the spatial partitioning system (or even just brute-force if you are just testing things) using the camera’s frustum. You can populate the render-queue directly as you do this.

Or if you care about cache coherency, branch misses... the list goes on. Spatial partitioning is not useful except as a means of dividing up large (i.e. > 20k) object sets into smaller lists, which will still number objects in the thousands (or more). Once again, a link for those considering spatial partitioning... here's why you probably shouldn't unless you have a LOT MORE than 20k objects...

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

I especially like the part where it says:

The new culling

> Uses the computing power

What if I don't want my computing power to be (ab)used? :)

> I especially like the part where it says:
> The new culling
> Uses the computing power
> What if I don't want my computing power to be used? :)
Sleep(1000);
for(int i=0; i!=1000; ++i) printf("", sqrt(i*42.0));

This is interesting, although I'm a little confused about how the scheme outlined by L. Spiro works, specifically:

> If the new total is the same as the last total, do nothing. Leave indices as they are.

Is it not possible that the total could be the same between two frames, but that items in the render queue (after culling) are different? In this case the indices won't necessarily point to the same items as in the previous frame, and the queue would need to be re-sorted.


> Is it not possible that the total could be the same between two frames, but that items in the render queue (after culling) are different?

I thought of that too, but I think the algorithm is supposed to handle that in two separate steps (notice there's no "else" branch after the first "if"): first, the culled-out items are removed, which registers as "fewer items in the list" and causes a reset; then the culled-in items are added at the end of the list, and all items get re-sorted next frame (or the other way around).

If the same number of items end up in the list but they are different items, the logic above applies.

#1: Do not mess with the indices. Leave as they are.

#2: The sorting pass will reorder them according to the new data.

Since the sort is performed every frame it automatically fixes any cases where objects are not in sorted order after filling the render-queue. They will be sorted before rendering, and then that order will be there for the next frame.
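Putting the scheme together, the per-frame flow might look roughly like this. This is only a sketch of my reading of the posts above; `QueueEntry`, `RenderQueue`, and the key layout are placeholder names, not L. Spiro's actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder entry: a sort key (shader/material/depth bits packed by the
// engine) plus an index into this frame's renderable list.
struct QueueEntry {
    std::uint64_t key;
    std::uint32_t index;
};

struct RenderQueue {
    std::vector<QueueEntry> entries;
    std::size_t lastCount = 0;

    // Called once per frame, after culling has (re)filled `entries`.
    void sortForFrame() {
        if (entries.size() == lastCount) {
            // Same total as last frame: do nothing special, keep the
            // indices as they are (point #1 above).
        } else {
            lastCount = entries.size();
        }
        // The sort runs every frame (point #2 above), so "same count but
        // different items" is handled too. Insertion sort is close to O(N)
        // when last frame's order mostly still holds.
        for (std::size_t i = 1; i < entries.size(); ++i) {
            QueueEntry e = entries[i];
            std::size_t j = i;
            while (j > 0 && entries[j - 1].key > e.key) {
                entries[j] = entries[j - 1];
                --j;
            }
            entries[j] = e;
        }
    }
};
```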

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


> #2: The sorting pass will reorder them according to the new data.

Ah, thanks for clearing that up. I was thinking that you were skipping the sort in this case.
