Render List - What's the best way?

Started by
9 comments, last by Halsafar 18 years, 9 months ago
Hey all, I'm getting some great FPS on my sprite engine (3K sprites running at 150+fps) but I know I can improve this. Please don't accuse me of over-optimizing, I have the need to push more sprites onto the screen at a higher speed. My current render frame goes like this: Every sprite adds itself to the render list. Sort the render list and iterate through the list to render. The list is sorted by Z then Y then Texture. It's cleared every frame and rebuilt (though it's size remains the same, as to prevent 1000s of new calls). I feel that a list sorted last minute is going to be the slowest method compared to sorting on insert. However - I'm stumped on what structure to pick. I can't go with a binary tree because I need items that can exist on the same level. Take for example these items sorted by Z:

If I've got 5, 1 and 2:
   5
 1
   2

Then I try to insert another item at level 1:

   5
  1
 1
   2
Now the tree is messed up. What data structure do you guys use to build your render lists?
Advertisement
I have absolutely no idea of the wisdom, or speed, but I've always used an n-tree [each node may have arbitrary children] in the past. But that's generally because the projects have tended to be GUI heavy, and it was convinient to provide the default 'if this node dies, all of its children should die' behavior.

1. It is very likely that most of the sprites remain sorted after a frame. Consider resorting the list from the previous frame rather than building a new list.

2. I assume it is necessary, but it isn't clear why you sort by Z and Y. Sorting by just texture would probably result in a much smaller number of draws, which could have a significant effect on frame rate.
John BoltonLocomotive Games (THQ)Current Project: Destroy All Humans (Wii). IN STORES NOW!
I have a couple untested ideas, and since you didn't give a more detailed explanation of what kind of data your dealing with (like number of z-levels, number of textures, is one texture usually used on just one z-level etc.)

If your using OpenGL (This is probably true of directx too), You can use the z-buffer in orthomode, so you can have the video card handle z-sorting and only sort by texture id, The question is if the advantage of less texture binding out weighs the cost of have the card do depth checking (I assume that since your sorting by z that your not using depth checking).

You can sort by Z and Y at the same time. I'm assuming your using a comparison sort of some kind.

If you have a small amount of z-values, you could have a render list for each z-value, and sort each list separately (which should be faster than sorting all of them).

If your using STL, a std::set is stored in sorted order (and anything you insert into it is sorted). You just have to make sure that your compare (it defaults to operator<) is set to make z more important than textureid. Also you could just erase and insert things, that have changed (either added or removed, or had their z or textureid change), instead of rebuilding the entire list, which is probably faster. To avoid all the new and delete calls you can use the boost pool allocator with the set.

[edit]posted before finishing, now finished[/edit]
Multiple lists is an insignificant improvement. If you have one tree with 1,000 nodes you can get to the bottom in ~10 checks. If you split that tree into two with 500 nodes each, you get to the bottom in ~9 checks. Doesn't seem overly worth it, to me. If I must, I'll try using it.

I suppose I should try enabling the Z-buffer and removing the Z-sort, I'm not sure what sort of returns I would receive. Will this handle alpha-blending properly as well? I thought you had to render those back-to-front manually.

As for sorting by Y, I need that to render objects lower on the screen ontop of objects higher on the screen. Imagine your typical SNES RPG:

 o-|- o-|- -


The person on the bottom's head blocks the person on the top's feet. If you know of alternatives to manually sorting by Y in this situation, I'd love to hear it.

Thinking about it, I could keep two lists. Sort the background by texture and the foreground layers by Y then texture. The background is the most expensive of the layers (usually), so that should work out perfectly.

What sort of methods do you guys use to keep the same list of renderables? My renderables don't have any IDs attached to them, so I couldn't reasonably find them in a list to check to see where they are.
With alpha blending if your alpha is anything but 0.0 or 1.0 then they'll need to be sorted to be correct (I think depth peeling allows you to draw in any order, but I think it's slow).

And for the multiply lists I was talking about doing a basic linear sort on the list (no tree), so with 1000 objects and 10 depth levels (assuming even distribution) you go from 1000*log2(1000)=9,965 comparisons to 10*(100*log2(100))=6643 comparisons.

I think the std::set thing would be the fastest, (since you can save sorting information from the previous frame, which probably hasn't changed much), I was basically using the std::set as a red-black tree (I think that's the tree type that most standard librarys use). You may want to look into red-black trees instead (there's probably other balanced binary trees that would work too). Using a red-black tree should move you from n*log(n) (for a simply linear sort), where n is the number of objects to c*log(n), where c is the number of objects that have been added,removed or changed (these ones count as 2 each) z-level or textureid. The worst case is 2n*log(n) (Changing every object), but most of the time much lower than n. Of course now the cost of keeping track of changes (could be tricky to implement too), and managing the red-black tree(not that bad), may cost more than you gain. I doubt it, but it depends on how efficiently you can keep track of changes (The only ways I can think off add n so you get c*log(n)+n), and how many objects on average change between frame. But you would go from 1000*log2(1000)=9,965 to (assuming 100 changes) 100*log2(1000)+1000=1996.

Do note that when using big-o notation that if n is small then all those constant numbers that get ignored can become much more important (which is why we profile our optimizations). So if you don't have many objects then all this red-black tree stuff could (and probably will) very well be slower than a "dumb" linear sort. But if your using lots of objects, with relatively few changes every frame, the red-black should be MUCH faster.

[edit]Forgot to add dictionary to spell check. And added some more[/edit]

[Edited by - Cocalus on June 23, 2005 2:37:10 AM]
I doubt using the z-buffer will help (might be worth checking though).

Another easy and probably good possibility (which is close to what your doing now) is to do a radix sort. So basically use textureids as the first radix and the second with z-levels (I assume they're discrete). Which is 2*n+tr+zr operations,where tr is textureid range and zr is the range of z-values,(~2000 to compare to my previous numbers). It also doesn't need per object comparisons, but it's performance could be hurt by a wide range of texureids and/or z-levels.

A wide range of textureids (I think textureids are normally given out consecutively,so it shouldn't be a problem) could be handled with some kind of mapping of textureids to a smaller range.
Quote:Original post by GroZZleR
The list is sorted by Z then Y then Texture. It's cleared every frame and rebuilt (though it's size remains the same, as to prevent 1000s of new calls).


Those texture changes sound like they would be a problem. If sorting by texture first isn't practical, what about putting multiple sprites onto one texture? If you double the dimensions of your texture and fit 4 sprites to each one, you might vastly improve performance due to having 25% of the texture changes compared to before.

Quote:I feel that a list sorted last minute is going to be the slowest method compared to sorting on insert.


Assuming N items, you can find an insert position in O(logN) time, and inserting it is O(N) for a vector, so that's O(N2logN). You can insert at O(1) for a list but finding the insert position is roughly O(N), giving O(N2). Compare that to adding to a vector and then sorting it, which is usually O(NlogN).

On the other hand, a multimap might work ok, and you'll populate that in O(NlogN), but that's not necessarily a gain over a simple array or vector that you sort.
Back Inserting a vector can be O(1) if you reserve the space. But The multiply sprites in one texture is a good idea.
Quote:Original post by Cocalus
A wide range of textureids (I think textureids are normally given out consecutively,so it shouldn't be a problem) could be handled with some kind of mapping of textureids to a smaller range.


That raises an interesting point. At this point in time I'm sorting by texture name. Do you think I'd see a significant improvement sorting by texture id? Enough to justify it?

This topic is closed to new replies.

Advertisement