optimised rendering

Started by Muncher
6 comments, last by jorgander 19 years, 9 months ago
Howdy. I'm optimising my renderer and I would like to ask you guys for advice, if that's ok :P As I traverse my spatial partitioning structure :) would it be better to:

a.) Sort the polys in each node by texture (during init), then (during runtime) draw each visible node as I come across it, or
b.) (during runtime) Add the polys in visible nodes to an array, then, when the traversal finishes, sort them by texture and draw them all at once, or
c.) I am way off, neither method is good, and there is a better way!

Cheers mates
Muncher
As for sorting, just sort them back to front and you should be fine. The texture switch hit shouldn't be that much, but if you have a good partitioning structure (BSP, octree, quadtree, or whatever), it probably gives you a hint about which polygons to throw out, so just draw them from back to front, culling the unseen polys. Blending is the exception; that has to be handled separately.


For sorting I use the following approach, which may not apply to your system. Going from major sort to subsort:
1. layer       : as in your world layer, GUI layer, popup layer, etc.
2. alpha       : sort the objects into those with transparency, because you will generally need to render transparent objects after you render the solid stuff
3. effect file : DX related
4. texture     : sort based on texture sharing
5. buffers     : finally do the buffers
6-8. whatever

None of the above values are fixed in type, i.e. 1 doesn't have to be layers; it is entirely up to the user to decide what the flags are used for. For instance, you could skip layers and just rely on the old z-depth approach. I have my system set up for a maximum of 8 sort values, and bitwise ops can also give you additional values since it all comes down to a standard less-than/greater-than test.
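A minimal C++ sketch of that kind of packed sort key, just to make the idea concrete: the field names, bit widths, and layout here are my own assumptions, not necessarily how the system above lays them out.

```cpp
// Sketch: pack several sort values into one 64-bit key so renderables can be
// ordered with a single integer comparison. Field names and bit widths are
// illustrative assumptions.
#include <cstdint>
#include <vector>
#include <algorithm>

struct Renderable {
    uint8_t  layer;    // 0 = world, 1 = GUI, 2 = popup, ...
    bool     alpha;    // transparent objects sort after solid ones
    uint16_t effect;   // effect/shader id
    uint16_t texture;  // texture id
    uint16_t buffer;   // vertex/index buffer id
    uint64_t key;      // packed sort key, built below
};

uint64_t MakeSortKey(const Renderable& r)
{
    // Major sort value goes in the highest bits, minor values below it.
    return (uint64_t(r.layer)   << 56) |
           (uint64_t(r.alpha)   << 55) |
           (uint64_t(r.effect)  << 32) |
           (uint64_t(r.texture) << 16) |
            uint64_t(r.buffer);
}

void SortQueue(std::vector<Renderable>& queue)
{
    for (Renderable& r : queue)
        r.key = MakeSortKey(r);
    std::sort(queue.begin(), queue.end(),
              [](const Renderable& a, const Renderable& b) { return a.key < b.key; });
}
```

Rearranging which field occupies the top bits is how you change the major/minor sort order without touching the comparison itself.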

I am interested in how others sort their renderable objects too, as I am in the middle of revising my buffer manager, so let's hear them, guys.
I'd personally like to question what kburkhart said, because 1) z-buffering takes care of the depth sorting for you, and 2) I've heard texture switches are VERY expensive (relatively speaking); they're one of many "state changes", and batching exists to avoid exactly that.
-Dan
When General Patton died after World War 2 he went to the gates of Heaven to talk to St. Peter. The first thing he asked is if there were any Marines in heaven. St. Peter told him no, Marines are too rowdy for heaven. He then asked why Patton wanted to know. Patton told him he was sick of the Marines overshadowing the Army because they did more with less and were all hard-core sons of bitches. St. Peter reassured him there were no Marines so Patton went into Heaven. As he was checking out his new home he rounded a corner and saw someone in Marine Dress Blues. He ran back to St. Peter and yelled "You lied to me! There are Marines in heaven!" St. Peter said "Who him? That's just God. He wishes he were a Marine."
Quote: Original post by kburkhart84
As for sorting, just sort them back to front and you should be fine. The texture switch hit shouldn't be that much, but if you have a good partitioning structure (BSP, octree, quadtree, or whatever), it probably gives you a hint about which polygons to throw out, so just draw them from back to front, culling the unseen polys. Blending is the exception; that has to be handled separately.

Draw back-to-front? That was only necessary in the original DOOM days; now we have z-buffers, you know. In fact, if you're going to sort polys, sorting them front-to-back is better, because it will reduce overdraw. And texture switches are very expensive.

The above post was mine, by the way.
For standard geometry (stuff which isn't semi-transparent) you should draw in a rough front-to-back order (it doesn't have to be perfect) to take advantage of things like early z-rejection in the hardware.

Semi-transparent objects have to be drawn back to front for blending to work properly (you don't have to polygon-sort within those objects either; playing with the culling gives acceptable results, even if it does mean drawing the object twice).
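To make that split concrete, here is a minimal sketch, assuming each object carries a camera distance and a transparency flag (both hypothetical names): opaque objects are sorted near-to-far, transparent ones far-to-near.

```cpp
// Sketch of the two-pass ordering described above: opaque geometry roughly
// front-to-back (for early z-rejection), transparent geometry back-to-front
// (for correct blending). The Object type is illustrative.
#include <vector>
#include <algorithm>

struct Object {
    float distanceToCamera; // assumed to be updated each frame
    bool  transparent;
};

void BuildDrawLists(const std::vector<Object*>& visible,
                    std::vector<Object*>& opaque,
                    std::vector<Object*>& transparent)
{
    opaque.clear();
    transparent.clear();
    for (Object* obj : visible)
        (obj->transparent ? transparent : opaque).push_back(obj);

    // Opaque: nearest first, so later fragments fail the z-test early.
    std::sort(opaque.begin(), opaque.end(),
              [](const Object* a, const Object* b)
              { return a->distanceToCamera < b->distanceToCamera; });

    // Transparent: farthest first, so blending composites correctly.
    std::sort(transparent.begin(), transparent.end(),
              [](const Object* a, const Object* b)
              { return a->distanceToCamera > b->distanceToCamera; });
}
```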

In the state-change area of things, shaders are the most expensive change you can make, followed by textures and finally VBO buffer switches (well, the buffer switch itself isn't expensive, but rebinding the gl*Pointer calls can be). Other state changes also cost you, but those are the big three.

So, in an ideal world you'll want to minimize the state changes required to draw a scene: draw all the visible nodes that use a certain shader, switching textures and VBOs as needed, then switch shaders and repeat, and so on.
However, this is a balancing act in itself (for example, you might have a lot of objects which use a certain texture but each use a different shader, in which case you might be better off switching shaders more often instead of switching textures).
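One common way to cut down on redundant switches, separate from the sorting itself, is a small state cache that only issues the GL call when the bound object actually changes. The sketch below is my own illustration of that idea, not from the posts above, and it assumes an extension loader such as GLEW provides the GL 2.0 / VBO entry points.

```cpp
// Sketch: remember the last shader/texture/VBO that was bound and skip the GL
// call if it hasn't changed. The cache struct itself is an assumption; only
// the GL calls are standard.
#include <GL/glew.h> // assumed loader for glUseProgram / glBindBuffer

struct StateCache {
    GLuint shader  = 0;
    GLuint texture = 0;
    GLuint vbo     = 0;

    void BindShader(GLuint s)  { if (s != shader)  { glUseProgram(s);                 shader  = s; } }
    void BindTexture(GLuint t) { if (t != texture) { glBindTexture(GL_TEXTURE_2D, t); texture = t; } }
    void BindVBO(GLuint b)     { if (b != vbo)     { glBindBuffer(GL_ARRAY_BUFFER, b); vbo    = b; } }
};
```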
I recently perused the GDC 2004 videos on the ATI/NVIDIA sites and got some useful info on VBOs. One of the things ATI said was to render at least 100 triangles per API call. For that reason, I batch rendering calls.

One thing I'm still unsure of, though: where do you draw the line between increasing parallelism and avoiding 'useless' API calls? You want to make sure an API call renders enough so that the GPU has plenty to do while you're using the CPU, but at the same time you don't want to render so much that the next API call has to wait for the previous one to finish. Obviously you'll never get it synced to the nanosecond (is there a way?), but I'm sure there's plenty of room for optimization compared to a naive implementation.

For my level data structure, each leaf node holds a start and end index into the level VBO for the triangles it contains (one VBO for the entire level). This means that all triangles in a node must be consecutive in the vertex buffer, which isn't all that hard. It also means that if a branch node is totally visible (wholly in the frustum), it can easily be added to the visible nodes without recursing its child nodes. When a node has been determined to be visible, it calls a function that adds its index range (start and stop) to the currently visible indices; this function optimizes the ranges somewhat - if two visible ranges are adjacent, it concatenates them into one range. When the recursion is done, the visible ranges are rendered with one API call each.
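Roughly, the range-merging part could look like the sketch below; the names and the single global range list are illustrative assumptions, not the poster's actual code.

```cpp
// Sketch of the range merging described above: visible leaf nodes append their
// index range, adjacent ranges are concatenated, and each final range becomes
// one draw call. Assumes the level's VBO and index buffer are already bound.
#include <vector>
#include <GL/glew.h> // assumed loader for VBO/index-buffer entry points

struct IndexRange { unsigned first, count; };

static std::vector<IndexRange> g_visibleRanges;

void AddVisibleRange(unsigned first, unsigned count)
{
    // If the new range starts where the previous one ended, just extend it.
    if (!g_visibleRanges.empty() &&
        g_visibleRanges.back().first + g_visibleRanges.back().count == first)
    {
        g_visibleRanges.back().count += count;
    }
    else
    {
        g_visibleRanges.push_back({first, count});
    }
}

void DrawVisibleRanges()
{
    // One API call per merged range.
    for (const IndexRange& r : g_visibleRanges)
        glDrawElements(GL_TRIANGLES, r.count, GL_UNSIGNED_INT,
                       (const void*)(r.first * sizeof(unsigned int)));
    g_visibleRanges.clear();
}
```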

