kRogue

avoiding *expensive* state changes idea: comments wanted

Hi all, before I start on this, I want to know what others think, to see if it is worth the trouble. In this context I am using either Cg or GLSL (maybe both), and the basic mentality I have seen so far is: shader changes are the most expensive state changes, followed by texture changes. With that in mind, a simple idea:

1) Render the world and models (whatever should be rendered according to all sorts of culling) front to back, but with texturing and GLSL disabled (i.e. just to set up the depth values so that the GPU can do early z-culling).
2) Sort the world objects rendered in (1) first by shader, then by texture, and render those [the associated modelview matrix is just from the camera position].
3) Now do the same for the models, BUT here the modelview matrix becomes more important.

(I am assuming I am not fiddling with the texture coordinate matrix, and since I am using shaders, I don't worry about most other state issues [ok, stencil and alpha testing though...].) As before, we sort all of our "draw commands" first by shader and then by texture(s). Models may be made up of various objects which have additional transformations (for example, old-school md3 models), so we need to keep track of those. Thus for each "draw command" we need a structure that stores (at least):

a) Shader
b) Texture(s)
c) Matrix state
d) What parameters to pass to glDraw*

My thinking was to create my own matrix state stack (in software): when I fill in the above structure, rather than calling glPushMatrix, I call myPushMatrix. Then when I (finally) get to recording the draw command for a particular mesh, I use "myCurrentMatrix" as the matrix state.

Then we sort this mess of stuff first by shader, then by texture (truth is we can do most of that sorting beforehand, and as models come and go we just update the list), and then we issue the draw commands, simply setting the GL_MODELVIEW matrix to what is stored in the matrix state of each "draw command".

Now my questions:

1) Is this worth the bother? Does anyone have benchmarks on how much trouble it is worth going through to do this? I ask because many of the 3D toolkits (like OGRE, Panda, etc.), I think, do not do this.
2) The matrix computations should be quite minor: the number of matrix multiplies is basically equal to the number of objects (a model can be made up of multiple objects; again, for example, a Quake 3 style md3 model has 4 objects [head, torso, legs and gun]). In fact, the cost of the matrix multiplies in my own matrix stack should be tiny compared to all the SLERPing that is going on (typically 1 slerp per sub-object).
3) In relation to 1), perhaps one finer sort: after sorting by texture(s), we also sort by which VBO(s) are being used for the drawing commands. The nVidia docs on VBOs say that the real heavyweight part of them is glVertexPointer (and glColorPointer, etc.). As a side note, my VBOs _should_ all be static, because I do all my skinning on the video card, and the driver should be better at it than me. Even if I end up using keyframes, putting them into a VBO (and then freeing my own copy in memory) should be better, since the driver decides where the data is resident (i.e. video card or system memory; does it shadow that, by the way?).

Comments and ideas most appreciated.

Best Regards
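A minimal sketch of the software matrix stack and "draw command" record described above. The names `MatrixStack`, `DrawCommand` and `record` are hypothetical, the matrices are assumed to be column-major (the layout glLoadMatrixf takes), and no GL calls are made here; the stored matrix would later be handed to glLoadMatrixf before the draw.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Column-major 4x4 matrix: element (row, col) lives at index col * 4 + row.
using Mat4 = std::array<float, 16>;

Mat4 identity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

// Returns a * b for column-major matrices.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return r;
}

Mat4 translation(float x, float y, float z) {
    Mat4 m = identity();
    m[12] = x; m[13] = y; m[14] = z;
    return m;
}

// Software stand-in for glPushMatrix / glPopMatrix / glMultMatrixf.
class MatrixStack {
public:
    MatrixStack() { stack_.push_back(identity()); }
    void push() { stack_.push_back(stack_.back()); }
    void pop() { assert(stack_.size() > 1); stack_.pop_back(); }
    void mult(const Mat4& m) { stack_.back() = multiply(stack_.back(), m); }
    const Mat4& current() const { return stack_.back(); }
private:
    std::vector<Mat4> stack_;
};

// One deferred draw: (a) shader, (b) texture, (c) matrix state, (d) glDraw* parameters.
struct DrawCommand {
    std::uint32_t shader;
    std::uint32_t texture;
    Mat4 modelView;
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
};

// Captures the current matrix instead of drawing immediately.
void record(std::vector<DrawCommand>& queue, const MatrixStack& ms,
            std::uint32_t shader, std::uint32_t texture,
            std::uint32_t firstIndex, std::uint32_t indexCount) {
    queue.push_back(DrawCommand{shader, texture, ms.current(), firstIndex, indexCount});
}
```

Because each command carries its own copy of the matrix, the queue can be reordered freely afterwards without the push/pop nesting mattering any more.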

Well, sorting by matrix changes is probably not useful. For one thing, the CPU power necessary to multiply two matrices is quite trivial...and if you're using glMultMatrixf, it's not a bad guess that the driver is using SSE to do the multiply. In short, don't worry about matrix changes at all. Most of the time, objects won't share the same matrix anyway.

What IS useful, though, is sorting by the vertex buffer in use. I won't really go into detail on that, but it's something to consider.


So, what you really need to do is to accumulate a complete list of visible stuff to be rendered. Instead of simply feeding information to GL directly, maintain a render queue that has enough information in order for you to call GL later with all of the necessary parameters. Then, when you're going to finally flush that list, sort it by shader and by texture, and then render in that order.
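The render queue idea can be sketched roughly as follows. This is an illustration only: `QueuedDraw`, `sortQueue` and `countBinds` are made-up names, and the bind counting stands in for the real glUseProgram / glBindTexture calls so that the effect of sorting is visible without a GL context.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical queued draw: just enough to issue the GL calls later.
struct QueuedDraw {
    std::uint32_t shader;
    std::uint32_t texture;
    std::uint32_t indexCount;
};

// Sort by shader first, then by texture within the same shader.
void sortQueue(std::vector<QueuedDraw>& q) {
    std::stable_sort(q.begin(), q.end(),
                     [](const QueuedDraw& a, const QueuedDraw& b) {
                         return std::tie(a.shader, a.texture) <
                                std::tie(b.shader, b.texture);
                     });
}

struct BindCounts {
    int shaderBinds = 0;
    int textureBinds = 0;
};

// Walks the queue the way a flush would, binding only when the state changes.
BindCounts countBinds(const std::vector<QueuedDraw>& q) {
    BindCounts counts;
    bool first = true;
    std::uint32_t curShader = 0, curTexture = 0;
    for (const QueuedDraw& d : q) {
        assert(d.indexCount > 0);  // an empty draw should never be queued
        if (first || d.shader != curShader) {
            ++counts.shaderBinds;   // a real flush would bind the shader here
            curShader = d.shader;
        }
        if (first || d.texture != curTexture) {
            ++counts.textureBinds;  // a real flush would bind the texture here
            curTexture = d.texture;
        }
        first = false;
        // the actual glDraw* call would go here
    }
    return counts;
}
```

Comparing `countBinds` before and after `sortQueue` on a captured frame gives a cheap way to see how many state changes the sort actually removes.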

Sorting by shader and texture is almost certainly worth the effort...sorting by buffer less so, but it's also useful. Now, for the 'world', it really depends on what your world is. A quake 3 map would be rendered rather differently from a terrain, for example.
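One common way to get the "shader, then texture, then buffer" ordering is to pack the three IDs into a single integer key and sort on that. The sketch below assumes an arbitrary bit layout (16 bits of shader ID, 24 of texture ID, 24 of vertex buffer ID); `makeSortKey` is a hypothetical helper, not something from this thread.

```cpp
#include <cassert>
#include <cstdint>

// Most expensive state goes in the highest bits so it changes least often
// once the keys are sorted in ascending order.
// Assumed layout: [shader:16][texture:24][buffer:24]
std::uint64_t makeSortKey(std::uint32_t shader, std::uint32_t texture,
                          std::uint32_t buffer) {
    assert(shader < (1u << 16));
    assert(texture < (1u << 24));
    assert(buffer < (1u << 24));
    return (static_cast<std::uint64_t>(shader) << 48) |
           (static_cast<std::uint64_t>(texture) << 24) |
           static_cast<std::uint64_t>(buffer);
}
```

Sorting a list of such keys with a plain integer comparison groups draws by shader, then by texture inside each shader, then by buffer inside each texture.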

Quote:
1) Render the world and models (whatever should be rendered according to all sorts of culling) front to back, but with texturing and GLSL disabled (i.e. just to set up the depth values so that the GPU can do early z-culling).

I don't think it's too important whether you sort things here or not, the reason being that overdraw doesn't matter so much since you're only laying down the depth information, i.e. glColorMask(0,0,0,0), which is very fast. Now if you had some semi-complicated shader enabled, then you'd want to minimize overdraw.

2/ At program startup I create various materials (each is a set of states/texture/shader etc.), perhaps 300 materials in all; these get sorted at startup time.
During gameplay I loop through all the materials, seeing if there are any meshes that use this material inside the frustum (and want to be drawn); if so, I draw them.
I.e. there's no sorting of materials at all during gameplay.
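A rough sketch of that "sort materials once, never sort per frame" scheme. The structure is a guess at one way to do it, not the poster's actual code: `MaterialBucket`, `sortBuckets` and `drawOrder` are hypothetical names, and visibility is assumed to arrive as a per-mesh flag from the frustum test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// A material plus the meshes that use it (filled in at load time).
struct MaterialBucket {
    std::uint32_t shader;
    std::uint32_t texture;
    std::vector<int> meshIds;
};

// Done once at startup: order materials by shader, then texture.
void sortBuckets(std::vector<MaterialBucket>& buckets) {
    std::sort(buckets.begin(), buckets.end(),
              [](const MaterialBucket& a, const MaterialBucket& b) {
                  return std::tie(a.shader, a.texture) <
                         std::tie(b.shader, b.texture);
              });
}

// Done every frame: walk the presorted materials and emit visible meshes.
// No sorting happens here; the order falls out of the bucket order.
std::vector<int> drawOrder(const std::vector<MaterialBucket>& buckets,
                           const std::vector<bool>& visible) {
    std::vector<int> order;
    for (const MaterialBucket& bucket : buckets) {
        for (int id : bucket.meshIds) {
            assert(id >= 0 && static_cast<std::size_t>(id) < visible.size());
            if (visible[id]) order.push_back(id);  // the draw call would go here
        }
    }
    return order;
}
```

The per-frame cost is a linear walk over the buckets, which is why no runtime sort is needed as long as materials are not created or destroyed during gameplay.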

Also IIRC I don't use glPushMatrix, glPopMatrix, glMultMatrix to position things; I do all the math myself, putting everything into world-space coordinates.
Also IIRC I only use glLoadMatrix(..) straight after glMatrixMode( GL_MODELVIEW );

then again your situation might be different

Guest Anonymous Poster
Anything you can do to combine batches and thus submit one less batch to the GPU is extremely useful. The problem with sorting by the matrix state is it's highly unlikely that there would be any benefit.

More than likely your static "world" is defined in one "space" ie: "world coords". No sorting by matrix state needed here since everything would have the same matrix state.
Each model probably has its own local space, so each model has a unique matrix state and sorting is once again useless.

Splitting what could be a potential single batch into multiple batches is bad enough. Adding state changes on top will make things worse. (As you are aware, some state changes are worse than others.) Sorting by shader/texture helps minimize the number of batches, but you still have to break a batch to switch to a different matrix state. The only way around this (which I haven't tried) is to make the matrix part of the data sent to the GPU and transform the vertices in a vertex shader. This would allow multiple instances of the same shader/texture to be rendered in one batch. I don't know how efficient this is (since I never tried it) but I only mention it for one reason: if this is done, then the matrix state is not part of the sort criteria anymore!
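On the CPU side, treating the matrix as per-instance data could look something like the sketch below. It is only an illustration under assumptions: draws are grouped by (shader, texture, mesh), each group's matrices are flattened into one contiguous float array, and how that array reaches the vertex shader (uniform array, vertex attributes, etc.) is left open. `Instance`, `buildBatches` and `flatten` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

using Mat4 = std::array<float, 16>;

// One object to draw, with its own transform.
struct Instance {
    std::uint32_t shader;
    std::uint32_t texture;
    std::uint32_t mesh;
    Mat4 modelView;
};

// Instances that agree on all three IDs can share one batch.
using BatchKey = std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>;

// Groups instances so the matrix is data inside the batch, not a reason to split it.
std::map<BatchKey, std::vector<Mat4>> buildBatches(const std::vector<Instance>& instances) {
    std::map<BatchKey, std::vector<Mat4>> batches;
    for (const Instance& inst : instances) {
        BatchKey key{inst.shader, inst.texture, inst.mesh};
        batches[key].push_back(inst.modelView);
    }
    return batches;
}

// Packs a batch's matrices into one contiguous array, ready for a single upload.
std::vector<float> flatten(const std::vector<Mat4>& matrices) {
    assert(!matrices.empty());  // a batch always holds at least one instance
    std::vector<float> out;
    out.reserve(matrices.size() * 16);
    for (const Mat4& m : matrices) out.insert(out.end(), m.begin(), m.end());
    return out;
}
```

Since `std::map` keeps its keys ordered, iterating the result also visits batches by shader, then texture, then mesh, without a separate sort.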

my 2 cents worth.

I would not sort by matrix state, that was not the idea. I just sort first by shader, then by texture. That objects are made up of sub-objects forces me to keep track of some kind of matrix state. The matrix state is only "tracked"; it does not influence the sorting. And yes, the cost of matrix multiplies is tiny, especially compared to the cost of spherical linear interpolation.

As for why I do the first pass: I will have shaders of a significant level of complexity; at the very least per-pixel lighting (with specular lighting) via normal maps, so avoiding overdraw is key (I imagine). The first pass is rendered just to get the depth values "ready" for the remaining passes.

Also, typically a single sub-object is made up of several meshes, each with its own texture, so each will require its own call to OpenGL; hence it is ok to separate the model by its sub-objects. The smallest mesh size is about 200 triangles, but the more typical is about 2000 triangles.



Best Regards
