Steve132

OpenGL State Change Benchmark Data?

I haven't been on this forum in a very long time, so I don't know how much is kept up around here, but anyway, I have a relatively advanced question for you all.

It is well understood that OpenGL is a state machine: you issue state changes to manipulate the current render state as geometry passes through. Certain state changes are inherently more expensive than others. For example, it is more expensive to change the currently bound texture than to change the current render color; the same goes for changing the clear color vs. turning off the depth buffer, etc.

ANYWAY. My point is this: how would I go about coming up with approximate costs for each state change operation in OpenGL? Is there a spec with this data? For example, a change to the current color might cost 1x, whereas a change to the texture might cost 30x. Is there a table somewhere giving costs for state change operations? This might be implementation dependent; if so, is there a general table with approximations?

What might be even more helpful is this: can anyone think of a way to collect this data empirically? Perhaps an OpenGL state benchmark program? How would such a program be written? Write me back.

I don't know whether changing different states really makes a significant difference in speed
(matrix changes might be slower than other changes).
But that's just my guess. I don't know.

But anyway, shouldn't gDEBugger or some other OpenGL debugger be able to do this?

I've read something that suggested that the answer to your question is simply: don't worry about it, states are optimized already and what happens in the hardware is constantly improving.

At least that is how I took this rant on why scene graphs are a waste of time.

Of course that is only one source and I've yet to see a similar sentiment anywhere else.

State changes can be put into three or four different groups.

The first group, and by far the slowest, is where the driver has to do a lot of validation work: it has to check whether the data is actually there, whether it is OK, and so on.
This group includes texture changes and similar.

Then there's the middle group. Sure, it takes some processing to work things out, but it's relatively fast, so fast that these changes can mostly be ignored. Just don't go state thrashing with them, OK?
This group includes most other states.

Finally, there's a group that are not really state changes; they just behave like them. glColor is a good example: in reality it's just a variable whose value gets sent to the GPU every time you call glVertex, and no real processing is required.

There is a fourth group too that eats a lot of processor time. These are not technically state changes, but they can stall the CPU or the GPU, and that is very bad.
This group includes functions like glFlush, glReadPixels and so on.

Well, this is at least how it looks today, but OpenGL 3.0 is bringing a new object model that validates textures on texture load and not every time you want to use them, thus speeding up most state changes.
So in the future, state changes should not be as much of a bother as they used to be.

lc_overlord, thanks. About those four state groups: is there a practical way to work out which states go where? Or perhaps you could point me to a list, or a way of figuring it out for myself. That would be very helpful to me.

No. Generally only glBindTexture, glAccum, glClear, some of the FBO stuff and most texture operations are in the first group.
glColor, glTexCoord, glNormal, glFogCoord and such are in the third group.
The rest are in the second one. Some in the second group fall in the gray area between groups one and two, though most of this varies between hardware and software implementations, so it's a bit hard to say for sure.
There is a new extension from NVIDIA that can measure time in picoseconds and could perform these tests, but it's only for the latest Quadro cards.

Anyway, the general idea is to keep glBindTexture calls to a minimum and let shaders do all the hard work; then it doesn't matter how fast the state changes are.
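
A minimal sketch of the "keep glBindTexture calls to a minimum" idea, assuming a single texture unit and a single target: a small cache that skips the bind when the requested texture is already bound. `TextureBindCache` and its callback are hypothetical names, not part of OpenGL; in a real program the callback would wrap something like `glBindTexture(GL_TEXTURE_2D, id)`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical wrapper: forwards a texture bind only when the requested
// texture differs from the one this cache believes is currently bound.
class TextureBindCache {
public:
    using BindFn = std::function<void(std::uint32_t)>;

    explicit TextureBindCache(BindFn bind) : bind_(std::move(bind)) {
        assert(bind_ && "a bind callback is required");
    }

    // Returns true if the bind was actually issued, false if it was skipped.
    bool bind(std::uint32_t texture) {
        if (hasBound_ && texture == bound_) {
            return false;  // redundant request: no driver call
        }
        bind_(texture);
        bound_ = texture;
        hasBound_ = true;
        return true;
    }

private:
    BindFn bind_;
    std::uint32_t bound_ = 0;
    bool hasBound_ = false;
};
```

The cache is only correct if every bind goes through it; any direct bind elsewhere in the program would leave it out of sync.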

I originally began researching this, but I later realized the information was often unreliable and prone to change between drivers (not even hardware!).

One thing I did notice, however, is that the cost of a state change is in some way related to the workload it triggers. I guess this does not surprise you!

Some NVIDIA papers state that the further down the hardware you go, the less expensive a state change is. I guess this is because there's often more validation on the 'near end' than on the frame buffer output (consider BlendFunc against VertexAttribPointer: it does make sense).

I am also confident that benchmarking this is of little practical use.

Quote:
Original post by Krohm
Some NVIDIA papers state that the further down the hardware you go, the less expensive a state change is. I guess this is because there's often more validation on the 'near end' than on the frame buffer output (consider BlendFunc against VertexAttribPointer: it does make sense).


Yeah, that is why the new object model in OpenGL 3.0 will be faster: most of the validation work is already done.

Theoretically, even with today's hardware, one could write an API that uses virtual texturing (that is, you don't have to bind textures; you just use them any way you like).

I don't know of any spec with actual OpenGL state change costs. I don't think the relative costs of state changes are really important, because usually there is no alternative to a given state change. It doesn't matter whether it's in group 1, group 2 or group 3. We just have to write optimized code and avoid as many unnecessary state changes as we can. That is, we can't replace glBindTexture with something cheaper. I mean, I can't :p

No, currently we can't directly replace glBindTexture. I think we will be able to within a year or so, but today we can work around it by using things like multitexturing together with shaders and by optimizing the usage of texture state changes.

Quote:
Original post by Steve132
How would such a program be written?

I tried calling the same function 100 to 100000 times in a loop and calling glFinish after the loop. But I don't think glFinish has any effect on glBindTexture. I only found out that glBindTexture is 15x to 300x slower than glColor, and that OpenGL optimizes many things for you.

Quote:
There is a new extension from NVIDIA that can measure time in picoseconds and could perform these tests, but it's only for the latest Quadro cards.


Thanks for all the help! So far this is all very useful. I may even be able to do what I wanted just based on what you said. Out of curiosity, however, what is this extension called? I have a mobile Quadro card in my laptop, which may be too old, but I would like to look into this and couldn't find it in the extension registry.

It's called GL_NV_timer_query or GL_EXT_timer_query. It's not in the registry because:
1. The GL_NV_timer_query version is a public beta of sorts; the real one will be called GL_EXT_timer_query.
2. The registry is not known for updating often.

A link to the spec might perhaps be in order.

Oh, and it measures in nanoseconds, not the picoseconds I mentioned earlier. I think that came from when one of the devs let slip that picoseconds will be possible in the future, and that this extension should be able to handle that when the day comes.
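
For anyone who wants to try it, here is a rough sketch of the call sequence a time-elapsed query involves, as I understand the GL_EXT_timer_query spec. The `Gl` wrapper type and its member names are hypothetical, made up for illustration; the comments name the real entry points each one is assumed to forward to. `FakeGl` is only a stand-in so the flow can be exercised without a GL context.

```cpp
#include <cassert>
#include <cstdint>

// Measures the GPU time taken by `work` with a time-elapsed query.
// `Gl` is a hypothetical thin wrapper around the driver entry points.
template <class Gl, class Work>
std::uint64_t gpuTimeNs(Gl& gl, Work&& work) {
    const std::uint32_t query = gl.genQuery();    // glGenQueries(1, &id)
    assert(query != 0);                           // generated query names are nonzero
    gl.beginTimeElapsed(query);                   // glBeginQuery(GL_TIME_ELAPSED_EXT, id)
    work();                                       // the GL commands being timed
    gl.endTimeElapsed();                          // glEndQuery(GL_TIME_ELAPSED_EXT)
    while (!gl.resultAvailable(query)) {          // glGetQueryObjectiv(id, GL_QUERY_RESULT_AVAILABLE, &done)
        // Busy-wait for simplicity; a real app would check again on a later frame.
    }
    const std::uint64_t ns = gl.resultNs(query);  // glGetQueryObjectui64vEXT(id, GL_QUERY_RESULT, &ns)
    gl.deleteQuery(query);                        // glDeleteQueries(1, &id)
    return ns;
}

// Stand-in backend (assumption, for dry runs only): reports a fixed elapsed
// time and becomes "available" on the third poll.
struct FakeGl {
    std::uint64_t fakeElapsedNs = 1500;
    int polls = 0;
    std::uint32_t genQuery() { return 1; }
    void beginTimeElapsed(std::uint32_t) {}
    void endTimeElapsed() {}
    bool resultAvailable(std::uint32_t) { return ++polls >= 3; }
    std::uint64_t resultNs(std::uint32_t) { return fakeElapsedNs; }
    void deleteQuery(std::uint32_t) {}
};
```

The result is GPU time in nanoseconds for the commands issued between begin and end, so it should include a draw call if the goal is to see the cost of a state change that the driver defers.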

Wow, cool! I actually have this extension on my card! Anyway, thanks so much. Would anyone be interested in a benchmark dataset or an app to try this on their own card? I intend to write such an app, but it would be nice to have help; perhaps I can start with an open-source graphics test suite or something. Anyone have any thoughts?

If you want to benchmark these things, you need to be aware that practically every major implementation attempts to defer state management until the state is actually needed (i.e., when you actually draw something).

If you just put a whole bunch of glBindTexture calls in a loop, it will be quite fast, because basically no work is being done. It will actually be glDrawArrays (or glBegin, or whatever) that takes longer to execute.
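
One way to account for that, sketched under assumptions: time "draw only" as a baseline, then time "state change + draw", and look at the difference. Everything here is a hypothetical harness, not an established benchmark; `change`, `draw` and `finish` are placeholders that in a real program would wrap calls such as glBindTexture, glDrawArrays and glFinish. It measures CPU wall-clock time with `std::chrono`, so the numbers are only a rough indication.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

struct StateChangeCost {
    double baselineNs;    // average ns per iteration: draw only
    double withChangeNs;  // average ns per iteration: state change + draw
    double extraNs() const { return withChangeNs - baselineNs; }
};

// Runs `body` `iterations` times, calls `finish` once, and returns the
// average wall-clock nanoseconds per iteration.
double averageNs(int iterations,
                 const std::function<void()>& body,
                 const std::function<void()>& finish) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        body();
    }
    finish();  // e.g. glFinish(), so queued work is counted before the clock stops
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Measures the draw-only baseline first, then the same draw preceded by
// the state change under test.
StateChangeCost measure(int iterations,
                        const std::function<void()>& change,
                        const std::function<void()>& draw,
                        const std::function<void()>& finish) {
    StateChangeCost cost{};
    cost.baselineNs = averageNs(iterations, draw, finish);
    cost.withChangeNs = averageNs(iterations, [&] { change(); draw(); }, finish);
    return cost;
}
```

For the `change` callback to do real work it should probably alternate between two different values (two textures, say), since re-setting the same value may be filtered out by the driver.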

The following is a rough grouping of state changes from slowest to fastest. Depending on the given implementation, the list may shift around a bit, but this is a reasonable average.

Program binding
FBO binding
Texture binding
Vertex array specification
Buffer binding
glUniform*
Update current vertex state (glColor, glVertexAttrib, etc).
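
If that ordering roughly holds on a given implementation, one way to use it is to sort draw calls so the most expensive state changes least often. This is only a sketch: the `DrawCall` record and its fields are hypothetical, and it ignores cases where draw order matters (blending, or framebuffer passes that have to run in sequence).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical per-draw record: ids of the state each draw needs.
struct DrawCall {
    std::uint32_t program;
    std::uint32_t framebuffer;
    std::uint32_t texture;
    std::uint32_t vertexBuffer;
};

// Sorts draws lexicographically by (program, framebuffer, texture, vertexBuffer),
// so the state listed as most expensive is the slowest-varying key.
void sortByStateCost(std::vector<DrawCall>& draws) {
    const auto byCost = [](const DrawCall& a, const DrawCall& b) {
        return std::tie(a.program, a.framebuffer, a.texture, a.vertexBuffer)
             < std::tie(b.program, b.framebuffer, b.texture, b.vertexBuffer);
    };
    std::stable_sort(draws.begin(), draws.end(), byCost);
    assert(std::is_sorted(draws.begin(), draws.end(), byCost));
}

// Counts how many times the program id differs between consecutive draws.
int countProgramSwitches(const std::vector<DrawCall>& draws) {
    int switches = 0;
    for (std::size_t i = 1; i < draws.size(); ++i) {
        if (draws[i].program != draws[i - 1].program) {
            ++switches;
        }
    }
    return switches;
}
```

Comparing `countProgramSwitches` before and after sorting gives a quick sanity check on how many binds the reordering saves for a given frame.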
