ongamex92

OpenGL Is redundant state checking still a thing?


Is redundant state checking still a thing? I'm interested in D3D11 and OpenGL3+.

I've got D3D11ContextStateCache and GLContextStateCache objects that keep track of the current state and skip calls that would do nothing. I also have an option to disable redundant state checking and call the API function directly, and to my surprise there was absolutely no difference.
(3600 draw calls with the same VB, input layout and texture, and 1 ConstBuffer update via Map (hmmm???).)

The same goes for OpenGL.
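
For readers who haven't seen one, here is a minimal sketch of the kind of state cache described above. It is not the actual D3D11ContextStateCache from this post: `FakeDevice`, `ContextStateCache` and `countForwardedCalls` are hypothetical names, and the "device" only counts calls so the example runs without D3D11 or OpenGL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the graphics API. It only counts calls, so the
// sketch runs without D3D11 or OpenGL.
struct FakeDevice {
    int bindTextureCalls = 0;
    int bindVertexBufferCalls = 0;
    void bindTexture(std::uint32_t) { ++bindTextureCalls; }
    void bindVertexBuffer(std::uint32_t) { ++bindVertexBufferCalls; }
};

// Remembers the last bound IDs and forwards a call to the device only when
// the value changes, or always when checking is disabled.
class ContextStateCache {
public:
    explicit ContextStateCache(FakeDevice& device, bool checkRedundant = true)
        : device_(device), checkRedundant_(checkRedundant) {}

    void bindTexture(std::uint32_t id) {
        if (checkRedundant_ && textureValid_ && texture_ == id) return;
        device_.bindTexture(id);
        texture_ = id;
        textureValid_ = true;
    }

    void bindVertexBuffer(std::uint32_t id) {
        if (checkRedundant_ && vertexBufferValid_ && vertexBuffer_ == id) return;
        device_.bindVertexBuffer(id);
        vertexBuffer_ = id;
        vertexBufferValid_ = true;
    }

private:
    FakeDevice& device_;
    bool checkRedundant_;
    std::uint32_t texture_ = 0;
    std::uint32_t vertexBuffer_ = 0;
    bool textureValid_ = false;
    bool vertexBufferValid_ = false;
};

// Rebinds the same two resources before each of `draws` draws and returns
// how many bind calls actually reached the device.
int countForwardedCalls(bool checkRedundant, int draws) {
    assert(draws >= 0);
    FakeDevice device;
    ContextStateCache cache(device, checkRedundant);
    for (int i = 0; i < draws; ++i) {
        cache.bindVertexBuffer(1);
        cache.bindTexture(7);
    }
    return device.bindTextureCalls + device.bindVertexBufferCalls;
}
```

With checking enabled, only the first bind of each resource reaches the device; with it disabled, every call does. Whether that difference costs anything measurable depends on the driver, which is the question of this thread.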

 

Note that I'm not claiming anything, as my scene may not be optimal for this test (and as I'm writing this I'm starting to have doubts about it).

 

And one additional question: is there any point in using D3D11_USAGE_IMMUTABLE in practice? Again, I see no difference between it and D3D11_USAGE_DEFAULT.


>> and to my surprise there was absolutely no difference.

 

you would only see a difference if:

1. the calls you make are calls that will introduce delays when called redundantly.

 

= AND =

 

2. you actually make redundant calls of that type in the first place - enough to be noticeable.

 

there's a good chance you're not making enough delay-inducing redundant calls to see a difference. that would indicate you can reduce or perhaps eliminate state checks, assuming your code stays well organized with respect to draw call order and state changes. if you make no redundant calls, there's technically no need for state management.
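
One way to check condition 2 is to count redundant calls before deciding whether to filter them. The sketch below is a hypothetical profiling helper (`RedundancyCounter` and `exampleFrame` are made-up names, not part of any API): it filters nothing and only records how often a "set state" call repeats the value already set for that slot.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical profiling helper: counts state-setting calls and how many of
// them repeat the value that is already current for the same slot.
class RedundancyCounter {
public:
    // Record a call that sets `slot` (for example "texture0") to `value`.
    void record(const std::string& slot, std::uint64_t value) {
        assert(!slot.empty());
        ++totalCalls_;
        auto it = current_.find(slot);
        if (it != current_.end() && it->second == value) {
            ++redundantCalls_;          // same value as last time
        } else {
            current_[slot] = value;     // first use or a real change
        }
    }

    int totalCalls() const { return totalCalls_; }
    int redundantCalls() const { return redundantCalls_; }

    // Fraction of calls that were redundant, or 0 if nothing was recorded.
    double redundantFraction() const {
        return totalCalls_ == 0
            ? 0.0
            : static_cast<double>(redundantCalls_) / totalCalls_;
    }

private:
    std::map<std::string, std::uint64_t> current_;
    int totalCalls_ = 0;
    int redundantCalls_ = 0;
};

// Made-up frame: four draws, the vertex buffer never changes and the
// texture changes once.
RedundancyCounter exampleFrame() {
    RedundancyCounter counter;
    const std::uint64_t textures[] = {5, 5, 9, 9};
    for (std::uint64_t texture : textures) {
        counter.record("vertexBuffer", 1);
        counter.record("texture0", texture);
    }
    return counter;
}
```

A high redundant fraction only satisfies condition 2. Condition 1, whether those calls are expensive on a given driver, still has to be measured.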

Edited by Norman Barrows


To clarify:

I currently draw a low-poly model (I can't tell you the primitive count right now).
I draw that model 100,000 times at different positions (this is why I call Map/Unmap on a constant buffer). The resources needed to draw the object do not change (one texture and one vertex buffer), and in that case there is no difference between binding them once (in frame 1) and binding them every frame. So in my case:

bind_vertexbuffer();
bind_texture();
bind_cb();
for(i = 0; i < 100000; ++i){
   update_cb(i);
   draw();
}

has the same performance as:

for(i = 0; i < 100000; ++i){
   bind_vertexbuffer();
   bind_texture();
   bind_cb();
   update_cb(i);
   draw();
}
Edited by imoogiBG

I'm with Hodgman on this one. When performance is key and you want clean, future-proof code, it's good to be in control of state. As a bonus, the same state manager that prevents redundant state changes also lets you inspect the current state at any time.


My guess is that the driver checks whether the bindings have actually changed before doing anything when you rebind resources. Rebinding the same resources over and over goes through this fast path, so you see no difference.

 

The thing is, you'd be relying on driver-specific behavior. You should try different hardware.


>> To clarify:
>> I currently draw a low-poly model (I can't tell you the primitive count right now).
>> I draw that model 100,000 times at different positions (this is why I call Map/Unmap on a constant buffer). The resources needed to draw the object do not change (one texture and one vertex buffer), and in that case there is no difference between binding them once (in frame 1) and binding them every frame.

So you're actually dynamically making 100k cbuffers per frame and handing them all to the garbage collector. In both your loops, this will be the bulk of the cost.

 

Since every draw is using a different cbuffer, the driver does have to emit new resource bindings per draw.
Try pre-creating 100k static cbuffers and pre-filling them with data so you don't need to do this work per frame, and see how that affects performance.

Or just for testing, use a single static cbuffer so that the driver doesn't have to rebind resources per draw, and see how that performs.

Edited by Hodgman


The CBuffer in my example above is also 1 cbuffer for all 100k draw calls; it's just updated with Map/Unmap before every draw call.

>> So you're actually dynamically making 100k

If the reallocation done by Map/Unmap counts as making them, then yes.

As far as I remember from "measuring", cbuffer binding alone is much more expensive than Map/Unmap.

Edited by imoogiBG


>> The CBuffer in my example above is also 1 cbuffer for all 100k draw calls; it's just updated with Map/Unmap before every draw call.
>> >> So you're actually dynamically making 100k
>> If the reallocation done by Map/Unmap counts as making them, then yes.
>> As far as I remember from "measuring", cbuffer binding alone is much more expensive than Map/Unmap.

 

That's what I meant before:

>> Side note - updating a constant buffer causes resource renaming within the driver -- your resource handle (D3D COM pointer) now points to a different memory allocation than before, which probably forces D3D to set a whole bunch of internal dirty flags that get checked on the next update.
>> So, actually updating the constant buffer is probably hiding the cost of a PSSetConstantBuffers call (as it's probably also just setting the same dirty flags, to be checked on next draw).

You can't edit a resource that's in use by the GPU. The GPU is one frame behind the CPU. Therefore, in order to make it look like you're editing a resource, the driver is actually performing a reallocation. If you update the resource 100k times per frame, you're performing 100k reallocations, and asking a garbage collector to delete them in a few frames' time when the GPU has finished using them.

 

Binding the same resource repeatedly might be cheap, but each one of your draw calls is actually binding different resources. So both of your loops have a high memory allocation cost and resource binding cost per draw call.
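
To make the renaming argument concrete, here is a toy model of it. This is a conceptual sketch only, assuming a discard-style Map; real drivers are free to implement renaming differently (for example with pooled or ring allocations). `RenamedBuffer`, `mapDiscard`, `endFrame` and the `framesInFlight` parameter are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Toy model of driver-side resource renaming. Not real driver code.
class RenamedBuffer {
public:
    RenamedBuffer(std::size_t bytes, int framesInFlight)
        : bytes_(bytes), framesInFlight_(framesInFlight), storage_(bytes) {
        assert(bytes > 0 && framesInFlight > 0);
    }

    // Models a discard-style Map: the old allocation may still be read by
    // the GPU, so it is retired and the CPU gets a fresh allocation.
    unsigned char* mapDiscard() {
        retired_.emplace_back(currentFrame_, std::move(storage_));
        storage_ = std::vector<unsigned char>(bytes_);
        ++allocations_;
        return storage_.data();
    }

    // Models the end of a frame: allocations retired at least
    // `framesInFlight` frames ago are assumed unused by the GPU and freed.
    void endFrame() {
        ++currentFrame_;
        while (!retired_.empty() &&
               currentFrame_ - retired_.front().first >= framesInFlight_) {
            retired_.pop_front();
        }
    }

    std::size_t allocations() const { return allocations_; }
    std::size_t pendingDeletes() const { return retired_.size(); }

private:
    std::size_t bytes_;
    int framesInFlight_;
    int currentFrame_ = 0;
    std::size_t allocations_ = 0;
    std::vector<unsigned char> storage_;
    std::deque<std::pair<int, std::vector<unsigned char>>> retired_;
};

// One frame of the loop from this thread, scaled to `draws` draws: the
// buffer is updated before every draw. Returns the allocations it caused.
std::size_t allocationsForFrame(int draws) {
    assert(draws >= 0);
    RenamedBuffer constants(64, 2);
    for (int i = 0; i < draws; ++i) {
        unsigned char* data = constants.mapDiscard();
        data[0] = static_cast<unsigned char>(i);  // stand-in for a transform
    }
    constants.endFrame();
    return constants.allocations();
}
```

In this model every per-draw update costs one allocation plus one deferred delete, regardless of how often the bind calls are repeated. That is why both loops from earlier in the thread would be dominated by the same cost.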


Can't speak for desktop, but I recently did some optimization on a WebGL game. One of the first things I did was introduce dumb "if(g_currentBoundTexture === newTexture) doNothing(); else bindTexture" and similar checks, and I gained a very much appreciated (ballpark) 5% or so speedup for maybe half an hour of work.

 

Through some dirty happenings this goes down from the actual JavaScript WebGL code to Chrome native to ANGLE and then to the eventual Direct3D 9 implementation (on Windows), so some part of it probably means "in D3D9 on desktop, state redundancy checking is still pretty good".

Edited by agleed
