Is redundant state checking still a thing?

8 comments, last by agleed 7 years, 10 months ago

Is redundant state checking still a thing? I'm interested in D3D11 and OpenGL 3+.

I've got D3D11ContextStateCache and GLContextStateCache objects that keep track of the current state and skip calls that would do nothing. I also have an option to disable redundant state checking and call the API functions directly, and to my surprise there was absolutely no difference.
(3600 draw calls with the same vertex buffer, input layout and texture, and one constant buffer update via Map (hmmm???))

The same goes for OpenGL.

Note that I'm not claiming anything, as my scene may not be a good test case (and as I'm writing this I'm starting to have doubts about it).
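For reference, a state cache of the kind described above can be sketched as follows. This is a minimal sketch, not the OP's actual D3D11ContextStateCache; FakeApi stands in for a real binding call (such as PSSetShaderResources or glBindTexture) and only counts how often it is reached, and all names here are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for a real binding call; it only counts calls.
struct FakeApi {
    int bind_texture_calls = 0;
    void bind_texture(const void* /*texture*/) { ++bind_texture_calls; }
};

// Remembers the last texture bound through it and skips the API call
// when the same texture is bound again, unless bypass mode is on.
class StateCache {
public:
    explicit StateCache(FakeApi& api) : api_(api) {}

    // Bypass mode forwards every call, like an option that disables
    // redundant state checking.
    void set_bypass(bool bypass) { bypass_ = bypass; }

    void bind_texture(const void* texture) {
        if (!bypass_ && known_ && texture == current_) {
            return;  // redundant: this texture is already bound
        }
        current_ = texture;
        known_ = true;
        api_.bind_texture(texture);
    }

    // Call after any code that may have changed state behind the cache's back.
    void invalidate() { known_ = false; }

private:
    FakeApi& api_;
    const void* current_ = nullptr;
    bool known_ = false;  // false until the first bind or after invalidate()
    bool bypass_ = false;
};
```

The `known_` flag keeps the first bind (and binding a null texture) from being skipped by accident, and `invalidate()` covers code paths such as third-party libraries that touch the context directly.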

And one additional question: is there any point in using D3D11_USAGE_IMMUTABLE in practice? Again, I see no difference between that and D3D11_USAGE_DEFAULT.


Yeah, you should still avoid redundant state setting if you can do so cheaply yourself.

Many D3D/GL functions may be as simple as storing some pointers and setting a dirty flag, with the real cost occurring in the next draw call.
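The split between a cheap set call and an expensive draw can be illustrated with a toy model. This is an assumption for illustration, not how any particular driver is written: DeferredContext, hardware_binds and draws are hypothetical, and this model deliberately does not compare the new binding with the old one, so a redundant set still costs a flush at the next draw.

```cpp
#include <cassert>

// Toy API whose set call only records a pointer and marks state dirty;
// the real work happens when the next draw flushes dirty state.
class DeferredContext {
public:
    void set_texture(const void* texture) {
        texture_ = texture;
        dirty_ = true;  // no comparison: a redundant set still marks dirty
    }

    void draw() {
        if (dirty_) {
            ++hardware_binds;  // stand-in for building and emitting bind commands
            dirty_ = false;
        }
        ++draws;
    }

    const void* bound_texture() const { return texture_; }

    int hardware_binds = 0;
    int draws = 0;

private:
    const void* texture_ = nullptr;
    bool dirty_ = false;
};
```

Setting the texture once and drawing three times flushes once; setting it before each of three draws flushes three times. A driver that does compare bindings before marking them dirty would avoid that, but whether it does is driver-specific.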

Other functions can have quite a bit of validation overhead. I remember recently measuring an OMSetRenderTargets call at as much as 300 µs :(
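A minimal way to get a per-call number like that is to time the call on the CPU with std::chrono. This is a sketch: average_call_us is a hypothetical helper, and it only measures time spent on the calling thread, so any work the driver defers to a later draw, to Present or to another thread is not captured.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

// Average wall-clock cost of one call to f, in microseconds, over
// `iterations` calls. Measures CPU time on the calling thread only.
template <typename F>
double average_call_us(F&& f, int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        f();
    }
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}
```

For example, `average_call_us([&] { context->OMSetRenderTargets(1, &rtv, dsv); }, 1000)`, where `context`, `rtv` and `dsv` are hypothetical variables of type ID3D11DeviceContext*, ID3D11RenderTargetView* and ID3D11DepthStencilView*. Repeating the same call also measures the redundant case, which may hit a driver fast path; alternate between two targets to measure real changes.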

Side note: updating a constant buffer causes resource renaming within the driver. Your resource handle (D3D COM pointer) now points to a different memory allocation than before, which probably forces D3D to set a whole bunch of internal dirty flags that get checked on the next update.

So actually updating the constant buffer is probably hiding the cost of a PSSetConstantBuffers call (as that call is probably also just setting the same dirty flags, to be checked on the next draw).

You should use the correct usage hints where it's feasible for you to do so. Immutable allows the driver to greatly simplify memory management for a resource - which could mean CPU time, GPU time, CPU space and/or GPU space savings.
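One possible rule of thumb for picking a usage hint, written as a small function. The enum mirrors D3D11_USAGE_IMMUTABLE, _DEFAULT, _DYNAMIC and _STAGING but is declared locally so the sketch compiles without the Windows SDK; BufferIntent and choose_usage are hypothetical, and the mapping is a common guideline rather than a rule from the D3D documentation. The one hard constraint it encodes is that an immutable resource must be given its contents at creation time.

```cpp
#include <cassert>

// Local mirror of the four D3D11_USAGE values.
enum class Usage { Immutable, Default, Dynamic, Staging };

// Hypothetical description of how the application intends to use a buffer.
struct BufferIntent {
    bool has_initial_data;   // contents known when the buffer is created
    bool gpu_writes;         // GPU writes it (render target, UAV, stream out)
    bool cpu_updates;        // CPU changes the contents after creation
    bool cpu_updates_often;  // e.g. every frame or every draw
    bool cpu_reads_back;     // CPU reads results produced by the GPU
};

Usage choose_usage(const BufferIntent& i) {
    if (i.cpu_reads_back) return Usage::Staging;                    // copy target for readback
    if (i.cpu_updates && i.cpu_updates_often) return Usage::Dynamic; // Map from the CPU
    if (i.gpu_writes || i.cpu_updates) return Usage::Default;        // e.g. UpdateSubresource
    assert(i.has_initial_data && "IMMUTABLE resources must be initialized at creation");
    return Usage::Immutable;  // never changes: the driver may place and manage it freely
}
```

For the OP's vertex buffer, which is filled once and never changed, this returns Usage::Immutable; a constant buffer mapped before every draw comes out as Usage::Dynamic.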

>> and to my surprise there was absolutely no difference.

You would only see a difference if:

1. the calls you make are ones that introduce delays when called redundantly,

= AND =

2. you actually make enough redundant calls of that type in the first place for it to be noticeable.

There's a good chance you're not making enough redundant calls of the delay-inducing kind to see a difference. That would suggest you can reduce or perhaps eliminate state checks, assuming your code stays well organized with respect to draw call order and state changes. If you make no redundant calls, there's technically no need for state management.
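One way to check whether condition 2 applies is to record the state-setting calls made during a frame and count the redundant ones. A minimal sketch, assuming you can capture those calls into a trace; BindCall, its fields and count_redundant are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One recorded state-setting call: the slot that was set (e.g. "PS texture 0")
// and the object bound to it.
struct BindCall {
    std::string slot;
    const void* object;
};

// Counts calls that re-bind the object already bound to the same slot,
// i.e. the calls a state cache would have skipped.
std::size_t count_redundant(const std::vector<BindCall>& trace) {
    std::map<std::string, const void*> bound;
    std::size_t redundant = 0;
    for (const BindCall& call : trace) {
        auto it = bound.find(call.slot);
        if (it != bound.end() && it->second == call.object) {
            ++redundant;
        } else {
            bound[call.slot] = call.object;
        }
    }
    return redundant;
}
```

If this count is small compared with the total number of calls, a state cache has little to skip, which fits the "no difference" result.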

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

To clarify:

I currently draw a low-poly model (I can't tell you the primitive count right now).
I draw that model 100 000 times with different positions (this is why I call Map/Unmap on a constant buffer). The resources needed to draw the object do not change (one texture and one vertex buffer), and in that case there is no difference between binding them once and binding them before every draw. So in my case:


bind_vertexbuffer();   // bound once, before the loop
bind_texture();
bind_cb();
for(int i = 0; i < 100000; ++i){
   update_cb(i);       // Map/Unmap with this draw's position
   draw();
}

has the same performance as:


for(int i = 0; i < 100000; ++i){
   bind_vertexbuffer();   // same resources rebound before every draw
   bind_texture();
   bind_cb();
   update_cb(i);          // Map/Unmap with this draw's position
   draw();
}

I'm with Hodgman on this one. When performance is key and you want clean, future-proof code, it's good to be in control of state. A bonus is that the same state manager that prevents redundant state changes also lets you inspect the current state at any time.

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

My guess is that the driver checks whether the bindings have actually changed before doing any work when you rebind resources. Rebinding the same resources again and again goes through this fast path, so you see no difference.

Thing is, you'd be relying on driver-specific behavior. You should try it on different hardware.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

To clarify:

I currently draw a low-poly model (I can't tell you the primitive count right now).
I draw that model 100 000 times with different positions (this is why I call Map/Unmap on a constant buffer). The resources needed to draw the object do not change (one texture and one vertex buffer), and in that case there is no difference between binding them once and binding them before every draw.

So you're actually dynamically making 100k cbuffers per frame and handing them all to the garbage collector. In both your loops, this will be the bulk of the cost.

Since every draw uses a different cbuffer, the driver does have to emit new resource bindings for each draw.
Try pre-creating 100k static cbuffers and pre-filling them with data so you don't need to do this work per frame, and see how that affects performance.

Or just for testing, use a single static cbuffer so that the driver doesn't have to rebind resources per draw, and see how that performs.
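The difference between the OP's setup and the two suggested tests can be tallied per frame. This is a rough model under the assumptions stated in this reply (each Map with D3D11_MAP_WRITE_DISCARD renames the buffer, and each draw that sees different cbuffer memory needs a new binding), not a measurement; the type and function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Rough per-frame work for `draws` draw calls under each cbuffer setup.
struct FrameCost {
    std::uint64_t buffer_writes;    // Map/Unmap calls, each assumed to rename the buffer
    std::uint64_t binding_changes;  // draws whose cbuffer memory differs from the last one
};

enum class CBufferSetup { OneDynamicMappedPerDraw, PrefilledStaticPerDraw, OneStatic };

FrameCost frame_cost(CBufferSetup setup, std::uint64_t draws) {
    assert(draws > 0);
    switch (setup) {
        case CBufferSetup::OneDynamicMappedPerDraw:
            // Same handle, but every WRITE_DISCARD gives it new memory.
            return {draws, draws};
        case CBufferSetup::PrefilledStaticPerDraw:
            // Data written once at load time; each draw binds another buffer.
            return {0, draws};
        case CBufferSetup::OneStatic:
            // Bound once and never written again.
            return {0, 1};
    }
    return {0, 0};
}
```

With 100 000 draws, the OP's loop pays for 100 000 renames and 100 000 binding changes per frame whether or not the unchanged resources are rebound, which is why the two loops can look identical.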

The cbuffer in my example above is also a single cbuffer for all 100k draw calls; it's just updated with Map/Unmap before every draw call.


So you're actually dynamically making 100k

If Map/Unmap is doing a reallocation, then yes.

As far as I remember from "measuring", binding a cbuffer alone is much more expensive than Map/Unmap.

That's what I meant before:

Side note: updating a constant buffer causes resource renaming within the driver. Your resource handle (D3D COM pointer) now points to a different memory allocation than before, which probably forces D3D to set a whole bunch of internal dirty flags that get checked on the next update.
So actually updating the constant buffer is probably hiding the cost of a PSSetConstantBuffers call (as that call is probably also just setting the same dirty flags, to be checked on the next draw).

You can't edit a resource that's in use by the GPU, and the GPU typically runs about a frame behind the CPU. Therefore, to make it look like you're editing a resource, the driver actually performs a reallocation. If you update the resource 100k times per frame, you're performing 100k reallocations, and asking a garbage collector to delete them a few frames later, once the GPU has finished using them.
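The renaming and deferred deletion described above can be modeled as below. This is a sketch of the idea only: RenamedBuffer, map_discard, collect and the frame numbering are hypothetical, and real drivers may suballocate from ring buffers rather than going to the heap on every Map, with details that vary by driver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>

// One application-visible handle; every discard gives it fresh memory, and the
// old memory is retired until the GPU has finished the frame that may read it.
class RenamedBuffer {
public:
    explicit RenamedBuffer(std::size_t size) : size_(size) { current_ = allocate(); }

    // Like Map(WRITE_DISCARD): retire the current memory, return new memory.
    std::uint8_t* map_discard(std::uint64_t cpu_frame) {
        assert(retired_.empty() || retired_.back().frame <= cpu_frame);
        retired_.push_back({std::move(current_), cpu_frame});
        current_ = allocate();
        return current_.get();
    }

    // Called when the GPU reports it has finished `gpu_frame`: free everything
    // retired in that frame or earlier (the "garbage collector").
    void collect(std::uint64_t gpu_frame) {
        while (!retired_.empty() && retired_.front().frame <= gpu_frame) {
            retired_.pop_front();
        }
    }

    std::size_t live_allocations() const { return retired_.size() + 1; }
    std::uint64_t total_allocations() const { return allocations_; }

private:
    struct Retired {
        std::unique_ptr<std::uint8_t[]> memory;
        std::uint64_t frame;
    };

    std::unique_ptr<std::uint8_t[]> allocate() {
        ++allocations_;
        return std::make_unique<std::uint8_t[]>(size_);
    }

    std::size_t size_;
    std::uint64_t allocations_ = 0;
    std::unique_ptr<std::uint8_t[]> current_;
    std::deque<Retired> retired_;
};
```

Mapping the buffer 100 times in frame 1 leaves 101 live allocations until `collect(1)` runs; scale that to 100k maps per frame and the allocation and bookkeeping cost dominates both of the OP's loops.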

Binding the same resource repeatedly might be cheap, but each one of your draw calls is actually binding different resources. So both of your loops have a high memory allocation cost and resource binding cost per draw call.

Can't speak for desktop, but I recently did some optimization on a WebGL game, and one of the first things I did was introduce dumb checks like "if(g_currentBoundTexture === newTexture) doNothing(); else bindTexture". That gained a very welcome (ballpark) 5% or so speedup for maybe half an hour of work.

Through some dirty happenings, this goes from the actual JavaScript WebGL code to Chrome's native code, then to ANGLE, and then to the eventual Direct3D 9 implementation (on Windows), so part of this result probably means "on D3D9 on desktop, redundant state checking still pays off".
