Hello,
quick question about redundant state checking:
Is there actually a performance overhead when you bind resources that are already bound (via VSSetConstantBuffers, ...), given that you need to make the call anyway to update at least one other resource? To give an example, say I allow 6 bound textures. For one draw call, textures 2 and 4 change. What I do now is find the first and last resource that changed, fill a static array with that range, and bind that array, which would translate to:
VSSetShaderResources(2, 3, textures); // 2 = first slot, 3 = number of textures
This reduces the number of API calls to an absolute minimum, but also requires fairly complicated state checking per draw call, per resource type and per resource:
// std::size_t matches std::array's size parameter, so NumResources can be deduced on conforming compilers
template<typename ResourceType, std::size_t NumResources>
CheckedResource<ResourceType, NumResources> checkSomething(const std::array<ResourceType*, NumResources>& resources, std::array<ResourceType*, NumResources>& lastResources, unsigned int startSlot, unsigned int endSlot)
{
    // firstSlotChanged == NumResources means "nothing changed"
    unsigned int firstSlotChanged = static_cast<unsigned int>(NumResources), lastSlotChanged = 0;
    // find the first and last slot whose resource differs from what is bound
    for(unsigned int i = startSlot; i <= endSlot; i++)
    {
        if(resources[i] != lastResources[i])
        {
            lastResources[i] = resources[i];
            firstSlotChanged = min(firstSlotChanged, i);
            lastSlotChanged = max(lastSlotChanged, i);
        }
    }
    // value-initialize so numResources is 0 when nothing changed
    // (assumes CheckedResource is an aggregate or has a zeroing default constructor)
    CheckedResource<ResourceType, NumResources> checkedResources{};
    if(firstSlotChanged != NumResources)
    {
        checkedResources.firstSlot = firstSlotChanged;
        checkedResources.numResources = lastSlotChanged - firstSlotChanged + 1;
        // todo: memcpy or std::copy
        for(unsigned int i = 0; i < checkedResources.numResources; i++)
        {
            checkedResources.buffers[i] = resources[i + firstSlotChanged];
        }
    }
    return checkedResources;
}
I need to loop through every resource to find the first and the last one that changed.
Alternatively, I could just check whether any texture changed, and then shove the whole array of bound textures onto the API:
VSSetShaderResources(0, MAX_TEXTURES, boundTextures); // also saves me constructing the array
Which would change the above method to a simple:
// std::size_t matches std::array's size parameter, so NumResources can be deduced on conforming compilers
template<typename ResourceType, std::size_t NumResources>
bool checkSomething(const std::array<ResourceType*, NumResources>& resources, std::array<ResourceType*, NumResources>& lastResources, unsigned int startSlot, unsigned int endSlot)
{
    for(unsigned int i = startSlot; i <= endSlot; i++)
    {
        if(resources[i] != lastResources[i])
        {
            lastResources = resources; // copy whole resource block, since more might have changed but we are merely checking the first
            return true;
        }
    }
    return false;
}
So far fewer iterations, and far fewer instructions/branches, but now I'm potentially telling the API that it needs to change 6 textures when only the first one really changed. Is there any (CPU) overhead on the API side for this, or do I save more time overall with the cheaper check (as in my second method)?
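To make the comparison concrete, here is a minimal, self-contained sketch of both strategies against a mock device. Everything in it is an assumption for illustration: Texture, MockDevice, kMaxTextures, bindMinimalRange and bindWholeArray are hypothetical names, not D3D11 types, and the mock only counts what a real VSSetShaderResources call would have been handed. It says nothing about what the driver actually does with redundant slots, which is exactly what I'm asking about.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a texture view; only pointer identity matters here.
struct Texture {};

constexpr std::size_t kMaxTextures = 6;
using Slots = std::array<const Texture*, kMaxTextures>;

// Hypothetical mock device: records how many calls were made and how many
// slots were passed in total, instead of talking to a real API.
struct MockDevice {
    unsigned calls = 0;
    unsigned slotsSent = 0;
    void SetShaderResources(unsigned start, unsigned count, const Texture* const* /*views*/) {
        assert(start + count <= kMaxTextures);
        ++calls;
        slotsSent += count;
    }
};

// Strategy A: scan every slot, bind only the span from the first to the
// last changed slot. Unchanged slots inside the span are rebound as well.
inline void bindMinimalRange(MockDevice& dev, const Slots& wanted, Slots& bound) {
    std::size_t first = kMaxTextures, last = 0;
    for (std::size_t i = 0; i < kMaxTextures; ++i) {
        if (wanted[i] != bound[i]) {
            bound[i] = wanted[i];
            if (first == kMaxTextures) first = i;
            last = i;
        }
    }
    if (first != kMaxTextures) {
        dev.SetShaderResources(static_cast<unsigned>(first),
                               static_cast<unsigned>(last - first + 1),
                               &wanted[first]);
    }
}

// Strategy B: stop at the first difference and rebind the whole array.
inline void bindWholeArray(MockDevice& dev, const Slots& wanted, Slots& bound) {
    if (wanted != bound) {
        bound = wanted;
        dev.SetShaderResources(0, static_cast<unsigned>(kMaxTextures), wanted.data());
    }
}
```

For the example above (textures 2 and 4 change), strategy A hands 3 slots to the API in one call, strategy B hands all 6 slots in one call; in both cases a repeated flush with nothing changed makes no call at all.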
Also, since I already wrote a little more about state checking than probably required for this simple API-specific question:
How much state checking do you actually do?
Currently I have a system where the device collects all requested states plus the states that are currently bound (m_state and m_lastState), and before every draw call I check which states need updating. For most things this is pretty simple: I only need to compare what is bound against the requested state (shader, depth state, rasterizer state, ...). For resources (samplers, cbuffers, textures), however, it is far more complicated.
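For the simple, single-valued states, the check boils down to something like the following sketch. The names here (PipelineState, ApplyCounts, applyIfChanged, flushSimpleStates) are hypothetical and simplified for this post; the lambdas only count where the real code would call into the API.

```cpp
#include <cassert>

// Hypothetical opaque state objects; a real renderer would hold API objects.
struct Shader {};
struct DepthState {};
struct RasterizerState {};

// One instance holds the requested state, another the last bound state.
struct PipelineState {
    const Shader* shader = nullptr;
    const DepthState* depth = nullptr;
    const RasterizerState* rasterizer = nullptr;
};

// Counts how often each state would actually have been sent to the API.
struct ApplyCounts {
    unsigned shader = 0, depth = 0, rasterizer = 0;
};

// Compare requested against last bound; apply and remember only on change.
template<typename T, typename ApplyFn>
bool applyIfChanged(const T& wanted, T& last, ApplyFn&& apply) {
    if (wanted == last) return false;
    last = wanted;
    apply(wanted);
    return true;
}

// Called before every draw call.
inline void flushSimpleStates(const PipelineState& state, PipelineState& lastState, ApplyCounts& counts) {
    applyIfChanged(state.shader, lastState.shader,
                   [&](const Shader*) { ++counts.shader; });
    applyIfChanged(state.depth, lastState.depth,
                   [&](const DepthState*) { ++counts.depth; });
    applyIfChanged(state.rasterizer, lastState.rasterizer,
                   [&](const RasterizerState*) { ++counts.rasterizer; });
    // After a flush, the bound state mirrors the requested state.
    assert(lastState.shader == state.shader);
}
```

That is one pointer comparison per state per draw call, which is why this part doesn't worry me.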
On the high level, resources are shared between all shader stages (like the effect framework does), but in the back-end renderer I obviously only want to bind resources to the stages that actually need them. For example, I support domain/hull shaders, but only one shader currently uses them, so I don't want to bind every texture to the domain/hull stages unless the current shader actually has those stages.
So what I do is, I get a range of stages the current shader has (2-5), loop over those, and then for every type of resource currently supported (3) I perform the state checking I posted above, which looks like this:
for(unsigned int shader = 0; shader <= m_state.pEffect->GetMaxShader(); shader++)
{
    // CBUFFERS
    {
        const auto shaderData = m_state.pEffect->GetCbufferData(shader);
        const auto checkedResources = checkSomething(m_state.cbuffers[shader], m_lastState.cbuffers[shader], shaderData.startSlot, shaderData.endSlot);
        if(checkedResources.numResources > 0)
            BindCBuffer(shader, checkedResources.firstSlot, checkedResources.numResources, checkedResources.buffers);
    }
    // TEXTURES
    {
        const auto shaderData = m_state.pEffect->GetTextureData(shader);
        const auto checkedResources = checkSomething(m_state.textures[shader], m_lastState.textures[shader], shaderData.startSlot, shaderData.endSlot);
        if(checkedResources.numResources > 0)
            BindTextures(shader, checkedResources.firstSlot, checkedResources.numResources, checkedResources.buffers);
    }
    // SAMPLER
    {
        const auto shaderData = m_state.pEffect->GetSamplerData(shader);
        const auto checkedResources = checkSomething(m_state.sampler[shader], m_lastState.sampler[shader], shaderData.startSlot, shaderData.endSlot);
        if(checkedResources.numResources > 0) // TODO: we didn't need this branch before generalizing this in a method
            BindSampler(shader, checkedResources.firstSlot, checkedResources.numResources, checkedResources.buffers);
    }
}
Is this along the lines of the CPU work you guys invest in redundant-state filtering, or is there an even simpler method? At first I wanted to store a bool per resource type that only gets set when I bind a resource that actually changes, to save checking every resource on every draw call. But this doesn't work with the partial-binding model I'm using: one shader might only have a vertex and a pixel shader, so the changed texture gets bound to just those stages. Down the line, another shader might get bound that also has a geometry shader and needs the same texture there, but by then the flag has already been cleared.
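Here is a small sketch of that failure case, reduced to a single texture slot. All names (Stage, Texture, TextureSlotTracker and its methods) are hypothetical and made up for this post; "binding" is just writing into an array that stands in for the API state of each stage.

```cpp
#include <array>
#include <cassert>
#include <initializer_list>

enum Stage { VS, GS, PS, StageCount };
struct Texture {}; // hypothetical stand-in, only pointer identity matters

struct TextureSlotTracker {
    const Texture* wanted = nullptr;
    bool dirty = false;                                     // the single "something changed" flag
    std::array<const Texture*, StageCount> boundPerStage{}; // what each stage really has bound

    void set(const Texture* t) {
        if (t != wanted) { wanted = t; dirty = true; }
    }

    // Single-flag variant: binds to the stages of the current shader and
    // clears the flag, so stages that were inactive never see the change.
    void flushWithSingleFlag(std::initializer_list<Stage> activeStages) {
        if (!dirty) return;
        for (Stage s : activeStages) {
            assert(s < StageCount);
            boundPerStage[s] = wanted;
        }
        dirty = false;
    }

    // Per-stage variant: compares against what each active stage has bound.
    // Returns the number of stages that needed a bind.
    unsigned flushPerStage(std::initializer_list<Stage> activeStages) {
        unsigned binds = 0;
        for (Stage s : activeStages) {
            assert(s < StageCount);
            if (boundPerStage[s] != wanted) { boundPerStage[s] = wanted; ++binds; }
        }
        return binds;
    }
};
```

With the single flag, setting a texture and flushing for a VS+PS shader clears the flag, so a later flush for a VS+GS+PS shader leaves the geometry stage with a stale (here: null) binding. The per-stage comparison catches it, at the cost of checking every active stage on every draw call.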
So what does your state filtering look like? Am I somewhat on the right track, or do you use something completely different?
Thanks!
PS: I currently don't have any meaningful 3D scene to benchmark this with (I'm still in the process of getting the renderer to work and building a useful toolchain), and I'd also like some more theoretical knowledge, which is why I'm asking.