Husbjörn

Member Since 27 Jan 2014
Offline Last Active Today, 02:59 AM

Posts I've Made

In Topic: False negative result in bounding frustum test

Today, 03:01 AM

Maybe you can share your algorithm for frustum culling? Maybe I am missing something here (some space translations or something else)?

Certainly, though it is quite old and could probably do with a revision.

I'm using axis-aligned bounding boxes, by the way, storing the min and max extents along the X/Y/Z axes in object space:

void BoundingBox::Transform(const XMMATRIX& mat, XMFLOAT3& vecMin, XMFLOAT3& vecMax) const {
	XMFLOAT3 coord[8];
	// Front Vertices
	XMStoreFloat3(&coord[0], XMVector3TransformCoord(XMVectorSet(vecBaseMin.x, vecBaseMin.y, vecBaseMin.z, 1.0f), mat));
	XMStoreFloat3(&coord[1], XMVector3TransformCoord(XMVectorSet(vecBaseMin.x, vecBaseMax.y, vecBaseMin.z, 1.0f), mat));
	XMStoreFloat3(&coord[2], XMVector3TransformCoord(XMVectorSet(vecBaseMax.x, vecBaseMax.y, vecBaseMin.z, 1.0f), mat));
	XMStoreFloat3(&coord[3], XMVector3TransformCoord(XMVectorSet(vecBaseMax.x, vecBaseMin.y, vecBaseMin.z, 1.0f), mat));
	// Back Vertices
	XMStoreFloat3(&coord[4], XMVector3TransformCoord(XMVectorSet(vecBaseMin.x, vecBaseMin.y, vecBaseMax.z, 1.0f), mat));
	XMStoreFloat3(&coord[5], XMVector3TransformCoord(XMVectorSet(vecBaseMax.x, vecBaseMin.y, vecBaseMax.z, 1.0f), mat));
	XMStoreFloat3(&coord[6], XMVector3TransformCoord(XMVectorSet(vecBaseMax.x, vecBaseMax.y, vecBaseMax.z, 1.0f), mat));
	XMStoreFloat3(&coord[7], XMVector3TransformCoord(XMVectorSet(vecBaseMin.x, vecBaseMax.y, vecBaseMax.z, 1.0f), mat));
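	// Reduce the eight transformed corners to the new axis-aligned extents
	// (assumed; this step is missing from the snippet as posted)
	vecMin = vecMax = coord[0];
	for(unsigned int i = 1; i < 8; i++) {
		if(coord[i].x < vecMin.x) vecMin.x = coord[i].x;
		if(coord[i].y < vecMin.y) vecMin.y = coord[i].y;
		if(coord[i].z < vecMin.z) vecMin.z = coord[i].z;
		if(coord[i].x > vecMax.x) vecMax.x = coord[i].x;
		if(coord[i].y > vecMax.y) vecMax.y = coord[i].y;
		if(coord[i].z > vecMax.z) vecMax.z = coord[i].z;
	}
}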
void Camera::ReconstructFrustumPlanes() {
	// Left Frustum Plane
	// Add first column of the matrix to the fourth column
	frustumPlane[0].a = viewProj._14 + viewProj._11; 
	frustumPlane[0].b = viewProj._24 + viewProj._21;
	frustumPlane[0].c = viewProj._34 + viewProj._31;
	frustumPlane[0].d = viewProj._44 + viewProj._41;

	// Right frustum Plane
	// Subtract first column of matrix from the fourth column
	frustumPlane[1].a = viewProj._14 - viewProj._11; 
	frustumPlane[1].b = viewProj._24 - viewProj._21;
	frustumPlane[1].c = viewProj._34 - viewProj._31;
	frustumPlane[1].d = viewProj._44 - viewProj._41;

	// Top frustum Plane
	// Subtract second column of matrix from the fourth column
	frustumPlane[2].a = viewProj._14 - viewProj._12; 
	frustumPlane[2].b = viewProj._24 - viewProj._22;
	frustumPlane[2].c = viewProj._34 - viewProj._32;
	frustumPlane[2].d = viewProj._44 - viewProj._42;

	// Bottom frustum Plane
	// Add second column of the matrix to the fourth column
	frustumPlane[3].a = viewProj._14 + viewProj._12;
	frustumPlane[3].b = viewProj._24 + viewProj._22;
	frustumPlane[3].c = viewProj._34 + viewProj._32;
	frustumPlane[3].d = viewProj._44 + viewProj._42;

	// Near frustum Plane
	// We could add the third column to the fourth column to get the near plane,
	// but we don't have to do this because the third column IS the near plane
	frustumPlane[4].a = viewProj._13;
	frustumPlane[4].b = viewProj._23;
	frustumPlane[4].c = viewProj._33;
	frustumPlane[4].d = viewProj._43;

	// Far frustum Plane
	// Subtract third column of matrix from the fourth column
	frustumPlane[5].a = viewProj._14 - viewProj._13; 
	frustumPlane[5].b = viewProj._24 - viewProj._23;
	frustumPlane[5].c = viewProj._34 - viewProj._33;
	frustumPlane[5].d = viewProj._44 - viewProj._43;


	// Normalize planes
	for(unsigned int p = 0; p < 6; p++) {
		float length = sqrt(
			(frustumPlane[p].a * frustumPlane[p].a) +
			(frustumPlane[p].b * frustumPlane[p].b) +
			(frustumPlane[p].c * frustumPlane[p].c)
		);
		frustumPlane[p].a /= length;
		frustumPlane[p].b /= length;
		frustumPlane[p].c /= length;
		frustumPlane[p].d /= length;
	}
}
bool Camera::FrustumCullBoundingBox(const XMFLOAT3 &vecMin, const XMFLOAT3& vecMax) {
	for(unsigned int p = 0; p < 6; p++) {
		if(XMPlaneDotCoord(&frustumPlane[p], &XMVectorSet(vecMin.x, vecMin.y, vecMin.z, 1)) >= 0.0f)
			continue;
		if(XMPlaneDotCoord(&frustumPlane[p], &XMVectorSet(vecMax.x, vecMin.y, vecMin.z, 1)) >= 0.0f)
			continue;
		if(XMPlaneDotCoord(&frustumPlane[p], &XMVectorSet(vecMin.x, vecMax.y, vecMin.z, 1)) >= 0.0f)
			continue;
		if(XMPlaneDotCoord(&frustumPlane[p], &XMVectorSet(vecMin.x, vecMin.y, vecMax.z, 1)) >= 0.0f)
			continue;
		if(XMPlaneDotCoord(&frustumPlane[p], &XMVectorSet(vecMax.x, vecMax.y, vecMin.z, 1)) >= 0.0f)
			continue;
		if(XMPlaneDotCoord(&frustumPlane[p], &XMVectorSet(vecMax.x, vecMin.y, vecMax.z, 1)) >= 0.0f)
			continue;
		if(XMPlaneDotCoord(&frustumPlane[p], &XMVectorSet(vecMin.x, vecMax.y, vecMax.z, 1)) >= 0.0f)
			continue;
		if(XMPlaneDotCoord(&frustumPlane[p], &XMVectorSet(vecMax.x, vecMax.y, vecMax.z, 1)) >= 0.0f)
			continue;
		return false;
	}
	return true;
}
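
For reference, the same idea can be sketched without DirectXMath. The following is a minimal, self-contained illustration and not the code above: `Plane`, `Vec3`, `Mat4`, `ExtractPlanes` and `BoxIntersectsFrustum` are hypothetical names, the matrix is assumed to be row-major with row vectors and a [0, 1] clip-space depth range (the Direct3D convention), and instead of testing all eight corners it only tests the corner that lies farthest along each plane normal. For an axis-aligned box that corner has the largest signed distance, so the result should match the eight-corner test.

```cpp
#include <array>
#include <cmath>

struct Plane { float a, b, c, d; };  // a*x + b*y + c*z + d >= 0 means "inside"
struct Vec3  { float x, y, z; };
using Mat4 = std::array<std::array<float, 4>, 4>;  // m[row][col], row-vector convention (assumed)

// Extract the six normalized frustum planes from a combined view-projection matrix.
std::array<Plane, 6> ExtractPlanes(const Mat4& m) {
	// Builds w * (fourth column) + s * (column c)
	auto combine = [&](float w, int c, float s) {
		return Plane{ w * m[0][3] + s * m[0][c], w * m[1][3] + s * m[1][c],
		              w * m[2][3] + s * m[2][c], w * m[3][3] + s * m[3][c] };
	};
	std::array<Plane, 6> planes = {
		combine(1, 0, +1),  // left:   fourth column + first column
		combine(1, 0, -1),  // right:  fourth column - first column
		combine(1, 1, -1),  // top:    fourth column - second column
		combine(1, 1, +1),  // bottom: fourth column + second column
		combine(0, 2, +1),  // near:   third column on its own
		combine(1, 2, -1),  // far:    fourth column - third column
	};
	for (Plane& p : planes) {
		const float length = std::sqrt(p.a * p.a + p.b * p.b + p.c * p.c);
		p.a /= length; p.b /= length; p.c /= length; p.d /= length;
	}
	return planes;
}

// Returns false only if the box lies entirely behind at least one plane.
bool BoxIntersectsFrustum(const std::array<Plane, 6>& planes, const Vec3& vecMin, const Vec3& vecMax) {
	for (const Plane& p : planes) {
		// Pick the box corner that lies farthest along the plane normal
		const float x = p.a >= 0.0f ? vecMax.x : vecMin.x;
		const float y = p.b >= 0.0f ? vecMax.y : vecMin.y;
		const float z = p.c >= 0.0f ? vecMax.z : vecMin.z;
		if (p.a * x + p.b * y + p.c * z + p.d < 0.0f)
			return false;  // even the most favourable corner is outside this plane
	}
	return true;
}
```

With an identity matrix the extracted planes bound x and y to [-1, 1] and z to [0, 1], which makes for an easy sanity check. Like the eight-corner version, this test is conservative: a box that is outside the frustum but not fully behind any single plane is still reported as visible.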

In Topic: False negative result in bounding frustum test

Yesterday, 01:52 PM

I'm not familiar with the BoundingFrustum class, but are you checking against every corner of the bounding box? Off the top of my head it sounds like you may just be checking its center coordinate. Another possibility is that you're translating the bounding box (presumably by the world matrix of the respective object) and somehow doing so incorrectly.


In Topic: How does hardware instancing work on the pixel shader level?

26 May 2016 - 02:01 PM

What are you using the derivatives for? It seems strange for spot light contribution.

Trilinear / anisotropic sampling of variance shadow maps.

To clarify, the reason for using ddx/ddy instead of just Sample was to prevent these divergence issues in the first place; it just seems the calls ended up too far inside the dynamic branches in the case of spot / point lights. For directional lights they are computed separately (i.e. not from the sampled coordinates, as I mentioned in my last post) so as not to be divergent across shadow map cascades.


Chances are some of what you think are loops or branches as you've written them in HLSL may actually get flattened out by the compiler such that there isn't a loop or branch there at all. If the compiler can prove that a branch is 'coherent' among all threads in the wave/warp then it's not a problem to have gradient operations inside the control flow.

Hm... as Dingleberry said, I'm using the [loop], [branch] and [unroll] attributes to try to control these things myself. I think those force the compiler to comply, or are they just suggestions, like inline etc. in C++?

Furthermore, the loop condition very much is dynamic in this case, so I don't see how it could even be validly unrolled, unless to some arbitrarily chosen upper limit.


In Topic: How does hardware instancing work on the pixel shader level?

26 May 2016 - 01:03 PM

Aye, there are indeed ddx/ddy calls inside the function being called in the branch; moving them into the outer loop has solved the issue :)


The thing that threw me off here was that a similar approach for directional lights was working just fine with those intrinsics called from inside the flow control. I can only assume that the compiler is fine with a loop that has a constant buffer member determining its upper bound, but not when that bound is retrieved from a shader resource. Furthermore, both of my directional light paths have the same ddx/ddy calculations, but they're still inside two separate branches (yes, I should move them out of there!) and that apparently works, so it would seem the compiler must detect and be fine with this too, which I wouldn't have suspected.


In fact, it's probably slower than just calling Sample if you don't actually need the gradients for anything other than providing them to SampleGrad.

The derivatives aren't calculated on the sampled texture coordinates so that will sadly not work.


In Topic: How does hardware instancing work on the pixel shader level?

26 May 2016 - 03:27 AM

Hm, yes... unfortunately it still doesn't seem to work, which suggests that the HLSL compiler can't guarantee there won't be any such divergence and therefore refuses to compile my shader.


Here's my original light application loop:

/*
 * Process spot lights.
 * «spotLightData.x» contains an offset into the spot light table at which a set of uint indices
 *                   into the SpotLight buffer detailing what light sources affect the current mesh
 *                   instance begin.
 * «spotLightData.y» contains the number of spot light sources affecting the current mesh instance.
 */
for(n = 0; n < spotLightData.y; n++) {
	lightId = SpotLightTable.Load(spotLightData.x + (n * 4));
	sLightContrib contrib = ComputeSpotLightContribution(SpotLight[lightId], V, P.xyz, N);
	total.diffuse  += contrib.diffuse;
	total.specular += contrib.specular;
}

The above won't compile when ComputeSpotLightContribution uses SampleGrad or similar to sample a shadow map, raising error X4014:


Cannot have divergent gradient operations inside flow control.

The only possible source of such divergence here, as far as I can tell anyway, is that different instances may index into different light sources based on the light table lookup.


So I tried to change it to the following to ensure that all instances take the same flow path, and limit the actually processed lights with an if-branch instead:

/* Process spot lights */
uint tableIndex = 0;
lightId = SpotLightTable.Load(spotLightData.x + (tableIndex * 4));
[loop]
for(n = 0; n < NumSpotLights; n++) { // NumSpotLights is the total number of elements in the SpotLight buffer
	[branch]
	if(n == lightId) {
		sLightContrib contrib = ComputeSpotLightContribution(SpotLight[n], V, P.xyz, N);

		total.diffuse  += contrib.diffuse;
		total.specular += contrib.specular;

		// Look for next light table index if applicable
		if(++tableIndex < spotLightData.y)
			lightId = SpotLightTable.Load(spotLightData.x + (tableIndex * 4));
		else
			lightId = 0xffffffff;
	}
}

By the way, the light table is always organized in such a way that the light source indices are in increasing order.
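
To see why that ordering matters, here is a small CPU-side C++ emulation of the two loops. It is only a sketch and not shader code: `VisitByTable` and `VisitUniform` are hypothetical names, the table is indexed by element rather than by byte offset, and the guard for an empty light list is an addition of this sketch. With increasing indices both loops visit the same lights; with an unsorted table the uniform loop skips entries.

```cpp
#include <cstdint>
#include <vector>

// Lights visited by the table-driven loop: one iteration per table entry.
std::vector<uint32_t> VisitByTable(const std::vector<uint32_t>& table,
                                   uint32_t offset, uint32_t count) {
	std::vector<uint32_t> visited;
	for (uint32_t n = 0; n < count; n++)
		visited.push_back(table[offset + n]);
	return visited;
}

// Lights visited by the uniform loop: every instance iterates over all lights
// and only processes those whose index matches the next pending table entry.
std::vector<uint32_t> VisitUniform(const std::vector<uint32_t>& table,
                                   uint32_t offset, uint32_t count, uint32_t numLights) {
	std::vector<uint32_t> visited;
	uint32_t tableIndex = 0;
	// Guard against an empty light list (an assumption of this sketch)
	uint32_t lightId = count > 0 ? table[offset] : 0xffffffffu;
	for (uint32_t n = 0; n < numLights; n++) {
		if (n == lightId) {
			visited.push_back(n);
			// Look for the next light table index if applicable
			lightId = (++tableIndex < count) ? table[offset + tableIndex] : 0xffffffffu;
		}
	}
	return visited;
}
```

Because the uniform loop only ever moves forward through the light indices, a table entry that is smaller than one already processed can never be matched again.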

Unfortunately, this approach fails with the very same error message as above.

I still think there should be some way to accomplish this without having to break the instanced draw calls apart and constantly change cbuffer data, or...?

