
Depth pre-pass worth it?


20 replies to this topic

#1 lipsryme   Members   -  Reputation: 979


Posted 03 April 2013 - 11:02 AM

I'm currently building my new deferred-shading-based renderer and have been testing a depth pre-pass for opaque geometry.

 

I haven't implemented the GBuffer pass yet (I'm rendering to the backbuffer at the moment), so I guess the benefit will be greater later on. I was testing instancing with this: 50k textured cubes (full input = Pos/UV/Normal/Tangent/BiTangent) with anisotropic filtering on. Here are my results:

 

Without depth pre-pass:

actual rasterized geometry pass = ~2.7 ms

 

With depth pre-pass:

pre-pass = ~2.0 ms

actual rasterized geometry pass = ~2.4 ms

 

So basically it's cutting my performance almost in half: ~4.4 ms total with the pre-pass versus ~2.7 ms without.

I'm already doing a very lightweight pre pass.

I've split the vertex information so I can transfer only the vertex position data.

This means that I'm unfortunately setting multiple vertex buffers for the actual rasterization pass... could this be the culprit?

I've set the pixel shader to NULL and disabled color writes.

 

And the shader itself only transforms vertex positions:

#pragma pack_matrix( row_major )



// Single big buffer to store instance transforms
Buffer<float4> InstanceTransformBuffer : register(t0);


// Constant buffers
cbuffer InstanceTransformsAccessBuffer : register(b0)
{
	float startIndex : packoffset(c0.x);
	float elementsPerInstance : packoffset(c0.y);

	float4x4 ViewProjection : packoffset(c1);
};


struct VSI
{
	float4 Position		: POSITION;
	uint InstanceID		: SV_InstanceID;
};


struct VSO
{
	float4 Position : SV_POSITION;
};




float4x4 GetInstanceTransform(uint instID, uint offset)
{
	uint BufferOffset = instID * elementsPerInstance + startIndex + offset;

	float4 c0 = InstanceTransformBuffer.Load(BufferOffset + 0);
	float4 c1 = InstanceTransformBuffer.Load(BufferOffset + 1);
	float4 c2 = InstanceTransformBuffer.Load(BufferOffset + 2);
	float4 c3 = float4(0.0f, 0.0f, 0.0f, 1.0f);

	float4x4 _World = { c0.x, c1.x, c2.x, c3.x,
						c0.y, c1.y, c2.y, c3.y,
						c0.z, c1.z, c2.z, c3.z,
						c0.w, c1.w, c2.w, c3.w };

	return _World;
}





VSO VS(VSI input)
{
	VSO output = (VSO)0;

	float4x4 World = GetInstanceTransform(input.InstanceID, 0);
	float4x4 WVP = mul(World, ViewProjection);
	output.Position = mul(input.Position, WVP);

	return output;
}

 

My render function looks like this:

void RendererD3D11::RenderGBuffer(const unsigned int drawcalls,
								  const unsigned int* culledSceneIDs)
{
	// Get instance description
	InstanceGroupDescription* instanceGroup = this->contentManager->GetPtrToOpaqueInstanceGroupDesc(drawcalls);

	unsigned int numInstances = 0; // Keeps track of how many instances we actually want to draw of this group
	if(instanceGroup->entityType == SceneList::Primitive)
	{
		D3D11_MAPPED_SUBRESOURCE instanceBufferProperties;

		// Map the instance transform buffer (the SRV-backed Buffer<float4>) so it can be written to
		this->deviceContext->Map(this->instanceTransformBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &instanceBufferProperties);

		// Get a pointer to the mapped instance data
		XMFLOAT4* pInstanceData = (XMFLOAT4*)instanceBufferProperties.pData;


		// Go through each sceneInstance inside this instance group
		for(size_t i = 0; i < instanceGroup->instanceSceneIDSize; ++i)
		{
			if(culledSceneIDs[instanceGroup->instanceSceneIDs[i]] != 0)
			{
				// Get ScenePrimitiveDescription
				ScenePrimitiveDescription* scenePrimitive = &this->sceneManager->GetCurrentScene()->GetDesc()->primitives[instanceGroup->instanceSceneIDs[i]];

				// Update buffer
				XMMATRIX worldTransform = XMMatrixTranspose(XMLoadFloat4x4(&scenePrimitive->worldTransform));			
				for(int u = 0; u < 3; u++)
				{	
					XMStoreFloat4(&pInstanceData[(numInstances * 3) + u], worldTransform.r[u]);
				}

				// This instance should be drawn since it was not culled.
				numInstances++;	
			}
		}

		// Unmap the instance transform buffer
		this->deviceContext->Unmap(this->instanceTransformBuffer, 0);

		D3D11_MAPPED_SUBRESOURCE mappedResourceProperties;

		// Lock the constant buffer so it can be written to
		this->deviceContext->Map(this->cbInstanceTransformAccessBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResourceProperties);

		// Get a pointer to the data in the constant buffer.
		cbsAccessInstanceTransforms* pData = (cbsAccessInstanceTransforms*)mappedResourceProperties.pData;

		// Copy the matrices into the constant buffer
		XMStoreFloat4x4(&pData->ViewProjection, mainCamera->ViewProjectionMatrix());

		pData->elementsPerInstance = 3;
		pData->startIndex = 0;
		pData->padding = XMFLOAT2(0.0f, 0.0f);


		// Unlock the constant buffer
		this->deviceContext->Unmap(this->cbInstanceTransformAccessBuffer, 0);

		// Set Constant buffer
		this->deviceContext->VSSetConstantBuffers(0, 1, &this->cbInstanceTransformAccessBuffer);

		// Set Shader Resource View for instance transforms
		ID3D11ShaderResourceView* resource = this->instanceTransformBuffer_SRV;
		this->deviceContext->VSSetShaderResources(0, 1, &resource);

		// Set BlendState
		this->renderStateContext.SetBlendState(RenderStateDesc::ColorWriteDisabled, &blendStates, deviceContext);

		// Set DepthStencilState
		this->renderStateContext.SetDepthStencilState(RenderStateDesc::DepthWriteEnabled, &this->depthStencilStates, this->deviceContext);

		// Depth Pre-Pass
		GenericShader* depthPrePass_shader = this->contentManager->GetShader(ShaderFile::DepthPrePass, this->device);
		if(depthPrePass_shader)
		{
			this->deviceContext->VSSetShader(depthPrePass_shader->VS, 0, 0);
			this->deviceContext->PSSetShader(NULL, 0, 0);
		}

		ID3D11DepthStencilView* main_DSV = this->mainDSV;
		deviceContext->OMSetRenderTargets(0, NULL, main_DSV);

		// Instanced Draw Call (DepthPrePass)
		if(numInstances > 0)
		{
			this->contentManager->GetPrimitiveFromPool(instanceGroup->groupID)->Draw(this->depthOnlyInputLayout, this->device,
																					 this->deviceContext, numInstances, true);
		}

		// Get ScenePrimitiveDescription
		ScenePrimitiveDescription* scenePrimitive = &this->sceneManager->GetCurrentScene()->GetDesc()->primitives[instanceGroup->instanceSceneIDs[0]];

		// Set DiffuseMap
		ID3D11ShaderResourceView* diffuseMap_SRV = this->contentManager->GetTextureFromPool(scenePrimitive->material.diffuseMap.ID)->GetResource();
		this->deviceContext->PSSetShaderResources(1, 1, &diffuseMap_SRV);

		// Set NormalMap
		ID3D11ShaderResourceView* normalMap_SRV = this->contentManager->GetTextureFromPool(scenePrimitive->material.normalMap.ID)->GetResource();
		this->deviceContext->PSSetShaderResources(2, 1, &normalMap_SRV);

		// Set SpecularMap
		ID3D11ShaderResourceView* specularMap_SRV = this->contentManager->GetTextureFromPool(scenePrimitive->material.specularMap.ID)->GetResource();
		this->deviceContext->PSSetShaderResources(3, 1, &specularMap_SRV);

		// Set SamplerStates
		this->renderStateContext.SetSamplerState(RenderStateDesc::Anisotropic, &this->samplerstates, this->deviceContext, 0);
		this->renderStateContext.SetSamplerState(RenderStateDesc::Linear, &this->samplerstates, this->deviceContext, 1);

		// Set BlendState
		this->renderStateContext.SetBlendState(RenderStateDesc::BlendDisabled, &blendStates, deviceContext);

		// Set DepthStencilState
		this->renderStateContext.SetDepthStencilState(RenderStateDesc::DepthEnabled, &this->depthStencilStates, this->deviceContext);

		// Set FillMode
		if(this->isWireframe)
		{
			this->renderStateContext.SetRasterizerState(RenderStateDesc::Wireframe, &rasterizerStates, deviceContext);
		}
		else
		{
			this->renderStateContext.SetRasterizerState(RenderStateDesc::BackFaceCull, &rasterizerStates, deviceContext);
		}

		// Set GBuffer shader
		GenericShader* gbuffer_shader = this->contentManager->GetShader(ShaderFile::GBuffer, this->device);
		if(gbuffer_shader)
		{
			this->deviceContext->VSSetShader(gbuffer_shader->VS, 0, 0);
			this->deviceContext->PSSetShader(gbuffer_shader->PS, 0, 0);
		}

		ID3D11RenderTargetView* backBuffer_RTV = this->backBufferRTV;
		deviceContext->OMSetRenderTargets(1, &backBuffer_RTV, main_DSV);

		// Instanced Draw Call (GBuffer)
		if(numInstances > 0)
		{
			this->contentManager->GetPrimitiveFromPool(instanceGroup->groupID)->Draw(this->defaultInputLayout, this->device,
																					 this->deviceContext, numInstances, false);
		}
	}
	else
	{
		// StaticMesh in here...
	}

}

Edited by lipsryme, 03 April 2013 - 11:19 AM.



#2 AllEightUp   Moderators   -  Reputation: 4064


Posted 03 April 2013 - 11:34 AM

As with anything graphics related, this is usually going to be a hit-and-miss subject. If you have a lot of overdraw and complex shaders, that's where a depth pre-pass tends to give you the best results. But if your scenes are fairly well culled and have low overdraw, it "can" be a loss of performance because you are eating up memory bandwidth. You are typically fairly safe to keep it in there, as bandwidth is not normally a problem on modern cards, excepting the mobile variants. I'd actually hook it up as a switch if at all possible and just test later.

 

As to the buffer sends, that can of course be a culprit. The specific item I remember from doing this a while back was the very notable performance gain we achieved by breaking the positional portion away from the other attributes using multiple streams. It looks like you already have that, so I can't really say much beyond the above generalization.


Edited by AllEightUp, 03 April 2013 - 11:40 AM.


#3 lipsryme   Members   -  Reputation: 979


Posted 03 April 2013 - 11:52 AM

Alright I'm gonna keep it and from time to time test if it's worth it.

 

By the way is it necessary to set the RenderTarget to NULL if you have no pixel shader set and color write disabled ?



#4 AllEightUp   Moderators   -  Reputation: 4064


Posted 03 April 2013 - 12:13 PM

Alright I'm gonna keep it and from time to time test if it's worth it.

That's always been my approach: don't throw things out until it's proven with "real" content that they should go away.

 

By the way is it necessary to set the RenderTarget to NULL if you have no pixel shader set and color write disabled ?

I wish I could tell you, but this is getting into specifics I just don't remember much of. Sorry, hopefully some DX guru will pass by and drop a dollop of knowledge in this area. :)



#5 MJP   Moderators   -  Reputation: 10025


Posted 03 April 2013 - 05:19 PM

A great graphics programmer once said "a z-prepass is a day-to-day decision, not a lifestyle choice". I would recommend making it easy to turn on and off, and constantly profile to see if it's worth it for your current scene/shaders/renderer configuration/resolution/etc.



#6 mhagain   Crossbones+   -  Reputation: 7328


Posted 03 April 2013 - 06:08 PM

A depth pre-pass can make sense in cases where you've got potentially lots of overdraw and where you can't get reasonable front-to-back sorting for your opaque geometry; in other words where the overhead of not doing it is greater than the overhead of doing it.  It's definitely not a general-case solution.


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#7 AliasBinman   Members   -  Reputation: 411


Posted 03 April 2013 - 08:17 PM

If you have some memory to spare, it's worth building a position-only VB and IB for each mesh. As well as storing just the position component of the original mesh, it can usually contain far fewer vertices.

Think of the case of cubes with hard faces. The full mesh requires 24 vertices (6 faces * 4 vertices), but the position-only mesh requires just 8. This cuts down on vertex-fetch bandwidth, needs fewer transforms, and makes better use of the post-transform cache.

 

By using the position-only VB and IB, the original mesh can then stay in a standard interleaved format.

 

For the above, don't forget to optimise for the post-transform and pre-transform caches, and use a position-only vertex declaration and a VS that only transforms position.

 

Secondly, when doing your frustum culling pass, use a different frustum for the pre-pass with a much closer far plane, so the number of accepted meshes drawn is much lower. There is little gain in drawing meshes that are far away: firstly, they are likely to cover few pixels on screen; secondly, they are less likely to occlude many pixels; and thirdly, Hi-Z buffers tend to really lose precision at far distances. It's not uncommon to set the far plane as close as, say, 150m.

You can also cull meshes from the pre-pass using some heuristics. For example, don't even bother considering meshes which are unlikely to cover many screen pixels.

 

I usually find it a gain not to draw alpha-tested objects in the pre-pass, but to ensure they get drawn first in the base pass. That way they benefit from the opaque objects in the pre-pass, and then update the depth buffer before the opaque objects in the base pass.



#8 mhagain   Crossbones+   -  Reputation: 7328


Posted 04 April 2013 - 06:53 AM

(quoting AliasBinman's post #7 above in full)

 

I'll add to this - in your depth pre-pass, do not under any circumstances output depth from your fragment/pixel shader.  Output any arbitrary colour, and use glColorMask/D3DRS_COLORWRITEENABLE/ID3D10BlendState/ID3D11BlendState to control whether or not you write to the color buffer instead.  Beware of APIs where the color write mask affects whether or not the color buffer is cleared.  If you want everything to start off black in order to accumulate light it may be more efficient to clear to black and disable color writes than it is to enable color writes and output black from your shader.




#9 MJP   Moderators   -  Reputation: 10025


Posted 04 April 2013 - 02:35 PM

For a depth prepass you shouldn't even be using a pixel/fragment shader at all unless you're rendering something that's alpha-tested.



#10 Krohm   Crossbones+   -  Reputation: 2913


Posted 06 April 2013 - 02:41 AM

I've split the vertex information so I can transfer only the vertex position data.
This means that I'm setting multiple vertex buffers for the actual rasterization pass unfortunately though...could this be the culprit ?

My tests with a z-only pass are old, relating to D3D9 hardware. The main problem for me was the draw calls: on that hardware, using as many draw calls for the z-pass as for standard rendering was nonsense. I'm surprised this is still the case, however; it seemed like D3D10+ was going to be more efficient at dispatching draw calls.

 

Are you doing some kind of batch merging?



#11 lipsryme   Members   -  Reputation: 979


Posted 06 April 2013 - 02:54 AM

I'm assuming every opaque object to be instanced. So yes :)

#12 Krohm   Crossbones+   -  Reputation: 2913


Posted 06 April 2013 - 02:59 AM

Ouch, that's bad news for me... thank you for the quick reply!



#13 lipsryme   Members   -  Reputation: 979


Posted 06 April 2013 - 03:07 AM

I'm still in the process of figuring out how to handle different materials though...

#14 Krohm   Crossbones+   -  Reputation: 2913


Posted 06 April 2013 - 03:26 AM

I also spent some time trying to do that. I don't suggest anyone go that way; it's not much fun, and at >200fps I suggest taking a break, thinking about what to do next, and then tackling a different problem where your effort can be spent more efficiently.

 

Anyway, I had a system which tried to pull the depth-relevant shader out of the standard material shaders. That way, most materials could be coalesced into a single super-material using a single batch. It always felt very, very brittle. If you still want to go that way, maybe you could have a special annotation on each used material to instruct the renderer to use a different shader for the z-pass.

Will it be worth the effort? I doubt it, but if you're inclined to experimentation, have a try.

To be honest, I'd be interested in knowing what you find out, if you could post some results in the future.



#15 bombshell93   Members   -  Reputation: 198


Posted 06 April 2013 - 08:38 AM

Materials with deferred?

How about an 8/16-bit (depending on your needs) MaterialID buffer? To make room you can throw out the diffuse and specular buffers and replace them with UVs.
Divide the screen into a 16x16 grid (or whatever size; fiddling with it you'd probably find a better resolution) of partial screen quads.
In whatever way you see fit, for each material, fill a list of which cells contain it and fill each material's instance buffer.
Then, for each shader, for each material that uses that shader, set the shader parameters from the material and draw the partial screen quads via the material's instance buffer.

In the shader, which has been passed the material's ID as well as its parameters: if the pixel does not belong to the current material, return junk. If it is the right material, get the diffuse and specular via the texture the material passed and the UV buffer; the normals you should already have in a buffer, so you don't have to transform them out of tangent space every time. The rest should explain itself.

I've probably spoken gibberish or gotten something horribly broken; I've not tested this, it pretty much came to me on the spot.



#16 lipsryme   Members   -  Reputation: 979


Posted 06 April 2013 - 08:56 AM

I didn't really get that...

Getting material info into the GBuffer could easily be done using a Buffer<float4>, like I did with the transforms for each instance. The problem is rather how to get different textures on each instance. I read about Texture2DArrays, but I'm not exactly sure how I'd create those on the fly (I don't want to fill one with possibly 100k+ copies of the same texture), or even whether that's a valid option (performance? does every texture have to be the same size?).


Edited by lipsryme, 06 April 2013 - 09:03 AM.


#17 bombshell93   Members   -  Reputation: 198


Posted 06 April 2013 - 09:08 AM

Well, I imagine each model would have a limited number of textures, so as long as they're not 1024x1024 or bigger you could merge them into one big texture and have a UV offset as part of the instance data, though this is limited by texture size and count.
I've had no experience with texture arrays, but this might be a good starting point:
http://www.rastertek.com/dx10tut17.html

 

I used his tutorials a while ago and they were fairly easy to follow, so I'll go out on a limb and say this one probably follows the trend. As for how useful texture arrays are, I've heard of them but never used them myself, so I don't quite know.

And to put what I initially said in a nutshell: replace the diffuse and specular buffers with a MaterialID/UV buffer, and light the screen per material, sampling the diffuse and specular from textures passed in by the material... that pretty much sums it up in a much less ballsed-up way.



#18 french_hustler   Members   -  Reputation: 339


Posted 08 April 2013 - 11:52 AM

I'm assuming every opaque object to be instanced. So yes :)

 

I don't have input on your questions, but I'd like to know why you assume all opaque objects to be instanced?

I would assume that there is some performance hit if there are multiple unique objects each being drawn as a single instance instead of drawing them the "regular" way.

 

Could you please expand on this?

Thanks.



#19 lipsryme   Members   -  Reputation: 979


Posted 08 April 2013 - 12:21 PM

You know, it's funny, I could have sworn I'd read about that in some Frostbite paper, but I get the feeling I misread something...

Good thing you pointed that out. Is there a drawback to an instanced draw call using 1 instance, though?


Edited by lipsryme, 08 April 2013 - 12:22 PM.


#20 mrheisenberg   Members   -  Reputation: 356


Posted 12 April 2013 - 07:56 AM


struct VSI
{
	float4 Position		: POSITION;
	uint InstanceID		: SV_InstanceID;
};

Wasn't it possible to just get the instance ID in the vertex shader as a second argument, instead of keeping it in the VSI struct?
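Yes: SV_InstanceID is a system-generated value, so it can be declared as a separate vertex shader parameter rather than a member of the input struct. A sketch against the VSO struct and GetInstanceTransform helper from the original post (behavior should be identical; only the signature changes):

```hlsl
struct VSI
{
	float4 Position : POSITION;
};

VSO VS(VSI input, uint instanceID : SV_InstanceID)
{
	VSO output = (VSO)0;

	float4x4 World = GetInstanceTransform(instanceID, 0);
	float4x4 WVP = mul(World, ViewProjection);
	output.Position = mul(input.Position, WVP);

	return output;
}
```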





