
Speed - Texture Lookups and Structured Buffers

Migi0027

Hi guys,

 

So currently I'm using a deferred pipeline, storing the normals, depths, etc. in different render targets.

 

Now I'm not proud of it, but every time I render, e.g., a point light, I reconstruct the position and normal per pixel and then do some lighting calculations; a single pass of a point light takes around 500 us.

 

Now for performance I thought it would be better if I packed the gbuffer into a structured buffer, so I did: I build the gbuffer from the textures in a compute shader, with the buffer bound as a UAV, then access it in the pixel shader for lighting as an SRV. I was expecting an increase in performance, as texture lookups are expensive, but instead rendering a point light took around 1000 us.

 

For the unpacked gbuffer I have 3 float4 textures; in the packed gbuffer, the structured buffer consists of 2 float4s and 1 float3.
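For concreteness, the packed layout described above might look something like this (a sketch with made-up names, not code from the post):

```hlsl
// Hypothetical sketch of the packed gbuffer layout described above.
struct PackedGBuffer          // 2x float4 + 1x float3 = 44 bytes per element
{
    float4 a;
    float4 b;
    float3 c;                 // note: a 44-byte stride straddles 16-byte
};                            // boundaries, which can itself slow buffer reads

RWStructuredBuffer<PackedGBuffer> gPackedRW; // written by the compute shader (UAV)
StructuredBuffer<PackedGBuffer>   gPacked;   // read by the lighting pixel shader (SRV)
```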

 

Now my question is: how do texture lookups and structured buffer reads compare in speed? Is it supposed to be like this, or am I doing something wrong?

 

I apologize that I do not have the source code; at the moment I do not have access to the machine it's on. When I get access, I'll post it, which should be within a few hours.

 

Thank you for your time

-MIGI0027

MJP

Texture reads are expensive (relatively speaking) because the GPU has to fetch the data from off-chip memory and then wait for that memory to be available. Buffer reads have the same problem, so you're not going to avoid it by switching to buffers. When you're bottlenecked by memory access, performance will depend heavily on your access patterns with regard to the cache. In this regard textures have an advantage, because GPUs usually store textures in a "swizzled" pattern that maps 2D-neighbouring texels to nearby cache lines when they're fetched in a pixel shader. Buffers are typically stored linearly, which won't map as well to pixel shader access patterns.
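To make the access-pattern point concrete, a sketch (all names here, including gWidth, are illustrative, not from the thread):

```hlsl
// Illustrative only: comparing how a pixel shader addresses the two resources.
StructuredBuffer<float4> gLinear;  // laid out row by row: index = y * width + x
Texture2D<float4>        gTiled;   // stored in a hardware-swizzled (tiled) pattern

float4 PS(float4 pos : SV_Position) : SV_Target
{
    uint2 p = (uint2)pos.xy;

    // Buffer read: two vertically adjacent pixels are 'gWidth' elements apart
    // in memory, so a 2D group of pixels touches many distant cache lines.
    float4 a = gLinear[p.y * gWidth + p.x];

    // Texture read: the swizzled layout keeps a 2D neighbourhood of texels in
    // nearby cache lines, matching the 2D groups of pixels the GPU shades.
    float4 b = gTiled.Load(int3(p, 0));

    return a + b;
}
```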


Migi0027

"Buffer reads have the same problem, so you're not going to avoid it by switching to buffers"

 

It's probably just me, but could you please elaborate?

 

By buffer reads do you mean just reading from a buffer or structured buffers?

Migi0027

So I learned that some developers manage to render a point light in about 0.5 us, which I find amazing.

 

So out of curiosity, what are your measurements for a single point light, a directional light, or something similar?

Hodgman
Your buffer method moves about double the data: the compute pass reads the texture gbuffer and writes the packed buffer, then the lighting pass reads the buffer again, so it's roughly double the memory bandwidth of the texture method. The fact that it also ran in around double the time is an indication that your shader is bottlenecked by memory bandwidth.
Try to reduce your memory requirements as much as possible - e.g. 3x8bit normals and 16 or 24 bit depth ;)
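One way to act on that suggestion (a sketch under assumed names, not code from the thread) is to pack a 24-bit normalized depth plus an 8-bit channel into a single uint:

```hlsl
// Hypothetical packing helpers: 24-bit depth + 8-bit roughness in one uint.
uint PackDepthRough(float depth01, float rough01)
{
    uint d = (uint)(saturate(depth01) * 16777215.0f); // 24 bits: [0,1] -> [0, 2^24-1]
    uint r = (uint)(saturate(rough01) * 255.0f);      //  8 bits: [0,1] -> [0, 255]
    return (d << 8) | r;
}

float2 UnpackDepthRough(uint v)
{
    float d = (float)(v >> 8)   / 16777215.0f;
    float r = (float)(v & 0xFF) / 255.0f;
    return float2(d, r);
}
```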

The more recent tiled/clustered deferred renderers improve bandwidth by shading more than 1 light at a time -- i.e. They'll read the gbuffer, shade 10 lights, add them and return the sum. Thus amortizing the gbuffer read and ROP/OM costs.
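A sketch of the amortization being described (Surface, LoadGBuffer, ShadePointLight, gLights, and gLightCount are all assumed names, not code from the thread):

```hlsl
// Illustrative tiled/clustered-style loop: one gbuffer read, many lights.
float4 PS_ManyLights(float4 pos : SV_Position) : SV_Target
{
    Surface s = LoadGBuffer((uint2)pos.xy);    // gbuffer read paid once...
    float3 lit = 0;
    for (uint i = 0; i < gLightCount; ++i)     // ...amortized over all lights
        lit += ShadePointLight(s, gLights[i]);
    return float4(lit, 1);                     // one blend/output cost instead of N
}
```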

Migi0027

So I managed to compress my gbuffer into 2 float4s, which is good, I hope.

struct GB
{
	float4 ColorXYZ_RoughW;
	float4 NormXY_PostZ_DepthW;
};

So, as many people recommend, I store the view-space distance as depth and then reconstruct the position later on, but there's a problem: the position isn't being reconstructed correctly, and I'm following part 3 of MJP's article on position reconstruction.

 

What I'm currently doing:

Forward GBuffer Rendering:
output.depth = length(input.positionView.xyz);

Directional Light Shading:
   Vertex Shader:
	float3 positionWS = mul(Output.Pos, mul(viewInv, projInv)); // not the fastest, I know...
	Output.ViewRay = positionWS - cameraPosition;

   Pixel Shader:
	float depth = gb.NormXY_PostZ_DepthW.w;
	float3 viewRay = normalize(input.ViewRay);
	float3 positionWS = cameraPosition + viewRay * depth;
	return float4(positionWS, 1); // debugging purposes

And this is the result:

 

[screenshot wb40h2.png: the incorrectly reconstructed world-space positions]

 

So I'm not expecting a rescue mission, but maybe you could spot the issue? (If it's that simple.)
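For reference, the usual way to unproject a full-screen-quad corner from clip space to world space looks like this (a generic sketch with assumed names, not a diagnosis of the code above); note the perspective divide:

```hlsl
// Generic clip-space to world-space unprojection (row-vector convention).
// invViewProj is assumed to be inverse(view * proj) for this convention.
float3 ClipToWorld(float4 clipPos, float4x4 invViewProj)
{
    float4 ws = mul(clipPos, invViewProj);
    return ws.xyz / ws.w;   // the perspective divide is easy to forget
}
```

The view ray would then be ClipToWorld(cornerClipPos, invViewProj) - cameraPosition.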
