Speed - Texture Lookups and Structured Buffers


Hi guys,

so currently I'm following the deferred pipeline, storing the normals/depths/etc.. in different render targets.

Now I'm not proud of it, but every time I render a light (e.g. a point light), I reconstruct the position and normals per pixel and then do the lighting calculations; a single pass of a point light takes around 500 us.

For performance I thought it would be better to pack the gbuffer into a structured buffer, so I did: a compute shader reads the gbuffer textures and writes the packed data to the buffer bound as a UAV, and the lighting pixel shader then reads it back as an SRV. I was expecting an increase in performance, since texture lookups are expensive, but instead rendering a point light took around 1000 us.

For the unpacked gbuffer, I have 3 float4 textures, and in the packed gbuffer, the structured buffer consists of 2 float4 and 1 float3.
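
The layout is roughly like this (a sketch from memory since I don't have the source at hand, so the names and field contents are only approximate):

struct PackedGB
{
    float4 a; // first packed float4 (e.g. colour + roughness)
    float4 b; // second packed float4 (e.g. normal + depth)
    float3 c; // remaining attributes
};

// Compute shader side: the gbuffer textures are read and the packed data is
// written through a UAV.
RWStructuredBuffer<PackedGB> PackedOut : register(u0);

// Pixel shader side: the same buffer is bound as an SRV and indexed per pixel.
StructuredBuffer<PackedGB> PackedIn : register(t0);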

My question is: how do texture lookups compare to structured buffer reads in terms of speed? Is it supposed to be like this, or am I doing something wrong?

I apologize that I don't have the source code at hand; at the moment I don't have access to the machine it's on. When I do, I'll post it, which should be within a few hours.

Thank you for your time

-MIGI0027



Texture reads are expensive (relatively speaking) because the GPU has to fetch the data from off-chip memory and then wait for that memory to be available. Buffer reads have the same problem, so you're not going to avoid it by switching to buffers. When you're bottlenecked by memory access, performance will heavily depend on your access patterns with regard to the cache. In this respect textures have an advantage, because GPUs usually store textures in a "swizzled" pattern that keeps 2D-neighbouring texels close together in memory, which maps well onto the hardware caches when a pixel shader fetches them. Buffers are typically stored linearly, which doesn't map as well to pixel shader access patterns.
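
As a rough illustration of what "swizzled" means here (real layouts are hardware-specific, so treat this Morton/Z-order sketch as an analogy rather than any vendor's actual scheme):

// Interleave the low 16 bits of v with zeros (helper for Morton encoding).
uint Part1By1(uint v)
{
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Morton (Z-order) index of a texel: neighbouring (x, y) pairs map to nearby
// linear addresses, which is friendlier to caches than plain row-major order.
uint MortonIndex(uint2 texel)
{
    return Part1By1(texel.x) | (Part1By1(texel.y) << 1);
}

Two texels that are neighbours in both x and y end up with nearby Morton indices, so a pixel shader reading a small screen-space neighbourhood mostly hits the same cache lines.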

Buffer reads have the same problem, so you're going to avoid it by switching to buffers

It's probably just me, but please elaborate.

By buffer reads, do you mean reading from any kind of buffer, or structured buffers specifically?


I would guess that MJP just accidentally omitted the word "not" there.


I would guess that MJP just accidentally omitted the word "not" there.

Indeed, sorry about that.

So I should go about not using buffers at all?


So I learned that some developers manage to get a point light to render in about 0.5 us, which I find amazing.

So out of curiosity, what are your timings for a single point light, a directional light, or something similar?


Your buffer method has about double the floats, so double the memory bandwidth of the texture method. The fact that it also ran in around double the time is an indication that your shader is bottlenecked by memory bandwidth.
Try to reduce your memory requirements as much as possible - e.g. 3×8-bit normals and a 16- or 24-bit depth ;)
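
For what it's worth, a minimal sketch of the simplest form of that packing (plain scale/bias into an 8-bit-per-channel target; this isn't anyone's actual code, and octahedral or spheremap encodings would do better):

// Pack a unit normal from [-1, 1] into [0, 1] so it survives an
// R8G8B8A8_UNORM render target, and unpack it again when shading.
float3 PackNormal(float3 n)
{
    return n * 0.5f + 0.5f;
}

float3 UnpackNormal(float3 enc)
{
    return normalize(enc * 2.0f - 1.0f);
}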

The more recent tiled/clustered deferred renderers improve bandwidth by shading more than 1 light at a time -- i.e. They'll read the gbuffer, shade 10 lights, add them and return the sum. Thus amortizing the gbuffer read and ROP/OM costs.
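
A rough sketch of that idea (the light struct and buffer here are made up purely for illustration): read the gbuffer once, loop over the lights affecting the pixel, and only then write the accumulated result.

struct PointLight
{
    float3 positionWS;
    float  radius;
    float3 color;
};

StructuredBuffer<PointLight> Lights : register(t3);

// Shade 'lightCount' lights against one gbuffer sample; the gbuffer read and
// the final blend happen once instead of once per light.
float3 ShadeLights(float3 positionWS, float3 normalWS, float3 albedo,
                   uint firstLight, uint lightCount)
{
    float3 sum = 0.0f;
    for (uint i = 0; i < lightCount; ++i)
    {
        PointLight l = Lights[firstLight + i];
        float3 toLight = l.positionWS - positionWS;
        float  dist    = length(toLight);
        float  atten   = saturate(1.0f - dist / l.radius);
        float  ndotl   = saturate(dot(normalWS, toLight / max(dist, 1e-5f)));
        sum += albedo * l.color * (ndotl * atten);
    }
    return sum;
}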

So I managed to compress my gbuffer into two float4s, which is good, I hope.


struct GB
{
	float4 ColorXYZ_RoughW;
	float4 NormXY_PostZ_DepthW;
};

So, as everyone recommends, I store the view-space distance (the length of the view-space position) and reconstruct the position later on, but there's a problem: the position isn't being reconstructed correctly, even though I'm following part 3 of MJP's article on position reconstruction.

What I'm currently doing:


Forward GBuffer Rendering:
output.depth = length(input.positionView.xyz);

Directional Light Shading:
   Vertex Shader:
	float3 positionWS = mul(Output.Pos, mul(viewInv, projInv)); // not the fastest, I know...
	Output.ViewRay = positionWS - cameraPosition;

   Pixel Shader:
	float depth = gb.NormXY_PostZ_DepthW.w;
	float3 viewRay = normalize(input.ViewRay);
	float3 positionWS = cameraPosition + viewRay * depth;
	return float4(positionWS, 1); // debugging purposes

And this is the result:

[screenshot of the reconstructed world-space position debug output]

So I'm not expecting a rescue mission, but maybe you could spot the issue? (If it's that simple.)


Maybe I'm doing something wrong, as usual.

In the vertex shader I have Output.Pos, which is the screen-space position of the full-screen quad.
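
For reference, a typical full-screen-quad vertex shader that builds a world-space view ray from that position looks roughly like this (an assumption about the setup, not the actual code; note the divide by w after the inverse view-projection transform):

cbuffer CameraCB : register(b0)
{
    float4x4 viewProjInv;   // inverse of view * projection
    float3   cameraPosition;
};

struct VSOut
{
    float4 Pos     : SV_Position;
    float3 ViewRay : TEXCOORD0;
};

VSOut VSFullScreen(float4 clipPos : POSITION)
{
    VSOut o;
    o.Pos = clipPos;                        // quad vertices are already in clip space
    float4 world = mul(clipPos, viewProjInv);
    world /= world.w;                       // perspective divide back to world space
    o.ViewRay = world.xyz - cameraPosition;
    return o;
}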


