
Speed - Texture Lookups and Structured Buffers

Migi0027

Hi guys,

 

So currently I'm using a deferred pipeline, storing the normals, depths, etc. in different render targets.

 

Now I'm not proud of it, but every time I render, e.g., a point light, I reconstruct the position and normal per pixel and then do some lighting calculations; a single pass of a point light takes around 500 us.

 

Now for performance I thought it would be better if I packed the gbuffer into a structured buffer, so I did: I build the gbuffer from the textures in a compute shader, with the buffer bound as a UAV, then access it in the pixel shader for lighting as an SRV. I was expecting an increase in performance, as texture lookups are expensive, but instead rendering a point light took around 1000 us.

 

For the unpacked gbuffer I have 3 float4 textures; in the packed gbuffer, the structured buffer consists of 2 float4s and 1 float3.
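For concreteness, the packed layout described above might look something like this (a sketch with made-up names, not code from the post):

```hlsl
// Hypothetical sketch of the packed gbuffer layout described above.
struct PackedGBuffer          // 2x float4 + 1x float3 = 44 bytes per element
{
    float4 a;
    float4 b;
    float3 c;                 // note: a 44-byte stride straddles 16-byte
};                            // boundaries, which can itself slow buffer reads

RWStructuredBuffer<PackedGBuffer> gPackedRW; // written by the compute shader (UAV)
StructuredBuffer<PackedGBuffer>   gPacked;   // read by the lighting pixel shader (SRV)
```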

 

Now my question is: how do texture lookups and structured buffer reads compare in speed? Is it supposed to be like this, or am I doing something wrong?

 

I apologize that I do not have the source code; at the moment I do not have access to the machine it's on. When I get access, I'll post it, which should be within a few hours.

 

Thank you for your time

-MIGI0027

MJP

Texture reads are expensive (relatively speaking) because the GPU has to fetch the data from off-chip memory and then wait for that memory to be available. Buffer reads have the same problem, so you're not going to avoid it by switching to buffers. When you're bottlenecked by memory access, performance will depend heavily on your access patterns with regard to the cache. In this regard textures have an advantage, because GPUs usually store textures in a "swizzled" pattern that maps 2D-neighbouring texels to nearby cache lines when they're fetched in a pixel shader. Buffers are typically stored linearly, which won't map as well to pixel shader access patterns.
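To make the access-pattern point concrete, a sketch (all names here, including gWidth, are illustrative, not from the thread):

```hlsl
// Illustrative only: comparing how a pixel shader addresses the two resources.
StructuredBuffer<float4> gLinear;  // laid out row by row: index = y * width + x
Texture2D<float4>        gTiled;   // stored in a hardware-swizzled (tiled) pattern

float4 PS(float4 pos : SV_Position) : SV_Target
{
    uint2 p = (uint2)pos.xy;

    // Buffer read: two vertically adjacent pixels are 'gWidth' elements apart
    // in memory, so a 2D group of pixels touches many distant cache lines.
    float4 a = gLinear[p.y * gWidth + p.x];

    // Texture read: the swizzled layout keeps a 2D neighbourhood of texels in
    // nearby cache lines, matching the 2D groups of pixels the GPU shades.
    float4 b = gTiled.Load(int3(p, 0));

    return a + b;
}
```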


Migi0027

"Buffer reads have the same problem, so you're not going to avoid it by switching to buffers"

 

It's probably just me, but could you please elaborate?

 

By buffer reads do you mean just reading from a buffer or structured buffers?

Migi0027

So I learned that some developers manage to render a point light in about 0.5 us, which I find amazing.

 

So out of curiosity, what are your measurements for a single point light, a directional light, or something similar?

Hodgman
Your buffer method moves about double the data: the compute pass reads the texture gbuffer and writes the packed buffer, then the lighting pass reads the buffer again, so it's roughly double the memory bandwidth of the texture method. The fact that it also ran in around double the time is an indication that your shader is bottlenecked by memory bandwidth.
Try to reduce your memory requirements as much as possible - e.g. 3x8bit normals and 16 or 24 bit depth ;)
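One way to act on that suggestion (a sketch under assumed names, not code from the thread) is to pack a 24-bit normalized depth plus an 8-bit channel into a single uint:

```hlsl
// Hypothetical packing helpers: 24-bit depth + 8-bit roughness in one uint.
uint PackDepthRough(float depth01, float rough01)
{
    uint d = (uint)(saturate(depth01) * 16777215.0f); // 24 bits: [0,1] -> [0, 2^24-1]
    uint r = (uint)(saturate(rough01) * 255.0f);      //  8 bits: [0,1] -> [0, 255]
    return (d << 8) | r;
}

float2 UnpackDepthRough(uint v)
{
    float d = (float)(v >> 8)   / 16777215.0f;
    float r = (float)(v & 0xFF) / 255.0f;
    return float2(d, r);
}
```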

The more recent tiled/clustered deferred renderers improve bandwidth by shading more than 1 light at a time -- i.e. They'll read the gbuffer, shade 10 lights, add them and return the sum. Thus amortizing the gbuffer read and ROP/OM costs.
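A sketch of the amortization being described (Surface, LoadGBuffer, ShadePointLight, gLights, and gLightCount are all assumed names, not code from the thread):

```hlsl
// Illustrative tiled/clustered-style loop: one gbuffer read, many lights.
float4 PS_ManyLights(float4 pos : SV_Position) : SV_Target
{
    Surface s = LoadGBuffer((uint2)pos.xy);    // gbuffer read paid once...
    float3 lit = 0;
    for (uint i = 0; i < gLightCount; ++i)     // ...amortized over all lights
        lit += ShadePointLight(s, gLights[i]);
    return float4(lit, 1);                     // one blend/output cost instead of N
}
```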

Migi0027

So I managed to compress my gbuffer into 2 float4s, which is good, I hope.

struct GB
{
	float4 ColorXYZ_RoughW;
	float4 NormXY_PostZ_DepthW;
};

So, as many people recommend, I store the view-space distance as depth and then reconstruct the position later on, but there's a problem: the position isn't being reconstructed correctly, and I'm following part 3 of MJP's article on position reconstruction.

 

What I'm currently doing:

Forward GBuffer Rendering:
output.depth = length(input.positionView.xyz);

Directional Light Shading:
   Vertex Shader:
	float3 positionWS = mul(Output.Pos, mul(viewInv, projInv)); // not the fastest, I know...
	Output.ViewRay = positionWS - cameraPosition;

   Pixel Shader:
	float depth = gb.NormXY_PostZ_DepthW.w;
	float3 viewRay = normalize(input.ViewRay);
	float3 positionWS = cameraPosition + viewRay * depth;
	return float4(positionWS, 1); // debugging purposes

And this is the result:

 

[screenshot wb40h2.png: the incorrectly reconstructed world-space positions]

 

So I'm not expecting a rescue mission, but maybe you could spot the issue? (If it's that simple.)
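For reference, the usual way to unproject a full-screen-quad corner from clip space to world space looks like this (a generic sketch with assumed names, not a diagnosis of the code above); note the perspective divide:

```hlsl
// Generic clip-space to world-space unprojection (row-vector convention).
// invViewProj is assumed to be inverse(view * proj) for this convention.
float3 ClipToWorld(float4 clipPos, float4x4 invViewProj)
{
    float4 ws = mul(clipPos, invViewProj);
    return ws.xyz / ws.w;   // the perspective divide is easy to forget
}
```

The view ray would then be ClipToWorld(cornerClipPos, invViewProj) - cameraPosition.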
