Compute Shaders & Lighting - Performance


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

2 replies to this topic

#1 Migi0027   Crossbones+   -  Reputation: 1726


Posted 01 February 2014 - 02:45 AM

Hi guys.

 

I recently moved my lighting calculations into a compute shader. The visual results are correct, but the performance of the whole pass is horrible, much worse than before, when I used a simple full-screen quad. I was hoping you could spot what might be going so badly wrong.

 

The shader:

  • I tried stripping the shader down so that it only loads t_normals and writes the encoded value to the UAV, and the performance is still bad. When I load no texture at all and only encode and store float3(1,1,1), the performance is great, so the texture load seems to be what causes the problem.
// GBUFFER and Shadow Map and Material Information
Texture2D t_normals : register(t0);
Texture2D t_position : register(t1);
Texture2D t_diffuse : register(t2);
Texture2D t_material : register(t3);
Texture2D t_smap : register(t4);

#include "BRDF.hlsli"

cbuffer FrameBuffer : register(b0)
{
	matrix view;
	matrix projection;
	matrix texture_transform; 

	float3 cameraPosition; 
	float pad0;
};

cbuffer ObjectBuffer : register(b1)
{
	float4 lColor;

	float3 lPosition;

	float lCutoff;
	float lRadius;
	float lIntensity;

	float2 pad1;
};

RWTexture2D<uint> tOutputRW : register(u0);
SamplerState ss;

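// Encode a float3 color (clamped to 0.0f-1.0f) into a packed 0x00RRGGBB mask.
uint EncodeColor(in float3 color)
{
	// saturate() keeps each channel within 8 bits so it cannot spill into its neighbour.
	uint3 iColor = uint3(saturate(color) * 255.0f);
	uint colorMask = (iColor.r << 16) | (iColor.g << 8) | iColor.b;
	return colorMask;
}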

// Decode specified mask into a float3 color (range 0.0f-1.0f).
float3 DecodeColor(in uint colorMask)
{
	float3 color;
	color.r = (colorMask>>16) & 0x000000ff;
	color.g = (colorMask>>8) & 0x000000ff;
	color.b = colorMask & 0x000000ff;
	color /= 255.0f;
	return color;
}

[numthreads(1, 1, 1)]
void CShader( uint3 DTid : SV_DispatchThreadID )
{
	// Get Normals and Prelit Factor
	float4 normal = t_normals[DTid.xy];
	float prelit = normal.a;

	//Get Position
	float4 position = t_position[DTid.xy];

	// View Space -> World Space
	position = mul(float4(position.xyz, 1), texture_transform);
	normal = mul(float4(normal.xyz, 0), texture_transform);

	// Get Diffuse
	float4 diffuse = t_diffuse[DTid.xy];

	// Get Material
	float4 material = t_material[DTid.xy];
	
	float mlit = 1.0f-prelit;
	float3 col = BRDFPointLight(normal.xyz, lColor, diffuse.xyz, material.xyz, material.a, position.xyz, lPosition.xyz, lRadius, cameraPosition)*lIntensity;

	// Special Post Processing Flag
	col = lerp(col, diffuse.rgb, mlit);

	// Buffer
	tOutputRW[DTid.xy] = EncodeColor(DecodeColor(tOutputRW[DTid.xy]) + col.xyz);
}

All of these point lights are accumulated additively into this buffer, and later on I decode it into a simple Texture2D with DecodeColor(...). That decode pass performs well (according to Intel GPA), so it shouldn't be the problem.
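To make the packing concrete, here is a rough CPU-side mirror of EncodeColor/DecodeColor in C++. It is only a sketch for checking the 0x00RRGGBB bit layout. The Color3 struct is a hypothetical stand-in for float3, and the clamp is an assumption that is not in the posted shader: it is there because an accumulated light value above 1.0 would otherwise overflow its 8 bits.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for HLSL's float3.
struct Color3 {
    float r;
    float g;
    float b;
};

// Pack a color into 0x00RRGGBB. Each channel is clamped to [0, 1] first
// (an assumption, not part of the posted shader) and then truncated,
// which matches how HLSL converts float to int.
std::uint32_t EncodeColor(const Color3& color) {
    auto toByte = [](float v) {
        const float clamped = std::min(std::max(v, 0.0f), 1.0f);
        return static_cast<std::uint32_t>(clamped * 255.0f);
    };
    return (toByte(color.r) << 16) | (toByte(color.g) << 8) | toByte(color.b);
}

// Unpack 0x00RRGGBB back into floats in the range [0, 1].
Color3 DecodeColor(std::uint32_t mask) {
    return Color3{
        static_cast<float>((mask >> 16) & 0xFFu) / 255.0f,
        static_cast<float>((mask >> 8) & 0xFFu) / 255.0f,
        static_cast<float>(mask & 0xFFu) / 255.0f};
}
```

Because each channel only has 8 bits, everything brighter than 1.0 is lost once it is stored, so this format only suits lighting that stays in the 0 to 1 range.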

 

As I said, the result is fine, but the performance is NOT good: it drops from 60 fps to 25 fps. (I can't give the cost in micro/milliseconds because Intel GPA does not want to capture the time taken by my compute shader. I can find the dispatch, but the reported time is 0, which can't be true.)

 

[Attached image: 2wmi6xj.png]

 

Any ideas what could go wrong?

 

And you've reached the bottom, thanks for your time!

-MIGI0027


Edited by Migi0027, 01 February 2014 - 02:47 AM.

Hi! Cuboid Zone
The Rule: Be polite, be professional, but have a plan to kill everyone you meet, ohh, AND STEAL ALL ZE TRIANGLES FROM ZHEM!


#2 phantom   Moderators   -  Reputation: 7274


Posted 01 February 2014 - 03:38 AM

"[numthreads(1, 1, 1)]" is likely to be your problem; you are telling the GPU to dispatch thread groups with only 1 active thread each, which means that on most GPUs you are idling 31 (NVIDIA, 32-wide warps) to 63 (AMD, 64-wide wavefronts) lanes per group, i.e. most of the ALU power.

The number of threads per group should be a multiple of 32 or 64, depending on the target hardware, and the number of thread groups you dispatch then needs to be reduced to account for the larger groups.
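As a rough illustration of adjusting the dispatch count, the host side could compute the group counts like this. It is only a sketch under assumptions: the shader is taken to be changed to [numthreads(8, 8, 1)] (64 threads, a multiple of both 32 and 64), and the names kGroupSizeX, kGroupSizeY, DivideRoundUp and ComputeGroupCount are hypothetical helpers, not part of the original code or of Direct3D.

```cpp
#include <cstdint>

// Hypothetical group dimensions; they must match the [numthreads(X, Y, 1)]
// attribute in the shader. 8 x 8 = 64 threads per group (assumed here).
constexpr std::uint32_t kGroupSizeX = 8;
constexpr std::uint32_t kGroupSizeY = 8;

struct GroupCount {
    std::uint32_t x;
    std::uint32_t y;
};

// Integer division that rounds up, so partially covered tiles at the
// right and bottom edges of the target still get a thread group.
constexpr std::uint32_t DivideRoundUp(std::uint32_t value, std::uint32_t divisor) {
    return (value + divisor - 1) / divisor;
}

// Number of thread groups needed to cover a width x height render target.
constexpr GroupCount ComputeGroupCount(std::uint32_t width, std::uint32_t height) {
    return GroupCount{DivideRoundUp(width, kGroupSizeX),
                      DivideRoundUp(height, kGroupSizeY)};
}
```

With [numthreads(1, 1, 1)] a 1280x720 target needs one group per pixel, i.e. Dispatch(1280, 720, 1); with 8x8 groups the same target is covered by Dispatch(160, 90, 1). When the resolution is not a multiple of the group size, the rounded-up groups cover a few pixels past the edge, so the shader would typically return early when DTid.x or DTid.y falls outside the target.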

#3 Migi0027   Crossbones+   -  Reputation: 1726


Posted 01 February 2014 - 03:52 AM

Thank you!

 

The performance is great now and the result is still the same, which is all I need.

 

-MIGI0027





