# Compute Shaders & Lighting - Performance


Hi guys.

I've recently moved my lighting calculations into a compute shader. The visual results are exactly what I need, but the performance of the whole pass is horrible, much worse than before, when I used a simple full-screen quad. I was hoping you could spot what is going so wrong.

• I tried removing everything except loading t_normals and writing its encoded value to the UAV, but the performance is still bad. When I load no textures at all and only encode and store float3(1,1,1), the performance is great, so the texture loads seem to be causing the problem.
// GBUFFER and Shadow Map and Material Information
Texture2D t_normals : register(t0);
Texture2D t_position : register(t1);
Texture2D t_diffuse : register(t2);
Texture2D t_material : register(t3);
Texture2D t_smap : register(t4);

#include "BRDF.hlsli"

cbuffer FrameBuffer : register(b0)
{
matrix view;
matrix projection;
matrix texture_transform;

float3 cameraPosition;
};

cbuffer ObjectBuffer : register(b1)
{
float4 lColor;

float3 lPosition;

float lCutoff;
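float lIntensity;
float lRadius; // assumed: referenced in the BRDFPointLight(...) call but missing from the posted cbuffer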

};

RWTexture2D<uint> tOutputRW : register(u0);
SamplerState ss;

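// Encode a float3 color (range 0.0f-1.0f) into the low 24 bits of a uint (8 bits per channel).
uint EncodeColor(in float3 color)
{
// saturate() keeps each channel in 0-255, so an accumulated value above 1.0
// cannot overflow into the neighbouring channel
uint3 iColor = uint3(saturate(color)*255.0f);
uint colorMask = (iColor.r<<16) | (iColor.g<<8) | iColor.b;
return colorMask;
}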

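// Decode specified mask into a float3 color (range 0.0f-1.0f).
float3 DecodeColor(in uint colorMask)
{
float3 color;
color.r = (colorMask >> 16) & 0xFF;
color.g = (colorMask >> 8) & 0xFF;
color.b = colorMask & 0xFF;
color /= 255.0f;
return color;
}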

[numthreads(1, 1, 1)]
void main(uint3 DTid : SV_DispatchThreadID) // entry-point name assumed
{
// Get Normals and Prelit Factor
float4 normal = t_normals[DTid.xy];
float prelit = normal.a;

//Get Position
float4 position = t_position[DTid.xy];

// View Space -> World Space
position = mul(float4(position.xyz, 1), texture_transform);
normal = mul(float4(normal.xyz, 0), texture_transform);

// Get Diffuse
float4 diffuse = t_diffuse[DTid.xy];

// Get Material
float4 material = t_material[DTid.xy];

float mlit = 1.0f-prelit;
float3 col = BRDFPointLight(normal.xyz, lColor, diffuse.xyz, material.xyz, material.a, position.xyz, lPosition.xyz, lRadius, cameraPosition)*lIntensity;

// Special Post Processing Flag
col = lerp(col, diffuse.xyz, mlit);

// Buffer
tOutputRW[DTid.xy] = EncodeColor(DecodeColor(tOutputRW[DTid.xy]) + col.xyz);
}


All of these point lights are added together into this buffer. Later, I decode it into a regular Texture2D with DecodeColor(...). According to Intel GPA that pass performs well, so it shouldn't be the problem.
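The packing scheme can be mirrored on the CPU to see how the additive accumulation behaves. This is a minimal C++ sketch, assuming the same 8:8:8 layout as EncodeColor/DecodeColor in the shader; the Color3 struct and the Accumulate helper are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// CPU-side mirror of the shader's packing: red in bits 16-23,
// green in bits 8-15, blue in bits 0-7.
struct Color3 { float r, g, b; };

// Quantize one channel to 0..255. Clamping first matters for additive
// accumulation: without it, a channel above 1.0 spills into its neighbour.
static uint32_t QuantizeChannel(float c)
{
    c = std::min(std::max(c, 0.0f), 1.0f);
    return static_cast<uint32_t>(c * 255.0f);  // truncates, like int3(...) in HLSL
}

uint32_t EncodeColor(Color3 c)
{
    uint32_t mask = (QuantizeChannel(c.r) << 16) |
                    (QuantizeChannel(c.g) << 8) |
                    QuantizeChannel(c.b);
    assert(mask <= 0xFFFFFFu);  // clamping guarantees the value fits in 24 bits
    return mask;
}

Color3 DecodeColor(uint32_t mask)
{
    return { ((mask >> 16) & 0xFF) / 255.0f,
             ((mask >> 8) & 0xFF) / 255.0f,
             (mask & 0xFF) / 255.0f };
}

// One light pass for one pixel: decode the running total, add this
// light's contribution, and re-encode (the shader's read-modify-write).
uint32_t Accumulate(uint32_t current, Color3 contribution)
{
    Color3 c = DecodeColor(current);
    return EncodeColor({ c.r + contribution.r,
                         c.g + contribution.g,
                         c.b + contribution.b });
}
```

Every pass truncates back to 8 bits per channel, so precision is lost with each light, and contributions much smaller than 1/255 can disappear entirely.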

As I said, the result is correct, but the performance is not: the frame rate drops from 60 fps to 25 fps. I can't get a time in micro- or milliseconds for the compute shader, because Intel GPA won't capture it properly. When I do find the dispatch, it reports a time of 0, which can't be right.

Any ideas what could go wrong?

And you've reached the bottom, thanks for your time!

-MIGI0027

Edited by Migi0027

##### Reply
"[numthreads(1, 1, 1)]" is likely your problem. You are telling the GPU to run thread groups with only one active thread each, which means on most GPUs you are idling 31 (NV) to 63 (AMD) threads per group, i.e. most of the ALU power.
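The arithmetic above can be made concrete with a rough C++ model. It assumes a wave (warp) size of 32 for NVIDIA and 64 for AMD GCN (newer AMD RDNA parts can also run 32-wide waves). It also ignores other occupancy limits such as register and shared-memory usage, and the helper names are hypothetical.

```cpp
#include <cassert>

// Rough model: a thread group is split into hardware waves. A partially
// filled wave still occupies a whole SIMD, so its unused lanes are wasted.
unsigned WavesPerGroup(unsigned threadsPerGroup, unsigned waveSize)
{
    assert(waveSize > 0);
    return (threadsPerGroup + waveSize - 1) / waveSize;  // round up
}

// Lanes that are scheduled but have no thread to run.
unsigned IdleLanesPerGroup(unsigned threadsPerGroup, unsigned waveSize)
{
    return WavesPerGroup(threadsPerGroup, waveSize) * waveSize - threadsPerGroup;
}

// Fraction of scheduled lanes doing useful work (1.0 = no waste).
double LaneUtilization(unsigned threadsPerGroup, unsigned waveSize)
{
    return static_cast<double>(threadsPerGroup) /
           (WavesPerGroup(threadsPerGroup, waveSize) * waveSize);
}
```

With one thread per group, IdleLanesPerGroup(1, 32) is 31 and IdleLanesPerGroup(1, 64) is 63. An 8 x 8 group (64 threads) leaves no idle lanes on either wave size.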

The number of threads per group should be a multiple of 32 or 64, depending on the target hardware. Your overall thread group dispatch count then needs to be reduced to match, so that the grid of groups still covers the whole screen.
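On the application side, the change amounts to a ceiling division of the screen size by the group size. This is a minimal C++ sketch, assuming an 8 x 8 group and that the original dispatch launched one 1 x 1 x 1 group per pixel. kGroupX, kGroupY and LightingDispatchSize are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed group size; must match [numthreads(kGroupX, kGroupY, 1)] in the
// shader. 8 x 8 = 64 threads, a multiple of both 32 and 64.
constexpr uint32_t kGroupX = 8;
constexpr uint32_t kGroupY = 8;

struct DispatchSize { uint32_t x, y, z; };

// Ceiling division, so a partially covered edge tile still gets a group.
constexpr uint32_t DivRoundUp(uint32_t n, uint32_t d) { return (n + d - 1) / d; }

// With numthreads(1, 1, 1) the dispatch was presumably (width, height, 1);
// with 8 x 8 groups it shrinks to roughly (width / 8, height / 8, 1).
DispatchSize LightingDispatchSize(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    return { DivRoundUp(width, kGroupX), DivRoundUp(height, kGroupY), 1 };
}
```

The three counts would then be passed to ID3D11DeviceContext::Dispatch. Because of the rounding up, when the screen size is not a multiple of 8 some threads fall outside the texture. The shader would therefore typically skip any thread whose DTid.xy is beyond the output size, which RWTexture2D::GetDimensions can provide.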

##### Reply

Thank you!

The performance is great now, and the result is exactly the same. That's all I needed.

-MIGI0027
