FantasyVII

Shader Model 2.0 runs faster than 3.0!


FantasyVII    1073

Hello everyone,

 

So I've been making a pixel shader lighting system in XNA with SM 3.0. I was getting around 70 FPS with 4 lights. However, when I changed the SM version in the fx file from 3.0 to 2.0, I got around 120 FPS! Why?

 

Obviously I can't use SM 2.0 because it's very limited in the amount of arithmetic I can do. But why is SM 2.0 faster than 3.0? You would think 3.0 would run much faster!

 

Also, in SM 3.0 I can have an array size of 15 elements. I can use all 15 lights and I get 70 FPS. If I use 1 light I still get 70 FPS, so the number of lights in use doesn't really matter. However, when I set the array to 16 elements in the fx file, my FPS drops to 10, even if I only use 1 light. Why is that?

 

struct Light
{
	float2 Position;
	float4 Color;
	float Radius;
	float Intensity;
};

Light lights[15];

 

 

mhagain    13430

It's possible that the HLSL compiler is optimizing some dynamic branching better in the SM2 shader; SM2 didn't really support any dynamic branching at all and would emulate it via built-in instructions which could well end up being faster than doing real dynamic branching.  You'd have to post your full shader, as well as the code you're using to compile it, in order to make a proper analysis of this.

 

Regarding your FPS drop, this would be caused by overflowing a limit on the number of pixel shader constants you can reasonably use.  Pixel shader constants were really weird in older shader models, and sometimes involved crazy stuff like the driver having to hot-patch shader code.  Even if you're only using one light, I bet you have a loop running through a "numLights" constant (or similar) and possibly an "if" or two inside that loop, so the shader itself doesn't know how many you have at compile time, nor does it know what code path any given light is going to take, both of which can affect performance.
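If constant-register pressure is part of the problem, one thing to try is packing the per-light data more tightly. The sketch below assumes (as I understand the SM2/SM3 constant layout, though it is worth confirming in the compiler's assembly listing) that each member of the Light struct occupies its own float4 register, so four members cost four registers per light. PackedLight and its member name are hypothetical, and the application code that fills the array would have to change to match.

```hlsl
// Hypothetical packed layout: two float4 members per light instead of
// four separately declared members.
struct PackedLight
{
	float4 PositionRadiusIntensity; // xy = position, z = radius, w = intensity
	float4 Color;
};

PackedLight lights[16];

float4 CalculateLight(PackedLight light, float4 Base, float2 PixelPosition)
{
	float2 Direction = light.PositionRadiusIntensity.xy - PixelPosition;
	float Radius = light.PositionRadiusIntensity.z;
	float Intensity = light.PositionRadiusIntensity.w;

	// Falloff: 1 inside the radius, Radius / distance outside it.
	float Attenuation = saturate(Radius / length(Direction));

	return Base * Attenuation * light.Color * Intensity;
}
```

Whether this moves the point at which the frame rate collapses is something only a measurement can tell; it just reduces the number of constants the shader asks for.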

 

For both of these you may find it beneficial to procedurally build 16 separate shaders, one per light count, and select the appropriate one depending on how many lights you have. It's also worth checking your shader compile flags and experimenting with D3DXSHADER_AVOID_FLOW_CONTROL and D3DXSHADER_PREFER_FLOW_CONTROL.
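A minimal sketch of the "one shader per light count" idea, using a uniform entry-point parameter that is given a literal value at each compile site so the loop bound is known at compile time. The names DeferredlightN, Deferred1 and Deferred4 are hypothetical, the effect globals are assumed to match the ones the original poster uses, and this has not been compiled against their XNA project.

```hlsl
#define MAX_LIGHTS 15

struct Light
{
	float2 Position;
	float4 Color;
	float Radius;
	float Intensity;
};

float4x4 MatrixTransform;
texture ColorTexture;
Light lights[MAX_LIGHTS];
float AmbientIntensity;
float4 AmbientColor;
float ScreenWidth;
float ScreenHeight;

sampler ColorMap = sampler_state
{
	Texture = <ColorTexture>;
};

float4 CalculateLight(Light light, float4 Base, float2 PixelPosition)
{
	float2 Direction = light.Position - PixelPosition;
	float Attenuation = saturate(light.Radius / length(Direction));
	return Base * Attenuation * light.Color * light.Intensity;
}

// lightCount is supplied as a literal in each technique below, so every
// compiled variant sees a constant loop bound and can be fully unrolled.
float4 DeferredlightN(float2 TexCoords : TEXCOORD0,
                      uniform int lightCount) : COLOR
{
	float4 Base = tex2D(ColorMap, TexCoords);
	float2 PixelPosition = float2(ScreenWidth, ScreenHeight) * TexCoords;
	float4 FinalColor = Base * AmbientColor * AmbientIntensity;

	for (int i = 0; i < lightCount; i++)
		FinalColor += CalculateLight(lights[i], Base, PixelPosition);

	return FinalColor;
}

void VertexShaderFunction(inout float4 color    : COLOR0,
                          inout float2 texCoord : TEXCOORD0,
                          inout float4 position : SV_Position)
{
	position = mul(position, MatrixTransform);
}

// One technique per light count; only two are written out here.
technique Deferred1
{
	pass Pass1
	{
		VertexShader = compile vs_3_0 VertexShaderFunction();
		PixelShader = compile ps_3_0 DeferredlightN(1);
	}
}

technique Deferred4
{
	pass Pass1
	{
		VertexShader = compile vs_3_0 VertexShaderFunction();
		PixelShader = compile ps_3_0 DeferredlightN(4);
	}
}
```

The application would then pick the technique matching the current number of lights before drawing. The remaining techniques could be generated by a script rather than written by hand.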

FantasyVII    1073

Here is my shader code. Sorry I did not post it before.

struct Light
{
	float2 Position;
	float4 Color;
	float Radius;
	float Intensity;
};

float4x4 MatrixTransform;
texture ColorTexture;

int numberOfLights;
Light lights[15];

float AmbientIntensity;
float4 AmbientColor;

float ScreenWidth;
float ScreenHeight;

sampler ColorMap = sampler_state
{
	Texture = <ColorTexture>;
};

float4 CalculateLight(Light light, float4 Base, float3 PixelPosition)
{
	// Lights are 2D, so only the xy components of the pixel position are used.
	float2 Direction = light.Position - PixelPosition.xy;
	// Attenuation: 1 inside the radius, Radius / distance outside it.
	float Distance = saturate(1 / length(Direction) * light.Radius);
	
	return Base * Distance * light.Color * light.Intensity;
}

float4 Deferredlight(float2 TexCoords : TEXCOORD0) : COLOR
{
	float4 Base = tex2D(ColorMap, TexCoords);

	float3 PixelPosition = float3(ScreenWidth * TexCoords.x, ScreenHeight * TexCoords.y, 0);
		
	float4 FinalColor = (Base * AmbientColor * AmbientIntensity);

	// Note: "<=" runs numberOfLights + 1 iterations; use "<" if numberOfLights is a count.
	for(int i=0; i <= numberOfLights; i++)
		FinalColor += CalculateLight(lights[i], Base, PixelPosition);

	return FinalColor;
}

void VertexShaderFunction(inout float4 color    : COLOR0,
                        inout float2 texCoord : TEXCOORD0,
                        inout float4 position : SV_Position)
{
    position = mul(position, MatrixTransform);
}

technique Deferred
{
	pass Pass1
	{
		VertexShader = compile vs_3_0 VertexShaderFunction();
		PixelShader = compile ps_3_0 Deferredlight();
	}
}

 

 

So is there a way to make SM 3.0 run faster?

Hodgman    51230
I don't know how XNA compiles shaders, but when you do it manually, you've got the option to have the compiler spit out the resulting assembly as text, which lets you look at exactly what instructions are being generated.
It's possible that because ps2 is so limited, it's actually using branch instructions, whereas ps3 isn't as instruction-limited, so the compiler is unrolling the loop and always doing 15 iterations... As well as [branch]/[flatten] attributes on some platforms, there are also compiler flags that influence the use of branching vs. unrolling.
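To make the attribute idea concrete, here is a rough sketch of the pixel shader with the loop explicitly unrolled over a fixed bound and a per-light [branch] that skips unused slots. MAX_LIGHTS is a name introduced for this example, the vertex shader and technique block are assumed unchanged and are omitted, and I have not compiled this against the original XNA setup, so treat it as a starting point for experiments rather than a known fix.

```hlsl
#define MAX_LIGHTS 15

struct Light
{
	float2 Position;
	float4 Color;
	float Radius;
	float Intensity;
};

texture ColorTexture;
int numberOfLights;
Light lights[MAX_LIGHTS];
float AmbientIntensity;
float4 AmbientColor;
float ScreenWidth;
float ScreenHeight;

sampler ColorMap = sampler_state
{
	Texture = <ColorTexture>;
};

float4 CalculateLight(Light light, float4 Base, float2 PixelPosition)
{
	float2 Direction = light.Position - PixelPosition;
	float Attenuation = saturate(light.Radius / length(Direction));
	return Base * Attenuation * light.Color * light.Intensity;
}

float4 Deferredlight(float2 TexCoords : TEXCOORD0) : COLOR
{
	float4 Base = tex2D(ColorMap, TexCoords);
	float2 PixelPosition = float2(ScreenWidth, ScreenHeight) * TexCoords;
	float4 FinalColor = Base * AmbientColor * AmbientIntensity;

	// Fixed bound plus [unroll]: every lights[i] access uses a constant index.
	[unroll]
	for (int i = 0; i < MAX_LIGHTS; i++)
	{
		// [branch] asks for a real dynamic branch so unused slots are skipped;
		// swapping it for [flatten] evaluates every slot and selects the result.
		[branch]
		if (i < numberOfLights)
			FinalColor += CalculateLight(lights[i], Base, PixelPosition);
	}

	return FinalColor;
}
```

Comparing the assembly listing and the frame rate for the [branch] and [flatten] variants should show which code path the hardware prefers here.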

FantasyVII    1073

OK, thanks guys. Just one last thing: is there anything I can do to my shader to gain more FPS? Is my shader code good enough? Sorry for being a noob, but I just started learning HLSL about a week ago.

phil_t    8084

Are you measuring with a Debug or Release build configuration? By default, XNA's Debug configuration compiles shaders without optimizations.
