azzurro89

[XNA 4 & HLSL] How to skip optimization during shader compilation?


Hello,

I'm working on a project that uses a shader containing a lot of flow control, which makes shader compilation really slow. A typical shader takes only seconds to compile, but this one takes around 5 minutes. I saw that there's a shader compiler option called "SkipOptimization" that might cut this compile time while debugging (waiting another 5 minutes every time I change the shader code before I can run the app is painful, and I don't need shader optimization for now). Does anyone know how to apply this "SkipOptimization" option to an .fx shader in XNA 4? I checked the MSDN page, but the documentation says it's only available in XNA 3.1 and earlier. Is there another way to disable the optimization?

http://blogs.msdn.com/b/shawnhar/archive/2010/05/07/effect-compilation-and-content-pipeline-automation-in-xna-game-studio-4-0.aspx explains one way of doing it - note the "EffectProcessorDebugMode.Optimize".

However you might find it more effective to change the shader. Concentrate on loops as those have the biggest impact on compile time in my experience. Splitting it up into multiple shaders instead of doing a conditional on a constant can also help.
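
As a rough illustration of the "multiple shaders instead of a conditional on a constant" idea, here is a minimal sketch. Every name in it (BaseSampler, ReflectionTint, ShadePS, useReflection, the technique names) is made up for the example and does not come from the shader discussed in this thread. It relies on the effect-file idiom of passing a literal to a uniform parameter in the compile statement, so each technique gets its own specialized pixel shader.

```hlsl
// Hypothetical names throughout; none of these come from the original shader.
sampler2D BaseSampler;
float4 ReflectionTint;

// 'useReflection' receives a literal in each compile statement below, so the
// compiler can resolve the branch at compile time for each variant.
float4 ShadePS(float2 uv : TEXCOORD0, uniform bool useReflection) : COLOR
{
    float4 color = tex2D(BaseSampler, uv);
    if (useReflection)
        color *= ReflectionTint;
    return color;
}

// Vertex shaders are omitted to keep the sketch short; a real effect
// would assign one in each pass as well.
technique Plain
{
    pass Pass0 { PixelShader = compile ps_3_0 ShadePS(false); }
}

technique Reflective
{
    pass Pass0 { PixelShader = compile ps_3_0 ShadePS(true); }
}
```

The application then picks a technique instead of setting a constant that the shader tests at run time. Whether this helps compile time depends on how much code each variant eliminates.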

Thanks Adam.

I've changed the Debug Mode option in the effect processor properties to EffectProcessorDebugMode.Debug, since it is said to produce an unoptimized shader (thus skipping the optimization), but there was no apparent change in compile time: it still took around 5 minutes. I thought skipping the optimization would shorten the compile time, or am I guessing wrong? As for your recommendation to split the shader, unfortunately I can't, since the loops are the core of my shader and they have to run in a single pass. I guess that leaves me no choice, eh? Any other suggestions, guys?

If you post the shader here, someone might be able to provide some specific advice on how to make it compile faster.

For example one thing that can help both runtime and compiling performance is changing complex calculations that are only based on one or two parameters into texture lookups.
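
A minimal sketch of that idea, assuming a one-parameter function (a specular power curve) has been baked into a small texture on the CPU beforehand. SpecularExponent, SpecularLookupSampler and both function names are hypothetical and only serve the illustration.

```hlsl
// Hypothetical names; the lookup texture is assumed to be a 256x1 texture
// whose red channel stores pow(x, SpecularExponent) for x in [0, 1].
float SpecularExponent;
sampler2D SpecularLookupSampler;

// Direct evaluation: arithmetic the compiler has to generate and optimize.
float SpecularComputed(float nDotH)
{
    return pow(saturate(nDotH), SpecularExponent);
}

// Lookup version: a single texture fetch replaces the arithmetic.
float SpecularLookup(float nDotH)
{
    return tex2D(SpecularLookupSampler, float2(saturate(nDotH), 0.5)).r;
}
```

The trade-off is that the result is limited by the resolution and format of the lookup texture, and the texture has to be regenerated whenever the baked parameter changes.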

Do you have any long loops in your shader? Usually the compiler will unroll loops and evaluate static expressions within those loops, and for long/complex loops this can sometimes take an unusual amount of time. You can try and fix that by forcing the compiler to use loop instructions, which is done using the [loop] attribute.
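
For reference, a minimal sketch of the attribute in use. SourceSampler, SAMPLE_COUNT and AccumulatePS are hypothetical names chosen for this example, not part of any shader in this thread.

```hlsl
// Hypothetical sampler and constant, named only for this sketch.
sampler2D SourceSampler;
#define SAMPLE_COUNT 32

float4 AccumulatePS(float2 uv : TEXCOORD0) : COLOR
{
    float4 sum = 0;

    // [loop] asks the compiler to emit a real loop instruction instead of
    // unrolling the body SAMPLE_COUNT times. [unroll] requests the opposite.
    [loop] for (int i = 0; i < SAMPLE_COUNT; i++)
    {
        // tex2Dlod takes an explicit mip level, which sidesteps the gradient
        // instructions that can otherwise force the compiler to unroll.
        float2 offset = float2(i / (float)SAMPLE_COUNT, 0);
        sum += tex2Dlod(SourceSampler, float4(uv + offset, 0, 0));
    }

    return sum / SAMPLE_COUNT;
}
```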

Okay, so here is the shader, which is based on the Robust Multiple Specular Reflections and Refractions implementation in GPU Gems 3 (I'm sorry if this shader is way too long to read).
Some points that might help you understand it:

  1. I'm forcing the compiler not to unroll any loops because that reduces the total instruction count. I tried the unrolled version and it exceeded the maximum number of instructions.
  2. I have to use branching to select from the sampler arrays, because a sampler array cannot be indexed by a non-literal such as a loop index variable.
  3. I know there are a lot of loops, but they are essential to the algorithm, so I can't get rid of them.

Well, is there any way I can get the shader to be more efficient? The long compile time is just overkill :(

//-----------------------------------------------------------------------------
// Functions
//-----------------------------------------------------------------------------

float4 GetSurfaceValue(int layer, float3 dir)
{
float4 output;

[branch] if(layer == 0)
output = texCUBElod(EnvColorMapSampler[0], float4(dir, 0));
[branch] if(layer == 1)
output = texCUBElod(EnvColorMapSampler[1], float4(dir, 0));
[branch] if(layer == 2)
output = texCUBElod(EnvColorMapSampler[2], float4(dir, 0));
[branch] if(layer == 3)
output = texCUBElod(EnvColorMapSampler[3], float4(dir, 0));
[branch] if(layer == 4)
output = texCUBElod(EnvColorMapSampler[4], float4(dir, 0));

return output;
}

float4 GetNormalDistanceValue(int layer, float3 dir)
{
float4 output;

[branch] if(layer == 0)
output = texCUBElod(EnvDistanceMapSampler[0], float4(dir, 0));
[branch] if(layer == 1)
output = texCUBElod(EnvDistanceMapSampler[1], float4(dir, 0));
[branch] if(layer == 2)
output = texCUBElod(EnvDistanceMapSampler[2], float4(dir, 0));
[branch] if(layer == 3)
output = texCUBElod(EnvDistanceMapSampler[3], float4(dir, 0));
[branch] if(layer == 4)
output = texCUBElod(EnvDistanceMapSampler[4], float4(dir, 0));

return output;
}

void LinearSearch( float3 x, float3 R, int layer,
out bool hit,
out float dl,
out float dp,
out float llp,
out float ppp)
{
hit = true;

float a = length(x) / length(R);
float3 s = normalize(x);
float3 e = normalize(R);
float dt = (-dot(s, e) + 1.0f) / 2.0f * ((float) MAX_LINEAR);
dt = max(dt, MIN_LINEAR);
dt = 1.0f / dt;
bool undershoot = false, overshoot = false;

// Perform linear search along the ray R
// -------------------------------------
float t = 0.01;
[loop] while(t < 1 && !(overshoot && undershoot))
{
float d = a * t / (1 - t); // Ray parameter corresponding to t
float3 r = x + R * d; // r(d): point on the ray

float ra = GetNormalDistanceValue(layer, r).a; // |r'|

[branch] if (ra > 0) // Valid texel, i.e. anything is visible
{
float rrp = length(r) / ra; //|r|/|r'|

if (rrp < 1) // Undershooting
{
dl = d; // Store last undershooting in dl
llp = rrp;
undershoot = true;
}
else // Overshooting
{
dp = d; // Store last overshooting as dp
ppp = rrp;
overshoot = true;
}
}
else // Nothing is visible: restart search
{
undershoot = false;
overshoot = false;
}
t += dt; // Next texel
}

[branch] if(!(overshoot && undershoot))
hit = false;
}

void SecantSearch( float3 x, float3 R, int layer,
float dl,
float dp,
float llp,
float ppp,
out float3 r,
out float d)
{
// if no secant iteration
r = x + R * dp;
d = dp;

[loop] for(int i= 0; i < MAX_SECANT; i++)
{
// Ray parameter of the new intersection
d = dl + (dp - dl) * (1 - llp) / (ppp - llp);
r = x + R * d; // New point on the ray
half pppNew = length(r) / GetNormalDistanceValue(layer, r).a; // |r|/|r'|

[branch] if (pppNew < 0.9999) // Undershooting
{
llp = pppNew; // Store as last undershooting
dl = d;
}
else if (pppNew > 1.0001) // Overshooting
{
ppp = pppNew; // Store as last overshooting
dp = d;
}
else i = MAX_SECANT;
}
}

float3 Hit(float3 x, float3 R, out float4 Il, out float3 Nl)
{
float3 p = 0;
float dist;
float minDist = INF;
bool hit;
int maxLayer = min(5, LayerNum);

[loop] for(int layer = 0; layer < maxLayer; layer++)
{
float dl = 0, dp, llp, ppp;
LinearSearch(x, R, layer, hit, dl, dp, llp, ppp);

[branch] if(hit)
{
SecantSearch(x, R, layer, dl, dp, llp, ppp, p, dist);

if(dist < minDist)
{
Il = GetSurfaceValue(layer, p);
Nl = GetNormalDistanceValue(layer, p).rgb;
minDist = dist;
}
}
}
return p;
}

float4 MultipleRaytrace(float3 x, float3 N, float3 V, float3 Fp0, float3 n0)
{
float4 I = float4(1, 1, 1, 0); // Radiance along the path
float3 Fp = Fp0; // Fresnel at 90 degrees at first hit
float n = n0; // Index of refraction of the first hit
int depth = 0; // Number of the traced path

[loop] while (depth < MAX_DEPTH)
{
float3 R; // Reflection or refraction direction

float3 F = Fp * tex2Dlod(FresnelMapSampler, float4(abs(dot(N, -V)), 0, 0, 0)).a; // Fresnel term

[branch] if(n <= 0) // Reflection
{
R = reflect(V, N); // Reflection direction
I.rgb *= F; // Fresnel reflection
}
else // Refraction
{
[branch] if(dot(V, N) > 0) // Ray comes from inside
{
n = 1 / n;
N = -N;
}
R = refract(V, N, 1 / n);
[branch] if(dot(R, R) == 0) // Refracted ray has no direction
R = reflect(V, N); // Total reflection
else
I.rgb *= (1 - F); // Fresnel refraction
}

float4 Il; // radiance at the hit point
float3 Nl; // normal vector at the hit point

// Trace ray x+R*d and obtain hit l, radiance Il, normal Nl
float3 l = Hit(x, R, Il, Nl);

n = Il.a;
if(n == 0) // Hit point is on diffuse surface
{
I.rgb *= Il.rgb; // Multiply with the radiance
I.a = 1;
depth = MAX_DEPTH; // Terminate the ray tracing
}
else // Hit point is on specular surface
{
Fp = Il.rgb; // Fresnel at 90 degrees
depth += 1;
}

// Next hit point
N = Nl;
V = R;
x = l;
}
return I * I.a;
}


//-----------------------------------------------------------------------------
// Vertex shaders
//-----------------------------------------------------------------------------

PSRaytraceInput VSRaytrace(VSInput vin)
{
PSRaytraceInput vout;

float4 pos_ws = mul(vin.Position, World);
float4 pos_ps = mul(pos_ws, ViewProjection);

vout.PositionPS = pos_ps;
vout.ScreenPos = pos_ps;
vout.PositionByLight = mul(pos_ws, LightsViewProjection);
vout.x = pos_ws.xyz - RefPoint;
vout.E = CameraPosition - pos_ws.xyz;
vout.N = mul(vin.Normal, World);

return vout;
}

//-----------------------------------------------------------------------------
// Pixel shader
//-----------------------------------------------------------------------------

float4 PS(PSRaytraceInput pin) : COLOR
{
float3 N = normalize(pin.N);
float3 E = normalize(pin.E);

float4 tracedColor = MultipleRaytrace(pin.x, N, -E, ReflectionColor, IOR);

float3 blendedColor = lerp(DiffuseColor, tracedColor.rgb, ReflectionAmount);
float4 color = float4(blendedColor + EmissiveColor, Alpha);

return color;
}


technique Shaded
{
pass Pass0
{
VertexShader = compile vs_3_0 VSRaytrace();
PixelShader = compile ps_3_0 PS();
}
}

I changed the line: "[loop] while (depth < MAX_DEPTH)" to "[fastopt] [loop] while (depth < MAX_DEPTH)" and it compiles much much quicker for me.

By the way I guessed at all your declarations to make it compile, it would have been a lot simpler if you'd posted them...

Did you use XNA 4 to compile it, Adam? In my VS2010, a warning says that [fastopt] is an unknown attribute. Does that mean this option is not supported by the XNA shader compiler?
Despite the warning, the shader compiled successfully, but in exactly the same time as the original :(

I used fxc on the command line with the June 2010 DirectX SDK. It's possible that updating DirectX will solve that warning.

Here's the modified code I compiled. The command line was "fxc /Tps_3_0 filename.fx"

Also note that without fastopt the compile time varied significantly with the MAX_DEPTH setting. Low numbers compiled in seconds.

#define MAX_DEPTH 42
#define LayerNum 5
#define MAX_LINEAR 42
#define MIN_LINEAR 1
#define MAX_SECANT 42
#define MIN_SECANT 1
#define INF 1e10f

#define IOR 1.1f

samplerCUBE EnvColorMapSampler[LayerNum];
samplerCUBE EnvDistanceMapSampler[LayerNum];
sampler2D FresnelMapSampler;

float4x4 World;
float4x4 ViewProjection;
float4x4 LightsViewProjection;

float3 RefPoint;
float3 CameraPosition;

float3 EmissiveColor;
float3 ReflectionColor;
float3 ReflectionAmount;
float3 DiffuseColor;

float Alpha;

struct PSRaytraceInput
{
float4 PositionPS : POSITION;
float4 ScreenPos : TEXCOORD0;
float4 PositionByLight : TEXCOORD1;
float3 x : TEXCOORD2;
float3 E : TEXCOORD3;
float3 N : TEXCOORD4;
};

struct VSInput
{
float3 Position;
float3 Normal;
};


//-----------------------------------------------------------------------------
// Functions
//-----------------------------------------------------------------------------

float4 GetSurfaceValue(int layer, float3 dir)
{
float4 output;

[branch] if(layer == 0)
output = texCUBElod(EnvColorMapSampler[0], float4(dir, 0));
[branch] if(layer == 1)
output = texCUBElod(EnvColorMapSampler[1], float4(dir, 0));
[branch] if(layer == 2)
output = texCUBElod(EnvColorMapSampler[2], float4(dir, 0));
[branch] if(layer == 3)
output = texCUBElod(EnvColorMapSampler[3], float4(dir, 0));
[branch] if(layer == 4)
output = texCUBElod(EnvColorMapSampler[4], float4(dir, 0));

return output;
}

float4 GetNormalDistanceValue(int layer, float3 dir)
{
float4 output;

[branch] if(layer == 0)
output = texCUBElod(EnvDistanceMapSampler[0], float4(dir, 0));
[branch] if(layer == 1)
output = texCUBElod(EnvDistanceMapSampler[1], float4(dir, 0));
[branch] if(layer == 2)
output = texCUBElod(EnvDistanceMapSampler[2], float4(dir, 0));
[branch] if(layer == 3)
output = texCUBElod(EnvDistanceMapSampler[3], float4(dir, 0));
[branch] if(layer == 4)
output = texCUBElod(EnvDistanceMapSampler[4], float4(dir, 0));

return output;
}

void LinearSearch( float3 x, float3 R, int layer,
out bool hit,
out float dl,
out float dp,
out float llp,
out float ppp)
{
hit = true;

float a = length(x) / length(R);
float3 s = normalize(x);
float3 e = normalize(R);
float dt = (-dot(s, e) + 1.0f) / 2.0f * ((float) MAX_LINEAR);
dt = max(dt, MIN_LINEAR);
dt = 1.0f / dt;
bool undershoot = false, overshoot = false;

// Perform linear search along the ray R
// -------------------------------------
float t = 0.01;
[loop] while(t < 1 && !(overshoot && undershoot))
{
float d = a * t / (1 - t); // Ray parameter corresponding to t
float3 r = x + R * d; // r(d): point on the ray

float ra = GetNormalDistanceValue(layer, r).a; // |r'|

[branch] if (ra > 0) // Valid texel, i.e. anything is visible
{
float rrp = length(r) / ra; //|r|/|r'|

if (rrp < 1) // Undershooting
{
dl = d; // Store last undershooting in dl
llp = rrp;
undershoot = true;
}
else // Overshooting
{
dp = d; // Store last overshooting as dp
ppp = rrp;
overshoot = true;
}
}
else // Nothing is visible: restart search
{
undershoot = false;
overshoot = false;
}
t += dt; // Next texel
}

[branch] if(!(overshoot && undershoot))
hit = false;
}

void SecantSearch( float3 x, float3 R, int layer,
float dl,
float dp,
float llp,
float ppp,
out float3 r,
out float d)
{
// if no secant iteration
r = x + R * dp;
d = dp;

[loop] for(int i= 0; i < MAX_SECANT; i++)
{
// Ray parameter of the new intersection
d = dl + (dp - dl) * (1 - llp) / (ppp - llp);
r = x + R * d; // New point on the ray
half pppNew = length(r) / GetNormalDistanceValue(layer, r).a; // |r|/|r'|

[branch] if (pppNew < 0.9999) // Undershooting
{
llp = pppNew; // Store as last undershooting
dl = d;
}
else if (pppNew > 1.0001) // Overshooting
{
ppp = pppNew; // Store as last overshooting
dp = d;
}
else i = MAX_SECANT;
}
}

float3 Hit(float3 x, float3 R, out float4 Il, out float3 Nl)
{
float3 p = 0;
float dist;
float minDist = INF;
bool hit;
int maxLayer = min(5, LayerNum);

[loop] for(int layer = 0; layer < maxLayer; layer++)
{
float dl = 0, dp, llp, ppp;
LinearSearch(x, R, layer, hit, dl, dp, llp, ppp);

[branch] if(hit)
{
SecantSearch(x, R, layer, dl, dp, llp, ppp, p, dist);

if(dist < minDist)
{
Il = GetSurfaceValue(layer, p);
Nl = GetNormalDistanceValue(layer, p).rgb;
minDist = dist;
}
}
}
return p;
}

float4 MultipleRaytrace(float3 x, float3 N, float3 V, float3 Fp0, float3 n0)
{
float4 I = float4(1, 1, 1, 0); // Radiance along the path
float3 Fp = Fp0; // Fresnel at 90 degrees at first hit
float n = n0; // Index of refraction of the first hit
int depth = 0; // Number of the traced path

[fastopt] [loop] while (depth < MAX_DEPTH)
{
float3 R; // Reflection or refraction direction

float3 F = Fp * tex2Dlod(FresnelMapSampler, float4(abs(dot(N, -V)), 0, 0, 0)).a;

// Fresnel term

[branch] if(n <= 0) // Reflection
{
R = reflect(V, N); // Reflection direction
I.rgb *= F; // Fresnel reflection
}
else // Refraction
{
[branch] if(dot(V, N) > 0) // Ray comes from inside
{
n = 1 / n;
N = -N;
}
R = refract(V, N, 1 / n);
[branch] if(dot(R, R) == 0) // Refracted ray has no direction
R = reflect(V, N); // Total reflection
else
I.rgb *= (1 - F); // Fresnel refraction
}

float4 Il; // radiance at the hit point
float3 Nl; // normal vector at the hit point

// Trace ray x+R*d and obtain hit l, radiance Il, normal Nl
float3 l = Hit(x, R, Il, Nl);

n = Il.a;
if(n == 0) // Hit point is on diffuse surface
{
I.rgb *= Il.rgb; // Multiply with the radiance
I.a = 1;
depth = MAX_DEPTH; // Terminate the ray tracing
}
else // Hit point is on specular surface
{
Fp = Il.rgb; // Fresnel at 90 degrees
depth += 1;
}

// Next hit point
N = Nl;
V = R;
x = l;
}
return I * I.a;
}


//-----------------------------------------------------------------------------
// Vertex shaders
//-----------------------------------------------------------------------------

PSRaytraceInput VSRaytrace(VSInput vin)
{
PSRaytraceInput vout;

float4 pos_ws = mul(vin.Position, World);
float4 pos_ps = mul(pos_ws, ViewProjection);

vout.PositionPS = pos_ps;
vout.ScreenPos = pos_ps;
vout.PositionByLight = mul(pos_ws, LightsViewProjection);
vout.x = pos_ws.xyz - RefPoint;
vout.E = CameraPosition - pos_ws.xyz;
vout.N = mul(vin.Normal, World);

return vout;
}

//-----------------------------------------------------------------------------
// Pixel shader
//-----------------------------------------------------------------------------

float4 main(PSRaytraceInput pin) : COLOR
{
float3 N = normalize(pin.N);
float3 E = normalize(pin.E);

float4 tracedColor = MultipleRaytrace(pin.x, N, -E, ReflectionColor, IOR);

float3 blendedColor = lerp(DiffuseColor, tracedColor.rgb, ReflectionAmount);
float4 color = float4(blendedColor + EmissiveColor, Alpha);

return color;
}


//technique Shaded
//{
// pass Pass0
// {
// VertexShader = compile vs_3_0 VSRaytrace();
// PixelShader = compile ps_3_0 PS();
// }
//}

He's using XNA, so he's stuck with whatever version of the compiler they shipped with. You used to be able to compile shaders and effects yourself with a custom content processor, but you can't do that in XNA 4 since they now add their own metadata into the compiled effect.
