L. Spiro

OpenGL [DirectX 11] Sudden Saturday Shadow Sadness Syndrome

I just got out of rehab from my previous shadow problem.
I was able to change the type so that I could view it in PIX, and then I could see that the values had too much range.
The way to fix it follows.
Old Code:
[CODE]float shadow2dDepth( Texture2D _tTexture, float2 _vCoord ){ return _tTexture.Sample( lsg_SamplerShadow, _vCoord ).x; }[/CODE]
New Code:
[CODE]float shadow2dDepth( Texture2D _tTexture, float2 _vCoord ){ return _tTexture.Sample( lsg_SamplerShadow, _vCoord ).x * 0.5 + 0.5; }[/CODE]
This works, but obviously I would prefer a more efficient shader. In DirectX 9 there is no way to read from a depth surface, so I have to create a color surface and output depth to that manually, which is where I perform this conversion. In OpenGL and OpenGL ES 2, however, the depth surface can be read directly, and it works without my performing this conversion anywhere: I don’t set up a special viewport depth range or do the conversion in the shader. Isn’t this X * 0.5 + 0.5 conversion supposed to be done by the Direct3D 11 rasterizer? What do I need to do to make it do this, instead of me doing it in my shader?

The second issue is that my PCSSM shader fails to compile with the following:
[quote name='My Direct3D 11 Compiler']e:\Blah\x64\DirectX11 Debug\Shader@0x00000000036A1CC0(105,16): error X4014: cannot have gradient operations inside loops with divergent flow control[/quote]
Here are the relevant parts of the shader:
[CODE]float PCSSShadowMap( in vec4 _vShadowCoord ) {
    float fSum = (LSE_PCF_STEPS * 2.0 + 1.0);
    float fTotal = fSum * fSum;
    if ( _vShadowCoord.w > 0.0 && _vShadowCoord.x >= 0.0 && _vShadowCoord.x <= 1.0 && _vShadowCoord.y >= 0.0 && _vShadowCoord.y <= 1.0 ) {
        float fAvgDepth = 0.0;
        float fTotalBlockers = 0.0;
        vec4 vShadowCoordWDivide = _vShadowCoord / _vShadowCoord.w;
        //vShadowCoordWDivide.z -= 0.000625 * 0.125;
        FindBlockers( vShadowCoordWDivide.xy, vShadowCoordWDivide.z, g_vShadowMapUvDepth.xy * 1.25,
            fTotalBlockers, fAvgDepth );
        if ( fTotalBlockers != 0.0 ) {
            fTotal = 0.0;
            vec2 vSize = g_vShadowMapUvDepth.xy * g_fShadowMapCasterSize * fAvgDepth * g_vShadowMapUvDepth.z;
            // Get the distance within the shadow we are.
            vec2 vThis;
            vec2 fStepUv = vSize / LSE_PCF_STEPS;
            for ( float y = -LSE_PCF_STEPS; y <= LSE_PCF_STEPS; y++ ) {
                for ( float x = -LSE_PCF_STEPS; x <= LSE_PCF_STEPS; x++ ) {
                    vec2 vOffset = vec2( x, y ) * fStepUv; // **************** LINE 105 **************** //
                    float fDepth = shadow2dDepth( g_sShadowTex, vShadowCoordWDivide.xy + vOffset );
                    fTotal += (fDepth == 1.0 || fDepth > vShadowCoordWDivide.z) ? 1.0 : 0.0;
                }
            }
        }
    }
    return fTotal / (fSum * fSum);
}[/CODE]
Why is it barking at that line and how can I rewrite it to work?
If you need shader code that actually compiles, the actual shader that is sent to Direct3D 11 follows. Yes, it is ugly. If you have heart conditions or are pregnant, viewer discretion is advised.
[spoiler]float mix( in float _fX, in float _fY, in float _fA ) { return _fX * (1.0 - _fA) + _fY * _fA; }
float2 mix( in float2 _fX, in float2 _fY, in float _fA ) { return _fX * (1.0 - _fA) + _fY * _fA; }
float3 mix( in float3 _fX, in float3 _fY, in float _fA ) { return _fX * (1.0 - _fA) + _fY * _fA; }
float4 mix( in float4 _fX, in float4 _fY, in float _fA ) { return _fX * (1.0 - _fA) + _fY * _fA; }
float2 mix( in float2 _fX, in float2 _fY, in float2 _fA ) { return _fX * (1.0 - _fA) + _fY * _fA; }
float3 mix( in float3 _fX, in float3 _fY, in float3 _fA ) { return _fX * (1.0 - _fA) + _fY * _fA; }
float4 mix( in float4 _fX, in float4 _fY, in float4 _fA ) { return _fX * (1.0 - _fA) + _fY * _fA; }
SamplerState lsg_SamplerBiLinearRepeat:register(s0){Filter=MIN_MAG_LINEAR_MIP_POINT;AddressU=WRAP;AddressV=WRAP;};
SamplerState lsg_SamplerBiLinearClamp:register(s1){Filter=MIN_MAG_LINEAR_MIP_POINT;AddressU=CLAMP;AddressV=CLAMP;};
SamplerState lsg_SamplerShadow:register(s15){Filter=MIN_MAG_LINEAR_MIP_POINT;AddressU=CLAMP;AddressV=CLAMP;};
float shadow2dDepth( Texture2D _tTexture, float2 _vCoord ){ return _tTexture.Sample( lsg_SamplerShadow, _vCoord ).x * 0.5 + 0.5; }
Texture2D g_sShadowTex:register(t15);
cbuffer cb0:register(b0){
cbuffer cb1:register(b1){
cbuffer cb2:register(b2){
cbuffer cb3:register(b3){
int g_iTotalDirLights:packoffset(c0.x);
float g_fShadowMapCasterSize:packoffset(c5.w);
#line 1 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 2 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 3 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 4 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 5 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 6 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 8 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 9 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 10 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 11 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 12 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 13 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 14 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 15 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 17 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 1 // e:/Data/LSDLighting.lssl
#line 2 // e:/Data/LSDLighting.lssl
#line 3 // e:/Data/LSDLighting.lssl
#line 4 // e:/Data/LSDLighting.lssl
#line 5 // e:/Data/LSDLighting.lssl
#line 6 // e:/Data/LSDLighting.lssl
#line 7 // e:/Data/LSDLighting.lssl
#line 8 // e:/Data/LSDLighting.lssl
#line 9 // e:/Data/LSDLighting.lssl
#line 10 // e:/Data/LSDLighting.lssl
#line 11 // e:/Data/LSDLighting.lssl
#line 12 // e:/Data/LSDLighting.lssl
#line 13 // e:/Data/LSDLighting.lssl
#line 14 // e:/Data/LSDLighting.lssl
#line 15 // e:/Data/LSDLighting.lssl
#line 25 // e:/Data/LSDLighting.lssl
#line 52 // e:/Data/LSDLighting.lssl
#line 102 // e:/Data/LSDLighting.lssl
#line 126 // e:/Data/LSDLighting.lssl
#line 139 // e:/Data/LSDLighting.lssl
#line 181 // e:/Data/LSDLighting.lssl
#line 223 // e:/Data/LSDLighting.lssl
#line 289 // e:/Data/LSDLighting.lssl
#line 360 // e:/Data/LSDLighting.lssl
LSE_COLOR_PAIR GetDirLightColorAshikhminShirley(in vector<float,3>_vNormalInViewSpace,in vector<float,4>_vViewVector,in int _iIndex){
#line 300 // e:/Data/LSDLighting.lssl
#line 301 // e:/Data/LSDLighting.lssl
#line 304 // e:/Data/LSDLighting.lssl
#line 305 // e:/Data/LSDLighting.lssl
#line 306 // e:/Data/LSDLighting.lssl
#line 307 // e:/Data/LSDLighting.lssl
#line 310 // e:/Data/LSDLighting.lssl
float fNormalDotHalf=max(dot(_vNormalInViewSpace,vHalfVec),0.0);
#line 311 // e:/Data/LSDLighting.lssl
float fNormalDotView=dot(_vNormalInViewSpace,;
#line 312 // e:/Data/LSDLighting.lssl
float fNormalDotLight=dot(_vNormalInViewSpace,vLightDir);
#line 313 // e:/Data/LSDLighting.lssl
float fLightDotHalf=dot(vLightDir,vHalfVec);
#line 314 // e:/Data/LSDLighting.lssl
float fTangentDotHalf=dot(fTangent,vHalfVec);
#line 315 // e:/Data/LSDLighting.lssl
float fBiTangentDotHalf=dot(fBiTangent,vHalfVec);
#line 322 // e:/Data/LSDLighting.lssl
const float fRs=0.29999999999999999;
#line 326 // e:/Data/LSDLighting.lssl
#line 328 // e:/Data/LSDLighting.lssl
#line 329 // e:/Data/LSDLighting.lssl
float fTemp=(1.0-(fNormalDotLight*0.5));
#line 330 // e:/Data/LSDLighting.lssl
float fTemp2=(fTemp*fTemp);
#line 331 // e:/Data/LSDLighting.lssl
#line 332 // e:/Data/LSDLighting.lssl
#line 333 // e:/Data/LSDLighting.lssl
#line 334 // e:/Data/LSDLighting.lssl*(1.0-((fTemp2*fTemp2)*fTemp)));
#line 335 // e:/Data/LSDLighting.lssl
#line 340 // e:/Data/LSDLighting.lssl
float fNumExp=(((g_vAnistropy.x*fTangentDotHalf)*fTangentDotHalf)+((g_vAnistropy.y*fBiTangentDotHalf)*fBiTangentDotHalf));
#line 341 // e:/Data/LSDLighting.lssl
#line 342 // e:/Data/LSDLighting.lssl
float fNum=sqrt(((g_vAnistropy.x+1.0)*(g_vAnistropy.y+1.0)));
#line 343 // e:/Data/LSDLighting.lssl
#line 345 // e:/Data/LSDLighting.lssl
float fDen=((8.0*3.1415899999999999)*fNormalDotHalf);
#line 346 // e:/Data/LSDLighting.lssl
#line 348 // e:/Data/LSDLighting.lssl
#line 349 // e:/Data/LSDLighting.lssl
#line 350 // e:/Data/LSDLighting.lssl
#line 351 // e:/Data/LSDLighting.lssl
#line 353 // e:/Data/LSDLighting.lssl*=(fRs+((1.0-fRs)*((fTemp2*fTemp2)*fTemp)));
#line 354 // e:/Data/LSDLighting.lssl
#line 355 // e:/Data/LSDLighting.lssl
return cpRet;}
#line 1 // e:/Data/LSDShadowing.lssl
#line 2 // e:/Data/LSDShadowing.lssl
#line 3 // e:/Data/LSDShadowing.lssl
#line 4 // e:/Data/LSDShadowing.lssl
#line 31 // e:/Data/LSDShadowing.lssl
#line 46 // e:/Data/LSDShadowing.lssl
#line 73 // e:/Data/LSDShadowing.lssl
void FindBlockers(in vector<float,2>_vPos,in float _zViewDepth,in vector<float,2>_vRadius,out float _fBlockers,out float _fAvgDepth){
#line 58 // e:/Data/LSDShadowing.lssl
#line 59 // e:/Data/LSDShadowing.lssl
#line 60 // e:/Data/LSDShadowing.lssl
#line 71 // e:/Data/LSDShadowing.lssl
for(float y=-1.0;
#line 61 // e:/Data/LSDShadowing.lssl
#line 70 // e:/Data/LSDShadowing.lssl
for(float x=-1.0;
#line 62 // e:/Data/LSDShadowing.lssl
#line 63 // e:/Data/LSDShadowing.lssl
#line 64 // e:/Data/LSDShadowing.lssl
float fDepth=shadow2dDepth(g_sShadowTex,(_vPos+vOffset));
#line 67 // e:/Data/LSDShadowing.lssl
#line 68 // e:/Data/LSDShadowing.lssl
#line 72 // e:/Data/LSDShadowing.lssl
#line 113 // e:/Data/LSDShadowing.lssl
float PCSSShadowMap(in vector<float,4>_vShadowCoord){
#line 82 // e:/Data/LSDShadowing.lssl
float fSum=((2.0*2.0)+1.0);
#line 83 // e:/Data/LSDShadowing.lssl
float fTotal=(fSum*fSum);
#line 85 // e:/Data/LSDShadowing.lssl
float fAvgDepth=0.0;
#line 86 // e:/Data/LSDShadowing.lssl
float fTotalBlockers=0.0;
#line 88 // e:/Data/LSDShadowing.lssl
#line 91 // e:/Data/LSDShadowing.lssl
#line 93 // e:/Data/LSDShadowing.lssl
#line 94 // e:/Data/LSDShadowing.lssl
#line 98 // e:/Data/LSDShadowing.lssl
#line 101 // e:/Data/LSDShadowing.lssl
#line 109 // e:/Data/LSDShadowing.lssl
for(float y=-2.0;
#line 103 // e:/Data/LSDShadowing.lssl
#line 108 // e:/Data/LSDShadowing.lssl
for(float x=-2.0;
#line 104 // e:/Data/LSDShadowing.lssl
#line 105 // e:/Data/LSDShadowing.lssl
#line 106 // e:/Data/LSDShadowing.lssl
float fDepth=shadow2dDepth(g_sShadowTex,(vShadowCoordWDivide.xy+vOffset));
#line 107 // e:/Data/LSDShadowing.lssl
return (fTotal/(fSum*fSum));}
#line 125 // e:/Data/LSDShadowing.lssl
#line 201 // e:/Data/LSDDefaultForwardPixelShader.lssl
void Main(in vector<float,3>_vInNormal:NORMAL0,in vector<float,2>_vIn2dTex0:TEXCOORD2,in vector<float,4>_vInPos:SV_POSITION0,in vector<float,4>_vInEyePos:TEXCOORD1,out vector<float,4>_vOutColor:SV_Target0){
#line 71 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 75 // e:/Data/LSDDefaultForwardPixelShader.lssl
float fShadow=PCSSShadowMap(vShadowCoord);
#line 86 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 93 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 101 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 108 // e:/Data/LSDDefaultForwardPixelShader.lssl
LSE_COLOR_PAIR cpLightColors={vector<float,4>(0.0,0.0,0.0,0.0),vector<float,4>(0.0,0.0,0.0,0.0)};
#line 122 // e:/Data/LSDDefaultForwardPixelShader.lssl
for(int I=0;
#line 110 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 115 // e:/Data/LSDDefaultForwardPixelShader.lssl
LSE_COLOR_PAIR cpThis=GetDirLightColorAshikhminShirley(vNormalizedNormal,vViewPosToEye,I);
#line 120 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 121 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 145 // e:/Data/LSDDefaultForwardPixelShader.lssl
#line 178 // e:/Data/LSDDefaultForwardPixelShader.lssl**cpLightColors.cSpecular).xyz);
#line 184 // e:/Data/LSDDefaultForwardPixelShader.lssl*=fShadow;
#line 200 // e:/Data/LSDDefaultForwardPixelShader.lssl
[/spoiler]

L. Spiro

[quote name='L. Spiro' timestamp='1343405279' post='4963682']
In DirectX 9 there is no way to read from a depth surface[/quote]

I know this is not what you were asking about, but you can sample from a depth texture in DirectX 9 through vendor-specific extensions.
INTZ works on pretty much all non-ancient ATI and NVIDIA hardware (

"Gradient operations" refer to anything that computes partial derivatives in the pixel shader, and in this particular case it's referring to the "Sample" function. You can't compute derivatives inside of dynamic flow control, since they're undefined if one of the pixels in the quad doesn't take the same path. So you need to either...

A. Use a sampling function that doesn't compute gradients, such as SampleLevel or SampleCmpLevelZero

B. Flatten all branches and unroll all loops in which you need to compute gradients

Edited by MJP
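
For reference, option A applied to the helper from the first post might look roughly like the sketch below. It assumes mip level 0 of the shadow map is the one you want (a shadow map normally only has one level). The `* 0.5 + 0.5` is carried over unchanged from the code in the question. `shadow2dCompare` and `lsg_SamplerShadowCmp` (and its register) are hypothetical names that do not appear in the original shader; they only illustrate the SampleCmpLevelZero variant, which additionally requires a comparison sampler and a texture format that supports comparison sampling.

```hlsl
// Sampler from the original shader (state setup omitted in this sketch).
SamplerState lsg_SamplerShadow : register( s15 );

// HYPOTHETICAL comparison sampler, not part of the original shader.
SamplerComparisonState lsg_SamplerShadowCmp : register( s14 );

// Same helper as in the question, but with an explicit LOD.
// SampleLevel does not need screen-space derivatives, so it is legal
// inside loops and branches with divergent flow control.
float shadow2dDepth( Texture2D _tTexture, float2 _vCoord ) {
    return _tTexture.SampleLevel( lsg_SamplerShadow, _vCoord, 0.0 ).x * 0.5 + 0.5;
}

// HYPOTHETICAL variant: let the sampler perform the depth comparison.
// Always samples mip level 0, so no gradients are involved either.
float shadow2dCompare( Texture2D _tTexture, float2 _vCoord, float _fRefDepth ) {
    return _tTexture.SampleCmpLevelZero( lsg_SamplerShadowCmp, _vCoord, _fRefDepth );
}
```

Option B would instead keep `Sample` and mark the PCF loops with the `[unroll]` attribute, which is only possible when the loop bounds are compile-time constants and every branch around the sample can be flattened.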

SampleLevel worked; thank you. My DirectX 11 side is now fully caught up to my DirectX 9, OpenGL 3.2, and OpenGL ES 2 sides. Now I can get serious about new graphics features.

What about the first issue?

L. Spiro


[quote name='L. Spiro' timestamp='1343414616' post='4963727']
What about the first issue?[/quote]
As you know, one of the many (meticulously hidden) differences between GL and D3D is that the range of the z-coordinate in clipping space differs. In GL the clipping space z goes from [-1…1] and in D3D it goes from [0…1]. GL and GL ES give direct access to the [-1…1] coordinate, which happens to be correct, since the clipping space coordinate you want to compare against is also in [-1…1]. Very convenient. In D3D (9 and 11) it is, surprise, surprise, the same, except that the clipping space z-coordinate is in [0…1]. If you store the coordinate unaltered, you can read it back and use it directly without conversions, e.g. render to a depth texture, fetch the depth later, and compare it to the depth in clipping space from the light’s point of view.

But now, you confuse me a little. How can it be that the values had “too much range” in D3D (i.e. ended up in [-1…1])? What puzzles me even more is that you convert it to [0…1] to compare it to a depth value in [0…1]. How can it be that one coordinate ended up in [-1…1] needing a conversion and the other one is in [0…1]? It appears to me that there is an inconsistency at some point.

I have the feeling that you currently store the depth in [-1…1]. This means you’re converting it both when writing to the render target and when reading from it. (Note that at read time you invert the operation you performed at write time, so you can avoid both entirely.) The depth value you compare against comes out of a projection matrix and is thus still in [0…1], right?

I’m quite sure you don’t, but: do you use the exact same projection matrix in D3D as you use in GL? (That would cause the depth in D3D to end up in [-1…1].) That would be a problem, since in D3D the projection matrix is expected to return depth in [0...1], and in GL in [-1…1]. The D3D rasterizer would happily clip away half of your frustum, since in D3D-country things do not get negative. So using a proper projection matrix that returns values in the correct range would be the easiest fix, as it would remove any need for conversions.

Also, storing the depth consistently over all platforms in [-1…1] is rather impossible to achieve (isn't it?), since the D3D depth buffer just happens to store in [0…1]. You may change the coordinate in D3D9, when writing to the render target (needing a conversion at reading), but it won’t help you much in D3D10+, if you use the real depth buffer.
Or can you persuade D3D to work in [-1...1] too by messing with the viewport? Hm...

Is there a reason you explicitly need the depth values in [-1…1] on all platforms? I see that you would like to have consistency, but wouldn't it be just fine if the range were in the "correct" space of the respective platform? If the projection matrix leads you to the correct space (GL: [-1...1], D3D: [0...1]), there shouldn't be much to worry about, right?
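
To illustrate what I mean by bringing the depth into the range D3D expects: if you really do share one GL-style projection matrix across platforms, a sketch of the remapping in the D3D vertex shader could look like the following. This is only an assumption about your setup, and `GlClipToD3dClip` is a hypothetical helper name, not something from your shader. The ranges quoted above refer to z after the divide by w, so the remap z' = z * 0.5 + 0.5 has to be multiplied through by w when applied to the clip-space position.

```hlsl
// HYPOTHETICAL helper, not from the original shader.
// Input:  clip-space position from a GL-style projection (z / w in [-1, 1]).
// Output: the same position with z remapped so that z / w lies in [0, 1].
float4 GlClipToD3dClip( float4 _vClipPos ) {
    // (z / w) * 0.5 + 0.5, expressed before the perspective divide.
    _vClipPos.z = (_vClipPos.z + _vClipPos.w) * 0.5;
    return _vClipPos;
}
```

The same remap can be baked into the projection matrix on the CPU instead, which is the "proper projection matrix" route and costs nothing per vertex. Either way the depth that is written and the depth you compare against end up in the same [0...1] range, so the `* 0.5 + 0.5` when reading the shadow map should no longer be needed.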

Best regards!
