
OpenGL Forcing early Z, which extension to use?


As far as my research has shown, there are two different extensions that can be used to force an early Z test in OpenGL 3.0: GL_ARB_conservative_depth and GL_ARB_shader_image_load_store. According to the specs, they can be used to force early Z in GLSL like so:

 

GL_ARB_conservative_depth:

layout(depth_unchanged) out float gl_FragDepth;

GL_ARB_shader_image_load_store:

layout(early_fragment_tests) in;

My question is: do these have the exact same effect? If so, can I use them interchangeably? If not, which one should I use?


http://www.opengl.org/wiki/Image_Load_Store

It does not look like image_load_store has anything to do with the early depth test. So no, you should use conservative depth.

 

Yeah, I know that the extension is not specifically meant for early Z tests, but it can be used to enable them, as I read here:

 

http://www.opengl.org/wiki/Early_Depth_Test#Explicit_specification

 

What I was hoping was that cards that don't support GL_ARB_conservative_depth might support GL_ARB_shader_image_load_store, so that one could be used as a fallback for the other.
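For what it's worth, the fallback idea could be sketched on the application side roughly like this. It is only a sketch in C: `has_extension` and `depth_preamble` are made-up helper names, the extension list is assumed to have been collected by the application beforehand (for example with glGetStringi), and the returned lines are assumed to be pasted into the fragment shader right after its #version line. The two qualifiers are not documented as equivalent, so whether the second branch is an acceptable fallback depends on the shader.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `name` is in the list of extension
   strings the application has already collected, 0 otherwise. */
static int has_extension(const char *const *exts, int count, const char *name)
{
    for (int i = 0; i < count; ++i)
        if (strcmp(exts[i], name) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: picks the GLSL lines to prepend to a fragment
   shader. Prefers the conservative-depth redeclaration, falls back to
   early_fragment_tests, and returns an empty string if neither
   extension is reported. */
static const char *depth_preamble(const char *const *exts, int count)
{
    if (has_extension(exts, count, "GL_ARB_conservative_depth"))
        return "#extension GL_ARB_conservative_depth : enable\n"
               "layout(depth_unchanged) out float gl_FragDepth;\n";

    if (has_extension(exts, count, "GL_ARB_shader_image_load_store"))
        return "#extension GL_ARB_shader_image_load_store : enable\n"
               "layout(early_fragment_tests) in;\n";

    return "";
}
```

With an empty result the shader is simply compiled unchanged, and the driver is left to decide on its own whether it can test depth early.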


I don't know. I just read that and nothing there suggests anything you are talking about. It is talking about how one thing affects the other. It says nothing about how textures (image load store) will affect depth testing. It does, however, say how depth testing will affect image load store.

FYI from that page: "Thus the first restriction on early depth tests is that they cannot happen if the fragment shader writes gl_FragDepth. If the fragment shader modifies the depth, then the depth test must wait until after the fragment shader executes."

 

In GL 2.0 (and it seems this has carried on to newer versions), if you don't write the depth in the shader, early z-cull and depth writing already take place.


FYI from that page: "Thus the first restriction on early depth tests is that they cannot happen if the fragment shader writes gl_FragDepth. If the fragment shader modifies the depth, then the depth test must wait until after the fragment shader executes."

 

I'm sorry, but you clearly didn't read the entire article. The use of that syntax for forcing early Z requires GL_ARB_shader_image_load_store:

 

More recent hardware can force early depth tests, using a special fragment shader layout qualifier:

layout(early_fragment_tests) in;

This will also perform early stencil tests.

 

...

 

This feature exists to ensure proper behavior when using Image Load Store or other incoherent memory writing.

 

It's mentioned in the spec too:

http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt

 

An explicit control is provided to allow fragment shaders to enable early
fragment tests. If the fragment shader specifies the
"early_fragment_tests" layout qualifier, the per-fragment tests described
in Section 3.X will be performed prior to fragment shader execution.
Otherwise, they will be performed after fragment shader execution.
Edited by ic0de



In GL 2.0 (and it seems this has carried on to newer versions), if you don't write the depth in the shader, early z-cull and depth writing already take place.

Yep.

 

I don't think OpenGL 2 even has the notion of early depth test at all (much like how it doesn't specify the exact algorithm for defining the shape of triangles). It was an optimization done by the hardware and as long as it gave the expected results it could do anything it wanted, so early depth tests worked by default simply because there was nothing against it. I imagine that disabling it if you modify the depth in a pixel shader has to do with caching (it invalidates the value in the cache).

I don't think OpenGL 2 even has the notion of early depth test at all (much like how it doesn't specify the exact algorithm for defining the shape of triangles). It was an optimization done by the hardware and as long as it gave the expected results it could do anything it wanted, so early depth tests worked by default simply because there was nothing against it. I imagine that disabling it if you modify the depth in a pixel shader has to do with caching (it invalidates the value in the cache).

 

Using blending will also disable this hardware optimization, for PowerVR at least.  They call this feature "Tile Based Deferred Rendering" in case anyone wants to look it up.  Those little machines can handle a lot until you turn on blending, and presumably pixel shader depth writes.  Once you do this they slow to a crawl.

 

I haven't gone to any trouble to see if the other embedded system manufacturers have similar schemes running under the hood. 

Well as this article (which I didn't initially read) suggests, some cards support this and they use the word "explicitly" which implies that there is an "implicit" case.
ATI and NVIDIA are going to be supporting this early optimization. I don't know anything that suggests otherwise and have read some internal docs.

This extension also provides the capability to explicitly enable "early"
    per-fragment tests, where operations like depth and stencil testing are
    performed prior to fragment shader execution.  In unextended OpenGL,
    fragment shaders never have any side effects and implementations can
    sometimes perform per-fragment tests and discard some fragments prior to
    executing the fragment shader.


If you investigate this page it further supports my claim:
http://gamedev.stackexchange.com/questions/16588/computing-gl-fragdepth

It has existed for a long time. Also, it's not just early z-cull, it's hierarchical early z-cull. Look up hierarchical occlusion culling. Graphics cards support this on a per-triangle level, which would not be possible if the shader executed first.

I believe the extension is able to perform the depth test explicitly by reading the depth buffer. Look up "discard". You can discard any fragment in GL, but to explicitly discard if (z < depthBuffer.z) you would need direct access to the depth buffer, which was not allowed. I don't know, but I am assuming that you are now allowed to read it. This may only apply if you are using an FBO, though.


Using blending will also disable this hardware optimization, for PowerVR at least.  They call this feature "Tile Based Deferred Rendering" in case anyone wants to look it up.  Those little machines can handle a lot until you turn on blending, and presumably pixel shader depth writes.  Once you do this they slow to a crawl.

Is this tested? Have you tried enabling and then disabling the depth test on enough blended fragments to check that the performance is actually different? It seems strange that this would happen, since GL is a state machine.



Using blending will also disable this hardware optimization, for PowerVR at least.  They call this feature "Tile Based Deferred Rendering" in case anyone wants to look it up.  Those little machines can handle a lot until you turn on blending, and presumably pixel shader depth writes.  Once you do this they slow to a crawl.

That would explain how they got OpenGL ES to work on that hardware in the first place.

 

I know what the algorithm does, it was used in the Dreamcast too, it's basically like an extremely simplified version of raytracing more or less. The more obvious issue, as you can imagine, is that such a thing doesn't even need the depth buffer at all, it processes all triangles and sorts them together (which is also how the Dreamcast got sort-independent alpha blending). I imagine the main reason Sega went with it back then is that it allowed for many more triangles at a much smaller fillrate.

 

I don't know how different the current TBDR is from the one back then, but I presume that to get the same performance gains it must cheat a lot to work with OpenGL ES.

Well as this article (which I didn't initially read) suggests, some cards support this and they use the word "explicitly" which implies that there is an "implicit" case.
ATI and NVIDIA are going to be supporting this early optimization. I don't know anything that suggests otherwise and have read some internal docs.

This extension also provides the capability to explicitly enable "early"
    per-fragment tests, where operations like depth and stencil testing are
    performed prior to fragment shader execution.  In unextended OpenGL,
    fragment shaders never have any side effects and implementations can
    sometimes perform per-fragment tests and discard some fragments prior to
    executing the fragment shader.

I think this extension is worded in a way that may be somewhat misleading, though at least they've put "early" in quotes. The same goes for the (wiki, by the way, so caveat emptor) page on opengl.org that you've dug up.

 

There is no "forcing early z" in OpenGL. OpenGL does not have any such thing as an "early z" at all, so you cannot enforce it. The specification is very clear about when the z test happens, and it is not "early", it is after the fragment shader has run. Still, implementations are allowed to do something different as long as the observable result is exactly identical, and most modern implementations in fact do something different.

 

If you search the OpenGL specification for "early", you find three occurrences of "linearly" and two occurrences of "clearly" (because Adobe Reader has no notion of searching for whole words), but "early" does not appear at all. In particular, the additions to chapter 3 in the above extension spec are funny because, for example, section 3.12.2 does not even exist in my copy of the specification (it stops at 3.11). They must be using a different copy.

 

A better wording would be that you can give a strong hint to the implementation which effectively forces early z test on implementations that do an early z test (or, on most mainstream implementations).

 

The thing is, a modern implementation would of course always like to do the z test early, because this saves shader work. In that sense there is no need to "force" it. It's trying hard to do it anyway. However, the implementation must still guarantee that the result is the same, which it can only do under some very harsh constraints (for example if the shader does not modify z, so it is already known what the value will be long before the shader runs).
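To make that constraint concrete, here is a toy model in C of a single pixel with a GL_LESS depth test. It is not how any real GPU or driver works; the `Fragment` and `Result` types and the `run_late` and `run_early` functions are invented for illustration. `run_late` follows the order the specification describes (shader first, then test), while `run_early` tests the interpolated depth first and skips the shader on failure.

```c
#include <stddef.h>

/* Toy fragment: the interpolated depth, and the depth the "shader"
   would output (equal to depth_in if the shader leaves depth alone). */
typedef struct { float depth_in; float depth_out; } Fragment;

/* Final depth stored for the pixel and how many shader runs it cost. */
typedef struct { float final_depth; int shader_runs; } Result;

/* Specified order: every fragment is shaded, then tested with GL_LESS
   against the depth the shader produced. */
static Result run_late(const Fragment *f, size_t n, float clear_depth)
{
    Result r = { clear_depth, 0 };
    for (size_t i = 0; i < n; ++i) {
        r.shader_runs++;
        if (f[i].depth_out < r.final_depth)
            r.final_depth = f[i].depth_out;
    }
    return r;
}

/* Early order: the interpolated depth is tested first, and the shader
   is skipped for fragments that fail. Any depth the shader would have
   written is ignored. */
static Result run_early(const Fragment *f, size_t n, float clear_depth)
{
    Result r = { clear_depth, 0 };
    for (size_t i = 0; i < n; ++i) {
        if (!(f[i].depth_in < r.final_depth))
            continue;
        r.shader_runs++;
        r.final_depth = f[i].depth_in;
    }
    return r;
}
```

For fragments whose shader leaves depth alone, both functions end with the same stored depth and the early variant can only run the shader as often or less often. As soon as one fragment has depth_out different from depth_in, the two orders can disagree, which is why the implementation cannot reorder on its own in that case.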

 

Now, by using a qualifier that tells the implementation so-and-so, you give a promise (for example "depth will not change" or "depth will always be greater"), or in the second example that you've given, you ask for a specific behaviour.

 

By doing so, you give a promise to the implementation that you know what you're doing, and that you guarantee that whatever you do will not cause the results to be wrong if it performs the early z optimization. You can of course break your promise or do something that will not work with the behaviour that you request, but this is very unwise -- that'd be welcome to the land of undefined behaviour.

 

Taking your word on that promise, the implementation will of course do the optimization (that's pretty much guaranteed). In a way, you could maybe interpret this as "force on", but it really isn't. It's more "enabling" or "allowing" the implementation to do something outside the specification.
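As a rough illustration of what such a promise buys, here is a toy single-pixel model in C of the "depth will always be greater" case. Everything in it is an assumption made for the sketch: the `Frag` and `Outcome` types, the `run_reference` and `run_depth_greater` functions, a GL_LESS depth test, and a shader that actually keeps its promise (depth_out is never smaller than depth_in). It does not describe any particular driver.

```c
#include <stddef.h>

/* Toy fragment: interpolated depth and the depth the shader writes.
   The promise being modelled is depth_out >= depth_in. */
typedef struct { float depth_in; float depth_out; } Frag;

typedef struct { float final_depth; int shader_runs; } Outcome;

/* Reference behaviour: shade everything, test afterwards (GL_LESS). */
static Outcome run_reference(const Frag *f, size_t n, float clear_depth)
{
    Outcome o = { clear_depth, 0 };
    for (size_t i = 0; i < n; ++i) {
        o.shader_runs++;
        if (f[i].depth_out < o.final_depth)
            o.final_depth = f[i].depth_out;
    }
    return o;
}

/* With the promise: if the interpolated depth already fails GL_LESS,
   a larger shader depth must fail too, so the shader can be skipped.
   Fragments that survive the early check still get the real test on
   the depth the shader wrote. */
static Outcome run_depth_greater(const Frag *f, size_t n, float clear_depth)
{
    Outcome o = { clear_depth, 0 };
    for (size_t i = 0; i < n; ++i) {
        if (!(f[i].depth_in < o.final_depth))
            continue;                      /* certain to fail: cull early */
        o.shader_runs++;
        if (f[i].depth_out < o.final_depth)
            o.final_depth = f[i].depth_out;
    }
    return o;
}
```

As long as the promise holds, both functions store the same depth and the second one never shades more fragments than the first. If a fragment breaks the promise by writing a smaller depth than it came in with, the early cull can throw away a fragment that should have been visible, which is the undefined-behaviour territory described above.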

Edited by samoth

This feature exists to ensure proper behavior when using Image Load Store or other incoherent memory writing.

To me, the part where it says, 'when using' seems fairly explicit. 

"...this ensures that image load/store operations will only happen on fragments that pass the depth test."  

I would say from what is written in that document that GL_ARB_conservative_depth is a filter for  GL_ARB_shader_image_load_store.

 

It states that GL_ARB_conservative_depth is meant to prevent operations on  GL_ARB_shader_image_load_store from being executed when the fragment fails the depth test.

 

I don't see anything that says they can be interchangeable replacements or substitutes for one another.


ic0de, I think the summary is that OpenGL may or may not allow this, but the vendors of high-end graphics cards (AMD and NVIDIA, and Intel should as well) rely on heavy optimization to make powerful cards. They will take care of it whether OpenGL does or not.
