
Z-buffer - why not pass the z value separately


Hi, I have had this question for quite some time, and maybe someone can answer it. As I understand it, the z-buffer precision problems occur because the value that gets written into the z-buffer is not z but 1/z. This happens because, after projection, the z coordinate is collapsed to a constant and the original z gets encoded in w. The projection stage is therefore followed by a perspective-divide stage, where the projected vertex position is divided by its fourth coordinate, w, giving the correct x and y values and turning the z coordinate into 1/z.
world          projected          after perspective divide
[x y z 1] ---> [x y 1 z] -------> [x/z  y/z 1/z 1]
So if 1/z is problematic (is it not?), then why can't the un-projected z be retained as is and forwarded directly to the z-buffer for depth comparisons? Thanks

What you're describing is called w-buffering, and it used to be supported by some graphics cards and exposed through DirectX. I think DirectX might actually still expose it, but most hardware doesn't support it anymore. The reason z-buffering is more common is that it's better for most scenes than w-buffering: you generally want more precision for objects closer to the camera, which is what z-buffering with a value proportional to 1/z gives you.
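A small numeric sketch of that precision distribution (not from the original reply; the 0.1/1000 near/far planes and the 24-bit buffer size are arbitrary choices for illustration): a depth value affine in 1/z is quantized, then converted back to see how large a world-space step one depth code covers near and far.

// Sketch (not from the thread): precision of a depth buffer storing a value affine in 1/z.
#include <cstdio>
#include <cmath>

int main() {
    const double n = 0.1, f = 1000.0;        // near and far planes (assumed)
    const double scale = (1 << 24) - 1;      // 24-bit integer depth buffer (assumed)

    // depth(z) is affine in 1/z, with depth(n) = 0 and depth(f) = 1.
    auto depth = [&](double z) { return (1.0 / n - 1.0 / z) / (1.0 / n - 1.0 / f); };
    // Inverse mapping: a stored depth value back to eye-space z.
    auto eyeZ  = [&](double d) { return 1.0 / (1.0 / n - d * (1.0 / n - 1.0 / f)); };

    const double samples[] = { 1.0, 10.0, 100.0, 900.0 };
    for (double z : samples) {
        double code = std::floor(depth(z) * scale);               // quantized depth value
        double step = eyeZ((code + 1.0) / scale) - eyeZ(code / scale);
        std::printf("z = %6.1f -> smallest resolvable step ~ %.6g\n", z, step);
    }
    return 0;
}

The step is sub-micrometre at z = 1 but roughly half a metre at z = 900, which is exactly the near-biased distribution described above.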

And in addition, if I recall correctly, 1/z is actually linear across a triangle scanline, which makes the calculations a lot cheaper because you can use linear interpolation.

Are you sure current hardware doesn't support true z-buffering (what you called w-buffering) anymore?

Think of it: a large ad screen on a large wall, which can be viewed from far away. If you fall under the precision threshold, the ad screen will be masked by portions of the wall: rendering artefacts. And the whole scene is packed into a huge polygon soup, so the coder can't easily hack around this rendering issue (stencil?).

Of course, 32 bits of precision are now standard and push this kind of z-buffer precision issue into "marginal" cases. Still, take these realistic ranges:

Example (a 32-bit integer "z" buffer, storing k/z):

metric z                        normalized k/z written for rendering
znear  = 2^-6 (2 cm)      <=>   k/znear (-1) = 2^32 (-1)
zfar/2 = 2^13 (2 km)      <=>   2k/zfar (-1) = 2^13 (-1)
zfar   = 2^14 (4 km)      <=>   k/zfar  (-1) = 2^12 (-1)


Thus between zfar/2 and zfar (2 km = 2^11 meters), k/z spans a range of 2^12 (2^13 - 2^12) consecutive integers, so the z-buffer has a "local" precision of 12 bits in these far ranges. That gives a metric precision of 2^11 / 2^12 = 0.5 meters! Not very accurate. This problem is solved if the z-buffer uses floating point.
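A quick reproduction of these figures (a sketch, not from the original post; it assumes one z unit is about 0.25 m, the scale implied by the 2 km / 4 km labels above):

// Sketch (not from the thread): 32-bit integer buffer storing k/z, k chosen so k/znear = 2^32.
#include <cstdio>
#include <cmath>

int main() {
    const double unit  = 0.25;                             // metres per z unit (assumed)
    const double znear = std::pow(2.0, -6.0);
    const double zfar  = std::pow(2.0, 14.0);
    const double k     = znear * std::pow(2.0, 32.0);      // so that k/znear = 2^32

    const double codes  = k / (zfar / 2.0) - k / zfar;     // 2^13 - 2^12 = 2^12 distinct values
    const double metres = (zfar - zfar / 2.0) * unit;      // ~2 km of depth range

    std::printf("codes between zfar/2 and zfar: %.0f (~12 bits)\n", codes);
    std::printf("worst-case depth step out there: %.2f m\n", metres / codes);  // ~0.5 m
    return 0;
}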

As far as I can remember, for large city scenes w-buffering (absolute metric precision) was preferable, since the constraints were not about CLOD (near details) but about metric ranges in world space that cannot tolerate a degradation of the z-buffer far away.

To sum up, three solutions:
- w-buffering (<=> world space z)
- floating point z-buffering (1/z).
- render the scene in multiple passes with modified view frustum parameters (near, far) each time.

Quote:
Original post by Charles B
Are you sure current hardware doesn't support true z-buffering (what you called w-buffering) anymore?

My GeForce 6800 doesn't support w-buffering according to the DirectX Caps Viewer (D3DPRASTERCAPS_WBUFFER: No). I don't have any current ATI hardware to hand to check, but I don't think any recent Radeons support it either.
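For reference, a minimal sketch of how the same cap bit could be queried programmatically, assuming the Direct3D 9 SDK headers are available and the program links against d3d9.lib:

// Sketch (not from the thread): query the w-buffer raster cap through Direct3D 9.
#include <d3d9.h>
#include <cstdio>

int main() {
    IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);
    if (!d3d) return 1;

    D3DCAPS9 caps;
    if (SUCCEEDED(d3d->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps))) {
        // Same flag the Caps Viewer reports.
        std::printf("W-buffer supported: %s\n",
                    (caps.RasterCaps & D3DPRASTERCAPS_WBUFFER) ? "yes" : "no");
    }
    d3d->Release();
    return 0;
}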

I think I have found the answer to my question!

Using unprojected z would be plain wrong. Reasons:

We only have the unprojected z value available at the vertices of the polygon. When we scan-convert a polygon, we NEVER have the true 'z' value for each and every pixel; we only have 1/z (or some other related value) for each pixel.

Hence, during depth comparisons, while we could use unprojected z coords at the vertices, the only correct value we can use for each pixel is in fact 1/z.

So unless the hardware does an invert and then compares, I think using plain unprojected z would be mathematically wrong - or in fact, it is not even readily available for each pixel without performing additional mathematical operations on the interpolated values.

Perhaps that explains why we have to have 1/z values in the z-buffers.

Can someone confirm my understanding?


Also, why would the cards not use 32-bit floating-point (IEEE) z-buffers? From the DX SDK there is only one such format available, and my Radeon 9500 Pro does not support it (according to the Caps Viewer).

Quote:

Original post by AQ
Using unprojected z would be plain wrong. Reasons:
...

Can someone confirm my understanding?

No; in fact, when perspective correction is on, both 1/z and z are available in the rendering pipeline per fragment (scanline rendering, usually).

Take a look at the Quake1 sources to get an idea of perspective-correct scanline rendering. Note how the latency of the division was masked by 8- or 16-pixel fillers. Yes, it was not per pixel (it was done per 8/16 pixels and then lerped), but that was software rendering, 10 years ago.
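A rough sketch of that span trick (not the actual Quake code; the span values are made up): the true perspective divide happens only at 16-pixel block boundaries, with a cheap lerp of the texture coordinate inside each block, which hides the latency of the divide.

// Sketch (not the Quake code): perspective-correct span with one divide per 16-pixel block.
#include <cstdio>

int main() {
    const int width = 64, block = 16;
    // s = S/Z and w = 1/Z at the ends of the span; both are linear in screen x.
    double s = 0.0,       s_end = 8.0 / 10.0;    // S runs 0..8 while Z runs 2..10 (assumed)
    double w = 1.0 / 2.0, w_end = 1.0 / 10.0;
    double ds = (s_end - s) / width, dw = (w_end - w) / width;

    double Sleft = s / w;                        // exact S at the left edge
    for (int x = 0; x < width; x += block) {
        double sR = s + block * ds, wR = w + block * dw;
        double Sright = sR / wR;                 // the only true divide in this block
        for (int i = 0; i < block; ++i) {
            double S = Sleft + (Sright - Sleft) * i / block;   // cheap per-pixel lerp
            if (i == 0) std::printf("pixels %2d..%2d start at S ~ %f\n",
                                    x, x + block - 1, S);
            // ... fetch the texel at S here ...
        }
        s = sR; w = wR; Sleft = Sright;
    }
    return 0;
}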

To explain it briefly, I'll use unusual but simple notation:

For any coordinate, uppercase X denotes world space and lowercase x denotes screen space (after projection): x = X/Z, y = Y/Z, etc.

Due to well-known (and easy to prove) world-space to projected-space dualities (checked numerically in the sketch after this list):
- z (= 1/Z) is linear in the screen-space coordinates (x and y).
- the divided texture coords s (= S/Z) and t (= T/Z) are also linear in x and y.
- the "perspective-correct" texture coordinates S (= s * 1/z), T (= t * 1/z) are proportional to the world coords X, Y, Z. Thus perspective correction requires re-inverting: Z = 1/z.

Quote:

... floating points ...

In my opinion, if w-buffering has been dropped it's only to hardwire (1/Z)-buffering more deeply and spare some precious transistors. Reducing generality is often a source of optimization. I suppose it's linked to the latest multi-texturing and trilinear-filtering algorithms, which must be perfectly pipelined and scheduled. The fewer cases to handle, the more chances to write ultra-specialized, perfectly parallelized and optimized texturing algorithms on the chips.

Also, I suspect that if floating-point z-buffers are not often supported, it's because integers (or fixed point) are extremely efficient for linear interpolation. In screen space, from one pixel to the next, most parameters are simply updated by adding constant increments. And when some parameters are constant, they usually open up further enormous lower-level savings. Incrementing many integers at once is probably the source of the enormous pixel rates GPUs have these days. Floating-point additions cost far more in elementary (bitwise) operations.
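A small sketch of that incremental scheme (made-up values, not from the original post): a 16.16 fixed-point depth stepped across a span costs one integer add per pixel.

// Sketch (not from the thread): fixed-point depth walked across a span with constant increments.
#include <cstdio>
#include <cstdint>

int main() {
    const int width = 8;
    int32_t d  = 100 << 16;                    // depth at the left end, 16.16 fixed point
    int32_t dd = ((200 << 16) - d) / width;    // constant per-pixel increment

    for (int x = 0; x < width; ++x) {
        std::printf("x=%d depth=%.3f\n", x, d / 65536.0);
        d += dd;                               // the entire per-pixel cost: one integer add
    }
    return 0;
}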

The reason w-buffering isn't supported on modern cards is that if z is linear in screen space, you can make lots of optimisations. It enables efficient compression of z, which is very important for efficient multisample anti-aliasing, for example. It also saves per-sample transistors, since you save a divide that would otherwise be required. And you want to be able to output lots of z/stencil samples to get good framerates with AA and/or stencil shadows and/or high-res shadow maps.

There's a more extensive discussion of this at opengl.org under "Suggestions for OpenGL 2.?"; if you search for "z-buffer formula" you'll probably find it.

Here is another question about this problem.


Specifically, I understand your explanation that z cannot be used, since what we require is the distance of a vertex to the eye, so we should compute the full distance instead of using the original z value. That much is understood. However, what I do not understand is how this 1/z can be used in place of the proper distance (computed with the distance formula).

On a side note, I think that after projection we are NOT comparing the actual distances. Because if we were, consider this scenario: you are in eye space and there is a point (vertex) on the z axis in front of you, 5 z units away. If you rotate this point to the right, it is still 5 units away from the eye, but its z coordinate is now less than 5. Now two questions: what value does the hardware use for the z-comparison? Does it use the absolute distance (which would be the same in both cases), or only the z value (in which case the point after rotation is closer)?

I think what the hardware does is compute the z-distance from the projection plane and NOT the distance to the eye!





\ | /
\ | * / B
C \ |** / CA
\__|__/
\ | /
\|/
.



I have tried to highlight this in the picture. The two stars are on the same projection line and hence will both project to the same point on the projection plane (== near plane). But there are two z (distance) values we could use for them: their actual distances (computed using the distance formula), or just their z-distances (which appear as perpendicular lines from the near plane to the points).

If we choose the actual distances, then I cannot understand how the 1/z value can be used in their place. If we use the perpendicular distance, then why can we not just use the original z value?

Furthermore, vertex C is at the same z depth as far as the eye is concerned, yet its correct distance from the eye is less than that of vertex A. Hence I think we must not use the CORRECT geometric distance, and instead realize that the eye perceives distances as they are measured from the near plane (or the plane of the eye, for that matter).
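A tiny numeric illustration of this point (a sketch with an arbitrary 30-degree rotation, not from the original post): rotating a vertex about the eye keeps its Euclidean distance constant but changes its z, the distance measured perpendicular to the eye/near plane.

// Sketch (not from the thread): Euclidean distance vs. planar z after a rotation about the eye.
#include <cstdio>
#include <cmath>

int main() {
    const double pi = 3.14159265358979323846;
    double X = 0.0, Z = 5.0;                   // straight ahead, 5 units away
    double a = 30.0 * pi / 180.0;              // rotate 30 degrees to the side

    double Xr =  X * std::cos(a) + Z * std::sin(a);
    double Zr = -X * std::sin(a) + Z * std::cos(a);

    std::printf("before: distance %.3f, z %.3f\n", std::sqrt(X * X + Z * Z), Z);
    std::printf("after : distance %.3f, z %.3f\n", std::sqrt(Xr * Xr + Zr * Zr), Zr);
    return 0;
}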

This is for Charles B

you mentioned

z (=1/Z)

but I do not get it ..

I understand x = X/Z and y = Y/Z

but z = D .. always ....

But since this value is literally useless for depth comparisons, what we can do is assign it 1/z instead. I believe it has no other mathematical relation; we can either assign the original z or assign 1/z (which gets computed automatically during projection) and carry it forward in the pipeline for depth comparisons.

And therefore my question was: why do we have to carry 1/z when we could perfectly well use z for the comparison? My understanding so far is that when scan-converting the triangle, we need a z for each fragment, and we can compute 1/z for each pixel through a simple linear interpolation, whereas computing z for each pixel would require interpolation in 3D space!

Any comments please!

Well, calculating 1/z in screen space is, as you say, cheaper. There's your answer. The point you might be missing is that you can use 1/z for depth comparisons with as much success as z, because if 1/z1 > 1/z2 then z1 < z2, given the usual restrictions on the domain of z values.

Hope that clears everything.
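A minimal check of that comparison flip (a sketch, not from the original reply): for depths in front of the camera, comparing 1/z with the inequality reversed gives the same answer as comparing z directly.

// Sketch (not from the thread): reversed comparison on 1/z matches the comparison on z.
#include <cstdio>

int main() {
    double z1 = 3.0, z2 = 7.0;                 // both positive; z1 is the closer point
    std::printf("z1 < z2      : %d\n", (int)(z1 < z2));
    std::printf("1/z1 > 1/z2  : %d\n", (int)(1.0 / z1 > 1.0 / z2));   // same result
    return 0;
}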

Quote:
Original post by AQ
This is for Charles B

you mentioned

z (=1/Z)

but I do not get it ..

I understand x = X/Z and y = Y/Z

but z = D .. always ....


OK, it's maybe the fault of my notation. I should have called this variable w instead of z to be coherent with the other definitions (like x=X/Z).

If the vert in camera space is [X, Y, Z, 1], then W = 1 denotes a point.

Forget the object-to-camera transform to simplify things here; just focus on the intermediate computation from camera to perspective. Normally the matrices are all concatenated into a global object-to-projection matrix (transform).

Also, to simplify, forget about the mapping that turns the frustum into a unit cube through the projection and division. Normally it's concatenated into the projection matrix before the division, but it could also be done after the division, before rasterization. It's all a matter of translations and uniform scaling anyway, so it does not change the linear dependencies we talked about concerning x = X*D/W, y = Y*D/W and D/W.


Then, when you multiply by the projection matrix, you principally compute the W component, that is, the distance from the near plane. D/W gives the projection ratio. I'll assume D = 1 to simplify the notation.

{X, Y, Z, W (usually parallel to Z)}: W is the distance from your vertex to the eye, measured along the direction perpendicular to the near plane (it's not always parallel to Z). Note that since I set the frustum issue aside to simplify, X, Y, Z are unchanged.

A division of all four components would give the point
{ x = X/W, y = Y/W, z = Z/W, w = W/W = 1 }
This is the material point where the line from the original vertex to the eye intersects the near plane. w = D would be of no use to the rasterization process.


But instead, with this division by W, you get:
{ x = X/W, y = Y/W, z = Z/W, w = 1/W }

Possibly, in terms of low-level code, W is splatted to {W, W, W, W} in a temporary register and replaced by 1 in the source vertex, then a term-by-term division of the four components is done.

Both the original and final verts represent the same material point in 3D once the normalization operator is applied (here, divide the last point by w, that is, remultiply by W, letting w become 1 again). So this point contains all the relevant info you need at the rasterization stage. For instance, it would let a shader recover world-space positions purely from these "projected" points.

You can use w as an inverse "depth" for the "z"-buffer, since the comparisons wa < wb and Za > Zb are equivalent. Also, w is linear in x and y, so you can use w += dw_dx; between consecutive fragments on a span, hence speedy depth tests.

And you can do perspective-correct texturing by using s = S/W = S*w and t = T/W = T*w, which interpolate exactly linearly on screen (s += ds_dx; and t += dt_dx; are also very practical). To get the texels (inverse mapping), you recover S and T by dividing s and t by w (<=> multiplying by W), which requires a reciprocal W = 1/w per fragment. Thus my claim that W (linear in Z, thus roughly equivalent) could anyway be used for world-space depth tests, since it's required for perspective-correct texturing.
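A compact sketch of that scheme (not Charles B's code; the span values are made up): s, t and w are stepped linearly across the span, and one reciprocal per fragment recovers the perspective-correct S and T.

// Sketch (not from the thread): perspective-correct span via screen-linear s, t, w.
#include <cstdio>

int main() {
    const int width = 8;
    double W0 = 2.0, S0 = 0.0, T0 = 0.0;       // left end of the span (assumed values)
    double W1 = 8.0, S1 = 1.0, T1 = 1.0;       // right end of the span

    // Screen-linear quantities and their constant per-pixel increments.
    double w = 1.0 / W0, s = S0 / W0, t = T0 / W0;
    double dw_dx = (1.0 / W1 - w) / width;
    double ds_dx = (S1 / W1 - s) / width;
    double dt_dx = (T1 / W1 - t) / width;

    for (int x = 0; x <= width; ++x) {
        double Winv = 1.0 / w;                 // one reciprocal per fragment
        std::printf("x=%d  S=%.4f  T=%.4f  depth(w)=%.4f\n",
                    x, s * Winv, t * Winv, w);
        s += ds_dx; t += dt_dx; w += dw_dx;    // constant increments, as described above
    }
    return 0;
}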

Quote:

My understanding so far is that when scan-converting the triangle, we would need a z for each fragment, and we can compute 1/z for each pixel through a simple linear interpolation, whereas computing z for each pixel would require interpolation in 3D space!

Nearly. But it's not about scan conversion so much as inverse mapping (from screen to world, texture, color field, etc.). It's about linear interpolation with constant increments in screen space. You can never do any inverse mapping by incrementing in 3D space; the steps are not constant in world space. It's not a problem of dimensions, 2D or 3D; it's a problem of projection and division. Screen space is not a linear subspace of the original 3D space. Linear interpolation (lerp) of w (= 1/W) in screen space is exact, but lerp of W is not. Hence W, that is 1/w (a reciprocal), is in theory required per fragment for perspective-correct inverse mapping of values that are linear in world space (e.g. the classical S, T texture coordinates) but not linear in screen space. Reciprocals also don't cost much these days.

Still, the problem is that optimizations remove the computation of W per fragment. Instead W = 1/w is lerped, and thus approximated, between nearby fragments. I even think it's lerped in small blocks of 8x8 (or more?) pixels; the processors work on matrices of pixels for better parallelization, filtering, etc.

These days I am heavily optimizing a software HOM (Hierarchical Occlusion Map) module with SIMD (SSE). I can achieve tremendous fill and request rates that trash hardware-based HOM (plus other high-level optimizations that could not be done in hardware) by treating several columns and spans at once: perfect for prefetching, loop-overhead reduction, and massive parallelism. This reinforces the analysis made by Yann L some years ago. CPUs also get better and better, so the balance will not easily tip in favor of hardware where occlusion is concerned.

Optimizing through blocks and lerp: that's surely why I suppose W testing is not done. The lerped values of W would cause graphical artefacts, since the triangle planes would be distorted (imagine a frame on a wall). Thus only w (the 1/W value) is used for depth testing.
