Reconstructing pixel 3D position from depth

First off, sorry for my rambling posts and for taking some time to reply. I've been traveling over the weekend, and my access to the internet was severely limited.

Back to the topic...as for how the various coordinate spaces relate to traditional OpenGL matrix types, I had to go look it up myself since I'm not really familiar with GL. Section 9.011 of the OpenGL FAQ seems to do a good job of explaining it all. According to that, what I call the "worldView" matrix is the same as your "modelView" matrix. In Direct3D, three matrices are used for the full transformation instead of two (world, view, and projection respectively), whereas in GL the first two are combined into a single modelview matrix. The process goes something like this:

- Object coordinates are transformed by the World matrix to produce World coordinates.
- World coordinates are transformed by the View matrix to produce Eye (view-space) coordinates.
- Eye coordinates are transformed by the Projection matrix to produce Clip coordinates.
- Clip coordinate X, Y, and Z are divided by clip coordinate W to produce Normalized Device Coordinates.
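For illustration, this is roughly how that chain looks in host code. This is just a sketch using GLM with placeholder camera and projection values, not anything from the posts above:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// A vertex in object (model) space.
glm::vec4 objPos(1.0f, 2.0f, 3.0f, 1.0f);

// Example transforms (placeholder values).
glm::mat4 world(1.0f);                                          // model -> world
glm::mat4 view = glm::lookAt(glm::vec3(0.0f, 10.0f, 20.0f),     // world -> eye
                             glm::vec3(0.0f), glm::vec3(0.0f, 1.0f, 0.0f));
glm::mat4 proj = glm::perspective(glm::radians(60.0f), 4.0f / 3.0f, 1.0f, 500.0f);

glm::vec4 worldPos = world * objPos;                 // world coordinates
glm::vec4 eyePos   = view  * worldPos;               // eye (view-space) coordinates
glm::vec4 clipPos  = proj  * eyePos;                 // clip coordinates
glm::vec3 ndc      = glm::vec3(clipPos) / clipPos.w; // normalized device coordinates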


Now for the first solution to your problem, getting world-space coordinates from a buffer filled with z/w values, you need a matrix that does the reverse of those last two steps (the inverse viewProjection matrix). But as you've indicated, you don't have a viewProjection matrix to start off with. This means you'll have to create it and invert it in your application, and then pass the matrix as a shader constant. Doing this isn't too hard, since you already have the projection matrix. You just need a view matrix as well. In Direct3D I use a helper function for creating a view matrix, but if you don't have access to such a function in GL it's not a problem to create one. Just think about what a view matrix does: it takes coordinates that are in world space and transforms them so that they are now coordinates relative to your camera's position and orientation. This means that if your camera is located at <0,10,0>, you must translate your original coordinate by <0,-10,0>. If the camera is rotated 90 degrees about the y-axis, the coordinate must be rotated -90 degrees about the same axis. In other words, you must come up with a transformation matrix for your camera and then invert it. Then you can multiply this with your projection matrix, invert the product, and voila: an inverse viewProjection matrix.
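As a rough host-side sketch of that idea (using GLM here; in GL of that era you would do the equivalent with gluLookAt or your own matrix code, and the projection values below are placeholders):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Camera transform: located at <0,10,0>, rotated 90 degrees about the y-axis.
glm::mat4 camWorld = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 10.0f, 0.0f)) *
                     glm::rotate(glm::mat4(1.0f), glm::radians(90.0f), glm::vec3(0.0f, 1.0f, 0.0f));

// The view matrix is the inverse of the camera's own transform.
glm::mat4 view = glm::inverse(camWorld);

// Combine with the projection you already have, then invert the product.
glm::mat4 proj        = glm::perspective(glm::radians(60.0f), 4.0f / 3.0f, 1.0f, 500.0f);
glm::mat4 invViewProj = glm::inverse(proj * view);   // pass this to the shader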

[Edited by - MJP on December 2, 2007 6:10:29 PM]
Now as for the second solution to your problem...I fear I've misled you a bit by talking too much about my own specific implementation details. For example, the method I was describing produces eye-space coordinates rather than world-space coordinates, and certain portions would have to be modified for calculating world-space coordinates (specifically: the way I calculate the frustum corners only works for the eye-space coordinates of the corners, or world-space if your camera is not rotated about the z-axis). I also do some things the way I do because I use light volumes and not full-screen quads (with quads, viewDirection can be calculated in the app rather than in the shaders).

So I think I'll just start over and explain an algorithm that you can use for generating world-space coordinates, using full screen quads. Then perhaps I will post some example code, or discuss some specific optimizations.

Step 1: create a normalized eye-space depth buffer

In vertex shader:
- calculate the eye-space position of the vertex (transform by the worldView matrix for D3D, the modelView matrix for GL)
- pass the eye-space position to the fragment shader

In fragment shader:
- divide the z-component of the eye-space position by the z-value of the view frustum's far clip plane
- output the value to the depth buffer

Step 2: calculate world-space position from the depth buffer

In application:
- calculate the world-space positions of the 4 corners of the view frustum's far clip plane (see the sketch below)
- pass the points to the vertex shader as constants or as values in the vertices of the full-screen quad (the 4 corner points should be mapped to the 4 points of the quad)
- pass the world-space position of the camera to the fragment shader
- render the quad

In vertex shader:
- retrieve the position of the frustum corner for the current vertex
- pass it as "viewDirection" to the fragment shader (do not normalize!)

In fragment shader:
- do not normalize "viewDirection"!
- read the depth value from the depth buffer
- the world-space position of the fragment is "cameraPos + (viewDirection * depth)"
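Here is one way the application-side part of step 2 could look, using the inverse viewProjection matrix from earlier to unproject the NDC corners of the far plane. Note that in this sketch I take "viewDirection" to be each corner relative to the camera position, which is the reading that makes cameraPos + viewDirection * depth come out in world space away from the origin. A GLM sketch; the names are mine, not part of the original post:

#include <glm/glm.hpp>

// invViewProj: inverse of (projection * view), built as discussed earlier.
// Unproject the four corners of the far clip plane (z = +1 in GL NDC).
void farPlaneViewDirs(const glm::mat4& invViewProj, const glm::vec3& cameraPos,
                      glm::vec3 viewDirs[4])
{
    const glm::vec2 ndc[4] = { {-1,-1}, {-1,+1}, {+1,+1}, {+1,-1} };
    for (int i = 0; i < 4; ++i)
    {
        glm::vec4 p = invViewProj * glm::vec4(ndc[i], 1.0f, 1.0f);
        glm::vec3 corner = glm::vec3(p) / p.w;   // world-space far-plane corner
        viewDirs[i] = corner - cameraPos;        // un-normalized "viewDirection"
    }
}

Bind viewDirs[i] to the matching quad vertex, and pass the camera's world-space position to the fragment shader as well.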


[Edited by - MJP on April 9, 2008 2:33:30 PM]
Quote:Original post by spek
Thanks again for the notes on matrix multiplications. I know the coordinates are just "values", but the matrix usage often confuses me. I should print this explanation along with MJP's text as well. One of the problems is probably that I use the wrong matrices. In this case, I don't know how to get the "inverse view perspective" matrix. I use Cg for the shaders, and in combination with OpenGL it offers to pass the following matrices:
- ModelView Matrix
- ModelViewProjection Matrix
- Texture Matrix
- Projection Matrix
In combination with identity/inverse/transpose/inverse transpose.

But I guess I have to construct this inverse view perspective matrix myself, just like you could modify the texture matrix for projective texturing, true? Or is it listed above, but under another name? I'm not very familiar with the technical "jargon". I tried the inverse modelView and modelViewProjection, but that didn't give the right results, so probably that is something different...

That's my fault, you'll want to use the combination CG_GL_MODELVIEW_PROJECTION_MATRIX/CG_GL_MATRIX_INVERSE. I should have mentioned before that the view matrix in Direct3D corresponds to the modelview matrix in OpenGL. The major difference between the two is that the OpenGL modelview matrix is also responsible for transforming from model space to world space, whereas Direct3D has a separate world matrix for that. This means you'll need to make sure that the modelview transformation stack contains only the view transformation (i.e. something generated by gluLookAt), or else the results you get from the shaders will be in model space. It shouldn't be a problem if you're properly using the stack, but I'm just putting it out there.
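If it helps, the host-side calls might look something like this. This is only a sketch; "program", the camera values, and the parameter name "invViewProj" are placeholders for whatever your setup uses:

#include <GL/glu.h>
#include <Cg/cg.h>
#include <Cg/cgGL.h>

// Placeholder camera values for illustration.
double camX = 0.0, camY = 10.0, camZ = 0.0;
double lookX = 0.0, lookY = 10.0, lookZ = -1.0;
CGprogram program = /* your compiled Cg program */;

// Make sure the modelview stack holds only the camera (view) transform...
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(camX, camY, camZ,      // eye
          lookX, lookY, lookZ,   // target
          0.0, 1.0, 0.0);        // up

// ...then hand the inverse modelview-projection matrix to the shader via Cg.
CGparameter invViewProj = cgGetNamedParameter(program, "invViewProj");
cgGLSetStateMatrixParameter(invViewProj,
                            CG_GL_MODELVIEW_PROJECTION_MATRIX,
                            CG_GL_MATRIX_INVERSE);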
No need to apologize, you guys always help me here!

I'm getting close... I followed MJP's way of doing it. Now the results (3D positions) seem to be almost correct. I'm still using a 16-bit texture, so there is probably some inaccuracy in the depth.

But the real problem is still with the matrices, I think. When I start moving the camera (I start at ~0,0,0), the results quickly turn wrong. As you might have noticed, I'm bad with matrices. But I suppose this is because the camera position is used in the OpenGL ModelViewProjection matrix (is that the difference with the D3D view matrix)? So far I haven't done anything with the stack like you guys warned me about, so probably it's going wrong there.

// Application
// Calculate farplane coordinates; I pass them later on as the normals of that quad.
// Before rendering the depth and quad, pass the camera matrix.

// Depth Vertex Shader
    // in.pos is an absolute world coordinate
    out.pos = mul( modelViewProjectionMatrix, in.pos );

// Depth Fragment Shader
    out.color.r = out.pos.z / 500; // test: 500 is the max (test) view distance

// Test Quad Vertex Shader
    // Just pass. Quad XY coords are (-1,-1) (-1,+1) (+1,+1) and (+1,-1)
    // Already in eye-space... right?
    out.pos.xy = iPos.xy;

    // Farplane coordinate (mapped on quad) is placed in the normal iNrm
    // MVPI = Inverse ModelViewProjection Matrix, Cg OpenGL:
    //    CG_GL_MODELVIEW_PROJECTION_MATRIX, CG_GL_MATRIX_INVERSE
    float4 worldPos = mul( MVPI, in.pos );

    // The farplane world coordinates are stored inside the 4 normals
    out.viewDir.xy = (worldPos.xy / worldPos.w) * normal.xy;
    out.viewDir.z  = normal.z;

// Test Quad Fragment Shader
    float  pDepth = f1tex2D( sceneDepth, iTex.xy ).r;
    float3 pPos3D = cameraPos.xyz + in.viewDir.xyz * pDepth;
    out.color = pPos3D; // test the result

I think the inverse modelview matrix is not in the right state when I pass it to the shaders.

Some other small questions:
- @MJP, the "farplane.z", is that the distance between the camera and the far plane (the maximum view distance)?
- @Zipster, when using your method, how do I calculate the depth? The same way as in MJP's method (out.pos.z / farplane.z), or differently (out.pos.z / out.pos.w ...)?

[another edit]
I get the "same" results with Zipster's implementation (but with the depth changed to z/w instead of z/farplane.z). I also use the inverse modelviewprojection matrix here ("InvProj"):
float4 perspective_position = float4( iPos.x, iPos.y, pDepth * iPos.w, iPos.w );
float4 world_position_4d    = mul( InvProj, perspective_position );
float3 world_position_3d    = world_position_4d.xyz / world_position_4d.w;

Yet again, when I start moving the camera away from 0,0,0, the results go wrong.


Thanks for the detailed answers! Maybe I understand those matrices someday :)
Rick

[Edited by - spek on December 3, 2007 1:10:06 PM]
Hey guys!

I'm currently working on a deferred shading engine. It's up and running in its basic form, and I also have to calculate the view-space coordinates of each fragment from depth in subsequent lighting passes.

I acknowledge that the approach I use for the secondary pass is different (I literally draw a bounding volume with the current projection matrix, as if it were still the geometry pass), whereas you guys are drawing a full-screen quad in orthogonal projection (?) to do further processing.

I use OpenGL, but I'm sure DX could use a similar methodology since we are all using the same hardware.

I set up my frame buffer object with the respective colour buffers (standard 32-bit RGBA) along with the standard depth and stencil buffer (24-bit + 8-bit) bound as a texture (as opposed to a straightforward non-texture render target). Later lighting passes read that generic 24-bit depth buffer as a texture to derive the screen-space location per fragment.
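For reference, attaching a 24+8 depth-stencil texture to an FBO might look roughly like this. A sketch using core GL names (a context of that era would use the EXT_framebuffer_object equivalents); width, height, and colorTex are placeholders:

GLuint depthTex = 0;
glGenTextures(1, &depthTex);
glBindTexture(GL_TEXTURE_2D, depthTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH24_STENCIL8, width, height, 0,
             GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

GLuint fbo = 0;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT, GL_TEXTURE_2D, depthTex, 0);
// Later passes bind depthTex as a regular texture to read per-fragment depth.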

Obviously, that depth buffer doesn't store its values linearly, in order to give higher precision closer to the viewer and less precision out into the distance. So it needs to be converted during any post passes. The equation for that is quite simple and can be found here. Under the heading "The Resolution of Z", 'a' and 'b' can be precalculated and handed to the fragment shader to simplify that calculation.
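As a sketch of that precalculation, here is my own derivation of the standard perspective depth mapping, assuming the default [0,1] depth range. My 'a' and 'b' may be factored differently from the FAQ's, but the idea is the same: two constants computed once on the CPU turn a stored depth value back into a linear eye-space distance.

// n, f: near and far clip plane distances (example values).
float n = 1.0f, f = 500.0f;
float a = (n - f) / (n * f);
float b = 1.0f / n;

// eyeDistance = 1.0f / (a * d + b) maps a depth-buffer value d in [0,1]
// back to the positive distance along the view axis: d = 0 gives n, d = 1 gives f.
float linearizeDepth(float d, float a, float b) { return 1.0f / (a * d + b); }

In the fragment shader the same one-line expression applies, with a and b passed in as uniforms.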

During the lighting pass I draw my bounding volume in perspective projection space as I mentioned above. Within the vertex shader I send a varying variable to the fragment shader containing the view-space location of the bounding volume's fragment (gl_ModelViewMatrix * gl_Vertex). In the fragment shader, I now have enough information to derive the view-space location of the actual fragment to be lit.

Since the bounding volume's fragment location lies on the same ray cast from the viewer (0,0,0) through the fragment to be lit, you can derive this location quite easily (two known points and a third point with one known value, Z).
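In other words, the interpolated position on the bounding volume gives you the ray direction, and the lit fragment's linear eye-space Z fixes where along that ray it sits. A sketch of that ratio (my own names; volumeFragView is the interpolated view-space position of the volume's fragment, fragEyeZ the linearized eye-space z of the stored fragment, both negative-Z-forward as in GL eye space):

#include <glm/glm.hpp>

// Reconstruct the view-space position of the fragment to be lit.
glm::vec3 reconstructViewPos(const glm::vec3& volumeFragView, float fragEyeZ)
{
    // Both points lie on the same ray from the eye at (0,0,0), so scaling
    // by the ratio of their z values moves us along that ray to the surface.
    return volumeFragView * (fragEyeZ / volumeFragView.z);
}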

I realize you guys aren't drawing a bounding volume like me, but is there any reason why you couldn't draw "something" (a quad) to cover the entire view and derive everything from that? Seems likely to me. You could still calculate the world location by using the inverse modelview matrix. Since the depth buffer uses a 24-bit non-linear range, precision should be more than sufficient.

Anyway, I could be way off track. Good luck with it all!
Quote:Original post by MJP

So I think I'll just start over and explain an algorithm that you can use for generating world-space coordinates, using full screen quads. Then perhaps I will post some example code, or discuss some specific optimizations.


MJP: Thank you very much! This explanation is all I needed; now it works perfectly; fast with perfect quality.. its great to eliminate the need for an entire render target and just use depth. Now i just use 2 render targets for my deferred renderer... Thanks again.

@Hibread
The full-screen quad doesn't have much to do with deferred shading in this case; I'm trying to implement Screen Space Ambient Occlusion. You calculate the depth differences over the entire screen, so that's why. Nevertheless, I'm also doing deferred shading, so there will probably come a point where I need to calculate the world positions for light volumes as well (so far I'm writing x, y and z into a texture, but that's a waste of space of course).

MJP's implementation seems to be the fastest, since only a few basic instructions are needed in the fragment shader. Zipster showed another basic way to do it, which I'll now try in order to test the results.

Anyway, thank you too for the tips!
Rick
Quote:Original post by Matt Aufderheide
Quote:Original post by MJP

So I think I'll just start over and explain an algorithm that you can use for generating world-space coordinates, using full screen quads. Then perhaps I will post some example code, or discuss some specific optimizations.


MJP: Thank you very much! This explanation is all I needed; now it works perfectly; fast with perfect quality.. its great to eliminate the need for an entire render target and just use depth. Now i just use 2 render targets for my deferred renderer... Thanks again.


You're very welcome! It's something I've also found to be very useful for deferred renderers, and having spent a good deal of time myself figuring it out I'm always willing to attempt to make it easier for anyone else.
Spek, I think I see your problem. When rendering your depth buffer for the "MJP method" (modest, aren't I?), you need to calculate the position in eye space before dividing by farplane.z (yes, this is the distance from the camera to your far frustum plane). What you seem to be doing is calculating its clip-space position and then using that. At least, this is based on my assumption that in.pos is in object space rather than world space. If it were already in world space, multiplying by the modelViewProjection matrix would add an extra transform and everything would come out wrong.

Since what you want is eye space, you should be doing this:

// Depth Vertex Shader
    // in.pos is in object space
    out.pos    = mul( modelViewProjectionMatrix, in.pos );  // out.pos is in clip space
    out.eyePos = mul( modelViewMatrix, in.pos );            // out.eyePos is in eye space

// Depth Fragment Shader
    out.color.r = out.eyePos.z / 500; // test: 500 is the max (test) view distance


Now for the Zipster method, what you want is the z component of the position in normalized device coordinates. If you remember the GL FAQ, you get normalized device coordinates by taking the point in clip space (i.e. multiplied by modelViewProjection) and dividing by the w component. So your shaders would look like this:

// Depth Vertex Shader
    // in.pos is in object space
    out.pos = mul( modelViewProjectionMatrix, in.pos );  // out.pos is in clip space

// Depth Fragment Shader
    out.color.r = out.pos.z / out.pos.w;
Quote:Original post by spek

out.color.r = out.pos.z / 500; // test 500 is max(test) view distance


// Test Quad Vertex Shader
// Just pass. Quad XY Coords are (-1,-1) (-1,+1) (+1,+1) and (+1,-1)
// Already in eye-space... right?
out.pos.xy = iPos.xy;

// Farplane coordinate (mapped on quad) is placed in the normal iNrm
// MVPI = Inverse ModelViewProjection Matrix, Cg OpenGL:
// CG_GL_MODELVIEW_PROJECTION_MATRIX, CG_GL_MATRIX_INVERSE
float4 worldPos = mul( MVPI, in.pos );

// The farplane world coordinates are stored inside the 4 normals
out.viewDir.xy = (worldPos.xy / worldPos.w) * normal.xy;
out.viewDir.z = normal.z;

// Test Quad Fragment Shader
float pDepth = f1tex2D( sceneDepth, iTex.xy ).r;
float3 pPos3D = cameraPos.xyz + in.viewDir.xyz * pDepth;
out.color= pPos3D; // test the result



Okay, now for the second part: actually using your depth buffer to derive the world-space position. You seem to be combining parts of my method with parts of Zipster's method, with a dash of the things I was confusing you with previously. Are you storing all four corners of the frustum separately, with one corner for each quad vertex, or are you just storing the position of the upper-right corner? If you're storing all four corners, then things should be very simple for you:

// Test Quad Vertex Shader
    // Just pass. Quad XY coords are (-1,-1) (-1,+1) (+1,+1) and (+1,-1)
    // (these are actually in normalized device coordinates)
    out.pos.xy = iPos.xy;

    // The farplane world coordinates are stored inside the 4 normals
    out.viewDir = normal;

// Test Quad Fragment Shader
    float  pDepth = f1tex2D( sceneDepth, iTex.xy ).r;
    float3 pPos3D = cameraPos.xyz + in.viewDir.xyz * pDepth;
    out.color = pPos3D; // test the result


Now for Zipster's method, you're probably just not setting up your matrices right. You would need to do this:

Application:
- Create a view matrix using gluLookAt
- Set this view matrix as your modelView matrix in the stack
- Use the same projection matrix you've been using all along

// Vertex Shader
    // Just pass. Quad XY coords are (-1,-1) (-1,+1) (+1,+1) and (+1,-1)
    out.pos.xy = iPos.xy;

// Fragment Shader
    float  pDepth = f1tex2D( sceneDepth, iTex.xy ).r;
    float4 perspective_position = float4( in.pos.x, in.pos.y, pDepth * in.pos.w, in.pos.w );

    // MVPI = Inverse ModelViewProjection Matrix, Cg OpenGL:
    //    CG_GL_MODELVIEW_PROJECTION_MATRIX, CG_GL_MATRIX_INVERSE
    float4 world_position_4d = mul( perspective_position, MVPI );
    float3 world_position_3d = world_position_4d.xyz / world_position_4d.w;



