Reconstructing pixel 3D position from depth

Hi, I saw it's not the first time this has been asked here, but with the information from those posts I didn't succeed either. I'm playing with SSAO (with the help of this website: http://rgba.scenesp.org/iq/computer/articles/ssao/ssao.htm ). One of the things I need to do is reconstruct the 3D world position for each pixel. Well, first of all, why don't they just store the 3D position in a map (they are only using 1 channel from those RGBA textures in this case), instead of reconstructing it with the help of the depth? Or is it not exactly the same? I might be confused with terms like "clip space", "world space", "eye space", and so on.

Anyway, this is what I did. First I render the whole scene to a texture. I store the depth by doing:

pos = mul( in.pos, modelViewProjectionMatrix );
...
out.color.r = pos.z / pos.w;

In a second pass, I draw a quad that fills the screen. Each pixel should reconstruct the 3D position, but here it goes wrong I think. I tried 2 ways:

// 1.
pos.xy = in.vertexPos.xy; // in vertex shader
...
float  depth  = tex2D( sceneDepthMap, texcoords.xy ).r;
float4 pPos3D = mul( float4(pos.x, pos.y, depth, 1.0), invProj );
pPos3D.xyz    = pPos3D.xyz / pPos3D.www;
pPos3D.w      = 1.0f;
Maybe I'm using the wrong matrix for "invProj". If I understand it well, it's the inverse projection matrix (I'm using OpenGL). I tried other matrices as well though (the inverse modelview projection matrix, for example). The other way:

// 2.
pos.xy     = in.vertexPos.xy; // in vertex shader
viewVector = pos.xyz - cameraPos.xyz;
...
viewVector    = normalize( viewVector );
float  depth  = tex2D( sceneDepthMap, texcoords.xy ).r;
float3 pPos3D = cameraPos.xyz + viewVector.xyz * depth;
I suppose 'viewVector' is not correct here... Both ways give wrong results. If I compare it with the real 3D position output as a color, it's just totally different. My lacking knowledge about matrices and "spaces" is probably causing the problem... Anyone have an idea what goes wrong?

Greetings,
Rick
Well, for your first bit about storing depth... the reason why it's not always desirable to store position is that it requires more bandwidth and memory to store and retrieve the data. Storing position requires at least a 64-bpp floating-point surface, which can also force you to use 64-bpp surfaces for all of your g-buffer textures on hardware that requires a uniform bpp across multiple render targets. Precision problems can also be encountered with 16 bits per component, especially when storing world-space position.

What I do is store linear eye-space depth instead of z/w. To get this I multiply the vertex position by the worldView matrix, and then store the .z component in the pixel shader (divided by the z-value of the camera frustum's far clip plane). Then in my second pass, which is a full-screen pass, I use a "camera direction vector": a vector pointing from the camera to the far corner of the camera frustum that corresponds to each vertex of the quad. You can either pass this as part of the vertex data, or calculate it for each vertex in the vertex shader (I calculate it). Then all you need to do is pass this to your pixel shader, and your world-space position for the pixel is this:

worldPos = cameraPos + pixelDepth * screenDir;


This is pretty nice, especially since it's only one MADD instruction. What I actually do is perform all my calculations in view space, which makes determining the frustum corners much easier. Regardless, I think it's a more elegant solution than using z/w.
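Roughly, the two halves look like this (a sketch, not my exact code; worldViewMatrix, farClipZ, depthMap, cameraPos and the interpolated frustumCornerDir are placeholder names):

// Depth pass (sketch): store linear eye-space depth, scaled by the far plane distance.
float4 viewPos = mul(worldViewMatrix, float4(in.objPos, 1.0));
out.depth = viewPos.z / farClipZ;   // 0 at the camera, 1 at the far plane
                                    // (in OpenGL eye space z is negative in front
                                    //  of the camera, so you may need -viewPos.z)

// Full-screen pass (sketch): frustumCornerDir is interpolated from the four vectors
// that point from the camera to the far-plane corners. Do NOT normalize it.
float  pixelDepth = tex2D(depthMap, in.texcoords).r;
float3 worldPos   = cameraPos + pixelDepth * in.frustumCornerDir;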

EDIT: Your code for calculating the "viewVector" in the second bit of code seems to be wrong. Is in.vertexPos.xy the position in object space or in world space? Also, your depth would have to be un-normalized eye-space depth for that to work (not z/w), and I'm not sure if you were doing that.
>> the reason why its not always desirable to store position is because it can require more bandwidth and memory space to store and retrieve the data
Sounds logical. But as far as I know, you can only use a 16/32F format with 4 channels anyway (in OpenGL, not sure about Direct3D). Storing only the depth could be very useful for deferred shading, since there is a lot of other data to store as well. But in the examples I saw, the RGBA texture was only used for storing the depth. So I thought maybe I was doing something wrong, maybe the pixel position is not converted to "world space" but to another space... or something.


As for the depth z/w part, let's see if I'm doing it exactly right:
// vertex shader
out.pos = mul( modelViewProjection, in.vertexPos );
// pixel shader
out.depth = out.pos.z / out.pos.w;

If I look at the result of this (stored as a texture), it seems to be OK, although I can't tell if the depth is 100% correct, of course.


As for the second part, at least my "3dPos = cameraPos + viewVec * depth" was correct :).
>> Is in.vertexPos.xy the position in object space or in world space?
Normally I would multiply in.vertexPos with the modelViewProjection matrix, indeed. But in this case, it isn't necessary. I render a quad with corner coordinates (-1,-1 .. +1,+1), since it needs to stay in front of the camera (a 'HUD quad'). The vertex shader just passes these coordinates through. I don't know in which space they are then, "projection space"? But... maybe this won't work for calculating the viewVector.

And how do I calculate the far-plane corners? Sorry for these dumb questions, but all that matrix and space stuff is really difficult for me. Talking about it, what exactly are clip-space, screen-space and view-space? I understand world and object space, but these others are confusing. Do you get them when a point is transformed into the view frustum or something (where (0,0,0) would be the camera position)?

Thanks for helping MJP,
Rick
I am struggling with this too... it seems I am close but not there yet. I wish someone could post a complete shader instead of these little fragments, as well as some pictures that show the interpolants rendered out, so we can test whether each stage is working...

One of my main problems is that my method of drawing a fullscreen quad uses pretransformed vertices, so I can't use a vertex shader for that pass... otherwise I have trouble mapping the pixels to texels perfectly (the screen-sized textures get a bit filtered otherwise).

Does anyone have a good method of drawing a screen-aligned quad that maps the texture perfectly to the screen pixels and uses a vertex shader?
Quote:Original post by spek
>> the reason why its not always desirable to store position is because it can require more bandwidth and memory space to store and retrieve the data
Sounds logical. But as far as I know, you can only use a 16/32F format with 4 channels anyway (in OpenGL, not sure about Direct3D). Storing only the depth could be very useful for deferred shading, since there is a lot of other data to store as well. But in the examples I saw, the RGBA texture was only used for storing the depth. So I thought maybe I was doing something wrong, maybe the pixel position is not converted to "world space" but to another space... or something.


I have no idea which formats are supported in GL; in D3D I use a very convenient single-channel 32-bit floating-point format for storing my depth. As for storing depth as RGBA8, I remember that Fabio Policarpo's deferred shading tutorial does this and uses a pair of functions for encoding and decoding a single floating-point value in RGBA8 format. I'd imagine this should work in a pinch, at the cost of some shader math.
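Something along these lines is the usual packing trick (a generic sketch, not necessarily the exact functions from that tutorial); it stores a [0,1) depth value across the four 8-bit channels:

// Encode a depth value in [0,1) into an RGBA8 target, and decode it again.
float4 PackDepthToRGBA8(float depth)
{
    float4 enc = frac(depth * float4(1.0, 255.0, 65025.0, 16581375.0));
    enc -= enc.yzww * float4(1.0/255.0, 1.0/255.0, 1.0/255.0, 0.0);
    return enc;
}

float UnpackDepthFromRGBA8(float4 enc)
{
    return dot(enc, float4(1.0, 1.0/255.0, 1.0/65025.0, 1.0/16581375.0));
}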

Quote:Original post by spek
As for the depth z/w part, let's see if I'm doing it exactly right:
// vertex shader
out.pos = mul( modelViewProjection, in.vertexPos );
// pixel shader
out.depth = out.pos.z / out.pos.w;

If I look at the result of this (stored as a texture), it seems to be OK, although I can't tell if the depth is 100% correct, of course.


That is right for storing depth as z/w. It should look "right", as this is precisely what gets stored in the z-buffer. But in this case it's not really so much a case of "right", and more a case of "different". The difference between linear eye-space depth (which you need for reconstructing position from the frustum corners) and z/w is that z/w is scaled between the near plane and the far plane of your projection frustum, and it is distributed non-linearly between them. So for example, if your projection frustum had a minimum depth of 1.0 and a maximum depth of 1000.0, a z/w value of 0.0 would correspond to a view-space depth of 1.0 and a z/w value of 1.0 would correspond to a view-space depth of 1000.0, but a z/w value of 0.5 would correspond to a view-space depth of only about 2.0, since most of the z/w range is spent close to the near plane. Now for what I use, for calculating position from the frustum coordinates, the depth is view-space z divided by the depth of the far frustum plane, so it is linear and runs from 0.0 to 1.0. In our example from before, a depth value of 0.0 would correspond to a view-space depth of 0.0, while 1.0 -> 1000.0 and 0.5 -> 500.0. The difference is subtle, but important. My code for calculating this depth value looks something like this:

// vertex shader
OUT.viewSpacePos = mul(IN.position, worldViewMatrix);

// pixel shader
OUT.depth = viewSpacePos.z / cameraFarZ;  // cameraFarZ is the z value of the far clip plane of the projection frustum
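For completeness: if you already have z/w stored, you can also recover linear eye-space depth from it in the shader. A sketch, assuming a standard D3D-style projection (nearZ and farZ are assumed uniforms; a GL-style projection maps z/w slightly differently):

// Recover linear view-space depth from a stored z/w value.
float zOverW = tex2D(depthMap, texcoords).r;
float viewZ  = (nearZ * farZ) / (farZ - zOverW * (farZ - nearZ));
float depth  = viewZ / farZ;   // same 0..1 linear depth as above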



Quote:Original post by spek

As for the second part, at least my "3dPos = cameraPos + viewVec * depth" was correct :).
>> Is in.vertexPos.xy the position in object space or in world space?
Normally I would multiply in.vertexPos with the modelViewProjection matrix, indeed. But in this case, it isn't necessary. I render a quad with corner coordinates (-1,-1 .. +1,+1), since it needs to stay in front of the camera (a 'HUD quad'). The vertex shader just passes these coordinates through. I don't know in which space they are then, "projection space"? But... maybe this won't work for calculating the viewVector.

And how do I calculate the far-plane corners? Sorry for these dumb questions, but all that matrix and space stuff is really difficult for me. Talking about it, what exactly are clip-space, screen-space and view-space? I understand world and object space, but these others are confusing. Do you get them when a point is transformed into the view frustum or something (where (0,0,0) would be the camera position)?

Thanks for helping MJP,
Rick


Let's start with the different coordinate spaces. I'll try to explain as best as I understand, please forgive me if it turns out my own understanding is inaccurate:

View-space (also known as eye-space) is a coordinate system based on the location and orientation of the camera. The camera position is always <0,0,0> in view-space, since it's centered around the camera. This also means that if a certain point is 5 units directly in front of where the camera is facing, it will have a view-space position of <0,0,5>. Since view-space is just a translation and a rotation of your original world-space, view-space can be used for performing lighting calculations. Other spaces that utilize the perspective projection matrix can't be used for this, since perspective projection is a non-linear operation.

Clip-space is the result of transforming a view-space position by a perspective projection matrix. The result of this is not immediately usable, since the x, y and z components must be divided by the w component to determine the point's screen position. This divided result is referred to as normalized device coordinates. In the vertex shader, the clip-space position is output since it can still be linearly interpolated in this form. Once you perform the perspective division (divide by w), you can no longer interpolate linearly, which is what's needed for rasterization.
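Written out as code, the chain of spaces looks like this (the matrix names are just placeholders):

// world space -> view (eye) space: the camera sits at the origin
float4 viewPos = mul(viewMatrix, worldPos);
// view space -> clip space: this is what the vertex shader outputs
float4 clipPos = mul(projectionMatrix, viewPos);
// clip space -> normalized device coordinates: x and y end up in [-1, 1]
float3 ndc = clipPos.xyz / clipPos.w;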

So getting back to your code... your full-screen quad's vertex coordinates are going to be in some form of post-perspective format (I assume they're pre-transformed), and this means they're not directly usable for calculating view vectors, since they're not in world-space or view-space.

As for your frustum coordinates, they're very easy to calculate. You should check out this article, which explains how the frustum works better than I could; after that you should understand how to calculate any corner of the frustum. The corner I use in my implementation is referred to as "ftr" in that article.
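For a symmetric perspective projection you can also compute that corner directly from the field of view; a quick sketch in view space (fovY in radians, aspect = width/height, and farZ are assumed values):

// Far-top-right ("ftr") corner of the view frustum, in view space.
float farHeight    = 2.0 * tan(fovY * 0.5) * farZ;
float farWidth     = farHeight * aspect;
float3 farTopRight = float3(farWidth * 0.5, farHeight * 0.5, farZ);
// (with OpenGL's eye-space convention the corner sits at z = -farZ)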

EDIT: this is how I actually perform view-space reconstruction in my renderer:

In depth pass:
-multiply object-space position by worldView matrix to get view-space position
-divide view-space z by the farZ of the view frustum

In lighting pass:
-pass in coordinate of upper-right vertex of the view frustum's far clip plane
-viewDirection.xy = (projectedPos.xy/projectedPos.w) * frustumCoord.xy
-viewDirection.z = frustumCoord.z;
-view-space position is then viewDirection * pixelDepth
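Put together as a rough Cg sketch (farFrustumCorner is the view-space far-top-right corner passed in as a uniform; for a quad whose corners are already in clip space, projectedPos.xy/w is just the input xy):

// vertex shader (full-screen quad, corners already in clip space)
out.pos        = float4(in.pos.xy, 0.0, 1.0);
out.viewDir.xy = in.pos.xy * farFrustumCorner.xy;
out.viewDir.z  = farFrustumCorner.z;

// pixel shader -- note: do NOT normalize viewDir
float  depth   = tex2D(depthMap, in.texcoords).r;  // linear depth / farZ
float3 viewPos = in.viewDir * depth;               // view-space position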
Quote:Original post by Matt Aufderheide
I am struggling with this too... it seems I am close but not there yet. I wish someone could post a complete shader instead of these little fragments, as well as some pictures that show the interpolants rendered out, so we can test whether each stage is working...

One of my main problems is that my method of drawing a fullscreen quad uses pretransformed vertices, so I can't use a vertex shader for that pass... otherwise I have trouble mapping the pixels to texels perfectly (the screen-sized textures get a bit filtered otherwise).

Does anyone have a good method of drawing a screen-aligned quad that maps the texture perfectly to the screen pixels and uses a vertex shader?


I'm away for the weekend with only my laptop, so unfortunately right now I can't post more than just code snippets reconstructed from memory. When I get home on Sunday, I will post some more complete shader code and some pictures of off-screen surfaces.

As for directly mapping pixels to texels, are you using Direct3D9? If you are, this article is required reading.
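Until then, one common approach is to keep the quad in clip space and generate the texture coordinates in the vertex shader, adding the half-texel offset that D3D9 needs; a sketch (texelSize = 1.0 / render-target dimensions is an assumed uniform):

struct VS_OUTPUT
{
    float4 pos : POSITION;
    float2 uv  : TEXCOORD0;
};

// inPos.xy holds the clip-space quad corners (-1..1), as discussed above.
VS_OUTPUT FullScreenQuadVS(float4 inPos : POSITION, uniform float2 texelSize)
{
    VS_OUTPUT o;
    o.pos = float4(inPos.xy, 0.0, 1.0);
    o.uv  = inPos.xy * float2(0.5, -0.5) + 0.5;  // clip space -> texture space (flip y)
    o.uv += 0.5 * texelSize;                     // D3D9 half-pixel offset
    return o;
}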
Thanks for the very useful information! Unfortunately my head is still messed up from the alcohol yesterday, so learning hurts now :). But I'll print that "space" information out.

I calculate the far-plane top-right (world) position and pass that as a parameter to the vertex shader. Judging from the numbers, I think it's correct. But shouldn't I translate that world position to view-space as well? And you say that you multiply the vertex positions for the depth pass with the "worldView matrix". Do you mean the modelViewProjection matrix with that, or just the modelView? In OpenGL / Cg I can choose between 4 matrices (texture, projection, modelView, modelViewProjection), in combination with identity, inverse, transpose, or inverse transpose. I suppose you mean the modelViewProjection, since that one gives me a "good looking" result for the depth texture. So, now I have this:
out.vertexPos = mul( modelViewProj, in.vertexPos );
...
out.color.r = out.vertexPos.z / 500; // 500 is the maximum view distance


Now I'm still messing around with those quad coordinates. Normally, the coordinates aren't transformed at all. In OpenGL, I just pass the 4 "screen corner" coordinates like this:
glVertex2f(-1, -1);
glVertex2f( 1, -1);
glVertex2f( 1,  1);
glVertex2f(-1,  1);

In the vertex shader I just copy those values, and that's it. No multiplications with matrices. I don't know in which space the coordinates are... but they're probably not the right coordinates to calculate the view direction. And/or the frustum top-right coordinate is not in the right space either?

// vertex shader
out.Pos.xy = in.Pos.xy; // just copy (for a screen-filling quad)

// MVP   = modelViewProjection matrix
// farTR = far-plane top-right position, in world space
float4 projPos = mul( MVP, in.Pos ); // ???
out.ViewDir.xy = (projPos.xy / projPos.w) * farTR.xy;
out.ViewDir.z  = farTR.z;

// fragment shader
in.viewDir    = normalize( in.viewDir );
float   depth = tex2D( depthMap, texcoords ).r;
float3  pos3D = cameraPos.xyz + in.viewDir * depth;



Do you normalize the viewDirection? Sorry for asking everything in detail. I feel that I'm "close, but no cigar", and that could come down to little stupid errors.

Thanks for the help again!
Rick
Quote:Original post by spek
In the vertex shader I just copy those values, and that's it. No multiplications with matrices. I don't know in which space the coordinates are... but they're probably not the right coordinates to calculate the view direction.

The values you pass in with glVertex*() aren't inherently in any space. The space they're in is defined by the transformations you perform on them in your vertex shader. For instance, if you do this:
out.Pos = mul(worldViewPerspective, in.Pos);
It means the input values were in model space, otherwise you wouldn't have performed the world transformation on them. If you instead used the viewPerspective matrix, it would mean the input values were in world space, which is why you didn't need the world transformation but still needed the view transformation. Likewise, using only the perspective transform would mean all values were in view space, which is why no view-space transformation was needed. Finally, if you just set the output equal to the input, it would mean all values were already in perspective space. Thus glVertex*() is just a means to input "position" values into your vertex shader. How you interpret those values is up to you.
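So for the full-screen quad from earlier in the thread, treating the corners as clip-space values just means passing them straight through; a sketch:

// The quad corners (-1,-1)..(1,1) are interpreted as clip-space positions,
// so the vertex shader needs no matrix multiplication at all.
out.Pos = float4(in.Pos.xy, 0.0, 1.0);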

That being said, this is probably the easiest way to do what you need. In the fragment shader:
float4 perspective_position = float4(in.Pos.x, in.Pos.y, tex2D(sceneDepthMap, texcoords.xy).r * in.Pos.w, in.Pos.w);
float4 world_position_4d    = mul(invViewPerspective, perspective_position);
float3 world_position_3d    = world_position_4d.xyz / world_position_4d.w;
Where in.Pos.xyzw is in perspective space (should already be the case in the fragment shader). This isn't an optimal solution since it requires a matrix multiplication per-fragment, but it is the "omg-I-can't-get-it-to-work-I'm-going-to-cry" solution that works based on the fundamental principles of these transformations. The vector-based solution proposed by MJP (and used by the article you referenced) gives better performance, yet the logic is a little trickier IMO and it requires some more precise coordination between your input values, vertex shaders and fragment shaders so that everything is in the correct space at the right time. You can pursue that approach if you're interested, but I just wanted to give you something to fall back on in the meantime.
OK, finally that works! Thank you.

The part I was missing was the final:

float3 world_position_3d = world_position_4d.xyz / world_position_4d.w;

Sorry for the late reply, lots of things to do this weekend :) Now it's time to relax and do some programming again.

Thanks again for the notes on matrix multiplications. I know the coordinates are just "values", but the matrix usage often confuses me. I should print this explanation along with MJP's text as well. One of my problems is probably that I use the wrong matrices. In this case, I don't know how to get the "inverse view perspective" matrix. I use Cg for the shaders, and in combination with OpenGL it offers to pass the following matrices:
- ModelView Matrix
- ModelViewProjection Matrix
- Texture Matrix
- Projection Matrix
In combination with identity/inverse/transpose/inverse transpose.

But I guess I have to construct this inverse view perspective matrix myself, just like you would modify the texture matrix for projective texturing, true? Or is it listed above, but under another name? I'm not very familiar with the technical "jargon". I tried the inverse modelView and the inverse modelViewProjection, but those didn't give the right results; probably they are something different...

Greetings,
Rick
