Dual Paraboloid Mapping questions

Started by spek. 11 comments, last by osmanb 15 years, 9 months ago
Hi,

Now that SH lighting is finally working, it's time for some optimizations. One of the things I had in mind is to use Dual Paraboloid Maps (DPM's) instead of cubemaps. I want to use them for reflections in the world, and downscale them to tiny probes for realtime ambient lighting with spherical harmonics.

DPM, the good:
- Only 2 passes instead of 6

DPM, the bad:
- Less accurate, and/or high tessellation of the world mesh required
- Sampling a pixel with a given normal is more work (I think)

Since the reflections and ambient lighting don't have to be 100% correct, the less accurate results aren't a huge problem. But... when I made my first DPM test program (with the help of Jason Zink's article on GameDev), I saw that quite a lot of code is required to fetch a pixel from the DPM, including a matrix multiplication:

	float3 E = normalize( cameraVec);		// normalize input per-vertex eye vector
	float3 R = reflect( E, normal );		// calculate the reflection vector
	
	R = mul( (float3x3)ParaboloidBasis, R );	// transform reflection vector to the maps basis
	
	// calculate the forward paraboloid map texture coordinates	
	float2 front;
	front.x = (R.x / (2*(1 + R.z))) + 0.5;
	front.y = 1-((R.y / (2*(1 + R.z))) + 0.5);
	
	// calculate the backward paraboloid map texture coordinates
	float2 back;
	back.x = (R.x / (2*(1 - R.z))) + 0.5;
	back.y = 1-((R.y / (2*(1 - R.z))) + 0.5);

	float3 forward  = tex2D( frontMap, front );	// sample the front paraboloid map
	float3 backward = tex2D( backMap , back );	// sample the back paraboloid map

	oColor.rgb = max(forward, backward);		// output the max of the two maps
This is not a problem for 1 sample, but I'll have to take 512 samples in the shader, for up to 10 pixels per frame. The upper part of the code only has to be done once, but how about the matrix multiply and the code below it? I have multiple DPM probes at different locations. Can they all use the same matrix? If not, I'll have to pass 10 or maybe 20 different matrices to the shader somehow! (10 or 20 cubemaps or DPM's are stored in 1 3D texture.)

Speaking of that matrix, I haven't managed to find the right one yet. I thought the modelview matrix should be used (the one used when generating the DPM), but that didn't work. BTW, I'm using Cg shaders & OpenGL.

Another question: the ATI ambient demo showed how to speed things up by rendering multiple cubemaps at the same time with the help of geometry shaders. I assume I can do the same trick for DPM's? I haven't used geometry shaders so far...

Greetings,
Rick
Hello, maybe I can help out here. The matrix needs to be the view matrix that is used in the generation of the paraboloid maps. This shouldn't include any object transformations - the matrix multiplication is changing the reflection vector from world space to view space.

With that in mind, the calculation of the reflection vector and the subsequent transformation could be moved to the vertex shader, but then you will be relying on the interpolation between vertices to produce a good reflection vector. This is not really a good solution because it is even more dependent on the tessellation of the scene!

Try using the effect in its current state and see what the performance is like. I'm not entirely clear about what you are doing with it, but you may be able to squeeze out some performance elsewhere to make it fast enough. If you have any more trouble post about it here - I'm sure we can work out any kinks!
One thing that might not be obvious: You can just assume all of your lights are oriented in an axis-aligned fashion (e.g. they're all just facing down Z, or whatever). If you do that, you don't need the matrix multiply at all; you can replace it with a simple translation by the position of the light (or probe, in this case). If you have a really good reason to orient the paraboloid splitting plane in some other way, then this doesn't work. But in most cases, the decision for how to orient the two halves is fairly arbitrary anyway. This is what we do for our dual paraboloid shadow maps.
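
Roughly, in Cg-style code (probePos and worldPos are placeholder names, not code from our implementation):

	// Sketch: axis-aligned probe at probePos, halves facing +/- z in world space.
	// For a point lookup (our shadow map case), the basis transform
	// degenerates to a subtraction:
	float3 L = normalize( worldPos - probePos );	// was: mul( ParaboloidBasis, ... )

	// For a direction lookup (a world-space reflection vector R),
	// no transform is needed at all - use R (or here L) directly:
	float2 front;
	front.x =    (L.x / (2*(1 + L.z))) + 0.5;
	front.y = 1-((L.y / (2*(1 + L.z))) + 0.5);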
Hey, Jason Zink! I've seen your article a lot these last weeks. Strange indeed that there aren't many other articles/demos about this stuff, besides old OpenGL demos without shaders (or with assembler shaders).

To clarify things, I'm using my reflection maps (either cubemaps or DPM's) for reflections on the world geometry & objects, just by simply rendering the surrounding environment at a certain point. I was planning to experiment with a grid of 9 or 20 probes (depending on system speed) in front of the camera. The camera shoots 9 or 20 rays into the directions in front of the camera. Where a ray collides, a reflection probe is placed. Another probe is placed inside the camera itself (for the reflection on nearby objects / the player itself). The probes don't have any rotation; the 2 maps are just rendered in the +z and -z directions. (So maybe I can simplify this, like Osmanb said.)

In the end, depending on the pixel's screen position, it will blend its reflection between 2 or 3, maybe even 4 probes. For example, a pixel top-left will pick the probe that was generated with the ray shooting to the top-left corner of the view frustum. I don't know if it works, but the plan is to make this grid refresh every frame == realtime & dynamic reflections.

Because I need to render 9 or even 20 + 1 probes, Dual Paraboloid Maps can save some serious work. Cubemaps = 21 * 6 = 126 passes, while DPM = 21 * 2 = "only" 42 passes. The corner probes might not be refreshed every frame, but you can see the point of using DPM's here.


However, the first problems start here. In the final reflection shader, I need access to all 9 or 20 cubeMaps/DPM's. Therefore I place each face in 1 big 3D texture. That saves me switching and sorting on textures, plus I can blend between all maps. But if the matrix is different for each DPM, I also need access to 9 or 20 matrices. Easily accessing them is a problem, unless I can do something with an array. And is uploading 20 matrix parameters not a problem?
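
Maybe something like this could work, assuming Cg handles uniform matrix arrays the way I hope (probeBasis and probeIndex are made-up names):

	// Sketch: one basis per probe, uploaded once per frame and indexed per lookup
	uniform float3x3 probeBasis[20];

	float3 R = reflect( E, normal );
	R = mul( probeBasis[probeIndex], R );	// pick the matching probe's basis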

I wondered though, like Osmanb said, is that matrix transformation necessary? My probes ALWAYS point only in the +z and -z directions. Maybe I can replace the matrix operation with something simpler... I don't need any matrices for cubemaps either; the pixel world normal is enough.


The second part is SH lighting. The same probes used for the reflections are downscaled to tiny maps (8x8 for cubemaps, 16x16 if they are DPM's). I'll have to downscale fewer maps, which is good. But the SH shaders have to sample a LOT of pixels from these maps. To illustrate, the shader that produces SH coefficients has to loop through all reflection map pixels. That means 8x8x6 = 384 texCUBE operations for a cubemap, or 16x16x2 = 512 DPM pixel-fetch operations. In the latter case I need to sample from 2 textures instead of 1, and somehow test whether the pixel is not outside the circle (only ~70% of the DPM texture is actually used). I have 9/20+1 probes, which also means I need to update 9/20+1 sets of SH coefficients as fast as possible with that shader.

I'm a little bit afraid that all the speed I win by saving 4 passes per probe is not going to compensate for the extra work required in the heavy SH shaders. I can simplify the original DPM pixel shader code... And if I only win a little performance, then I might choose cubemaps after all, since they don't require the world to be highly tessellated.

[edit]
Forget about the SH part. I'll have to loop through ALL pixels, so I don't need any code to convert a specific normal to texture coordinates, nor will I have to read 2 textures and choose between them. The only additional cost I'll get is determining whether a pixel is inside or outside the circle. I can do that with the alpha channel of the dual paraboloid textures.
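
A rough sketch of such a loop for the first SH coefficient of one 16x16 face (made-up names; tex2Dlod avoids gradient trouble inside the loop, and the 4/(1+r^2)^2 solid-angle weight just assumes the standard paraboloid mapping):

	// Sketch: project one paraboloid face onto the SH DC term (Y00 = 0.282095)
	float3 sh0 = 0;
	float dA = (2.0/16.0) * (2.0/16.0);			// area of one texel in [-1,1]^2
	for (int y = 0; y < 16; y++)
	{
		for (int x = 0; x < 16; x++)
		{
			float2 uv = (float2(x,y) + 0.5) / 16.0;	// texel center
			float2 p  = uv * 2 - 1;			// remap to [-1,1]
			float4 texel = tex2Dlod( frontMap, float4(uv,0,0) );
			if (texel.a > 0)			// alpha marks pixels inside the circle
			{
				float w = 4.0 / pow(1.0 + dot(p,p), 2.0);	// solid angle per unit area
				sh0 += texel.rgb * 0.282095 * w * dA;
			}
		}
	}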

Nevertheless, I still wonder if there are faster ways to access the DPM's, and if I can skip the matrix operation, or at least use the same matrix for all DPM's.

Thanks guys,
Rick

[Edited by - spek on July 3, 2008 7:05:06 AM]
This all sounds pretty cool. We have a fairly solid DPM implementation for our shadows here, and we've talked about using it to do other things (if we ever get time). It's not directly applicable, but you might find something useful in:

http://osman.brian.googlepages.com/dpsm.pdf

You can definitely skip the matrix multiply, assuming that you're already in world space. And you might be able to avoid most of the tessellation stuff, depending on what exactly you're doing (and how your scene is built). Reflections are usually pretty distorted anyway, so when you're rendering the DPMs, it probably doesn't matter too much if they're somewhat distorted in the maps. When you're rendering the scene (which is equivalent to the lighting pass in the above paper), you can do the DP transformation in the pixel shader, rather than the vertex shader. That completely eliminates any further distortion, and I suspect the results will be good enough (in most cases).
That was the type of answer I was hoping for. Yep, the reflections don't have to be superb; I render the maps at a low resolution anyway (128x128). I downscale to 64x64, 32x32 and 16x16 (for the ambient lighting). The passes in between are blurred as well. Most of the stuff in my world only vaguely reflects, therefore a low-res blurred map is fine.

Everything is in world-space indeed. When I render a probe, I basically do this:
1.- Apply field of view = 90 (is this necessary?), screen ratio 1.0, near = 0.01, far = x
2.- put camera on probe world position
3.- point towards +z, or -z
4.- Render the world with dual paraboloid transformation

I've checked both maps. They seem to be OK, well, almost. Some of the polygons that are not completely inside the hemisphere seem to be missing...


Before I try all the cool stuff from my previous post, I'm first trying to do reflections on a simple cube. So far, they are not correct yet. I can see the maps on it, but artifacts (lines) are on the cube as well, and the reflections point in the wrong directions. I guess this has to do with the matrix. When I use the modelview matrix, I get very wrong results: the cube just has 1 color (from 1 pixel in the DPM, I suppose). If I don't do the matrix operation at all, I get these artifacts, and reflections on the wrong sides. It's also as if the camera was zoomed in a lot (the texture appears very big on the cube).

I'm using the code I posted earlier, but without the "mul( matrix, R )" part.

Greetings and thanks for helping,
Rick
I think Osmanb is correct - you don't need to do a full matrix multiply if the vector is already in world space AND you enforce the rule that the paraboloids are oriented along the +/- z axes. However, you will need to ensure that the vector is generated from two world space points. (EDIT: I would really like to see your shader code before making this judgement - something isn't adding up here...)

From looking at the code that you posted, it seems like some of the vector creation is mixed and matched between the original generation and accessing shaders that I sent out with the article. Please be careful to create the vectors in the correct manner! This is a definite source of problems if they aren't created correctly. Perhaps it would help if you posted the shader code for generating the maps and then the shader code for accessing the maps - it could be a simple mistake making the whole technique seem ridiculous!!

The lighting technique that you are mentioning sounds quite complex, and would bring my GPU to its knees! Even so, that is usually how the next generation techniques get started - on hardware that doesn't do it justice... Given the fact that you are using low resolution information in the lighting 'probes', I think DPM is clearly the way to go. There isn't much advantage in using cube maps if quality isn't the top priority.

Also, I don't know if OpenGL has an equivalent or not, but I have been using D3D10 texture arrays to generate my DPM maps. It allows you to render to both targets at the same time from a single draw call (by setting the desired render target for each primitive created in the geometry shader). I have actually written a full-blown book chapter on different environment mapping techniques that describes how to do this. Hopefully it will be available soon to the community for just these types of situations!
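
Roughly, the trick looks like this in D3D10-style HLSL (a simplified sketch, not the exact chapter code; the vertex shader only transforms to the paraboloid's view space, and MAX_VIEW_DIST = 500 is assumed to match your shader):

	struct GS_IN  { float4 pos : SV_POSITION; };	// view-space position from the VS
	struct GS_OUT
	{
		float4 pos   : SV_POSITION;
		float  zval  : TEXCOORD0;		// hemisphere test for the pixel shader
		uint   slice : SV_RenderTargetArrayIndex;
	};

	[maxvertexcount(6)]
	void GS_DualParaboloid( triangle GS_IN input[3], inout TriangleStream<GS_OUT> stream )
	{
		for ( int hemi = 0; hemi < 2; hemi++ )		// 0 = front (+z), 1 = back (-z)
		{
			float dir = (hemi == 0) ? 1.0f : -1.0f;
			for ( int v = 0; v < 3; v++ )
			{
				GS_OUT o;
				float3 p = input[v].pos.xyz;
				p.z *= dir;			// flip into the current hemisphere
				float L = length( p );
				p /= L;
				o.zval  = p.z;			// clip() this in the pixel shader
				o.pos.x = p.x / (1.0f + p.z);	// paraboloid projection
				o.pos.y = p.y / (1.0f + p.z);
				o.pos.z = L / 500.0f;		// depth for z-buffering
				o.pos.w = 1.0f;
				o.slice = hemi;			// route to front or back array slice
				stream.Append( o );
			}
			stream.RestartStrip();
		}
	}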
I'm not sure, but I think the shader code is from your HLSL example. I'm using Cg though, so I had to make a few changes. For example, switching the parameters of a "mul" operation (mul( vector, modelView ) -> mul( modelView, vector )). Anyway, I'll post the full shaders at the end of this post. Oh, and could someone please tell me the tag again to put code in a highlighted box? I forgot it.

The ambient lighting is heavy stuff indeed, although it still runs reasonably on my GeForce 8800. The main problem is not to update too many probes per frame. If you only update 2 probes, no problemo. But the problem is that worlds usually need an awful lot of probes, which results in big (uniform 3D) grids of probes. Fine for a small scene, but way too much for a bigger (outdoor) scene. It would take maybe 10 seconds before all probes are updated, and it takes a lot of memory. A waste, since most of the probes aren't even used. Therefore I thought about shooting a fixed (small) number of probes out of the camera. Updating 8 probes can be done at 50 frames per second on my card. However, all other techniques are still disabled (SSAO, DoF, HDR, bloom, reflections, complex scenes, ...).

Thanks for reminding me of the multiple render target options. I can indeed render to two textures at the same time with OpenGL as well. I'm using an indicator ("zValue") and the Cg "clip" function to discard pixels on the wrong side. But if I understand it right, the entire surrounding scene (360 degrees) is rendered in front of the camera, and I clip away ~50% of it. So if I want to render both of them in 1 pass:
	// fragment code

	// Original
	out.Color.a = 0;		// Invisible pixel so far
	clip( zValue );			// Discards the pixel when zValue < 0
	> proceed normally
	out.Color.a = 1;

	// NEW CODE
	> do normal code
	out.colorFront = out.colorBack = myResult;
	if (zValue < 0)			// behind the front hemisphere?
	{
		out.colorFront.a = 0;	// Hide pixel in front buffer
		out.colorBack.a  = 1;	// Show pixel in back buffer
	} else
	{
		out.colorFront.a = 1;	// Show pixel in front buffer
		out.colorBack.a  = 0;	// Hide pixel in back buffer
	}

The vertex shader uses a "direction" parameter to indicate if the pixel is meant for the front or back buffer. What should I do with that piece of code?


Ok, here comes the code. Currently I don't do any matrix operation in the fragment shader, so I commented it out. If I enable it, I still get wrong results though.
//------------------- DPM generation -------------------------------------
< Vertex Shader >

	void main(
			float4		iPos		: POSITION,	// World position
		out	float4		oPos		: POSITION,	// Result position
		out	float		oZValue		: TEXCOORD7,
		uniform	float4x4	MV		= MV,		// ModelView Matrix
		uniform	half3		cameraPos	= V3_CAM_POS,
		uniform	half		direction	= F_CAM_DIRECTION	// +1 or -1
	)
	{
		// Bend vertex into the paraboloid
		oPos	= mul( MV, float4(iPos.xyz,1) );	// transform vertex into the map's basis
		oPos	= oPos / oPos.w;			// divide by w to normalize
		oPos.z	= oPos.z * direction;			// flip z forward or backward

		float L	= length( oPos.xyz );			// distance between (0,0,0) and the vertex
		oPos	= oPos / L;				// divide the vertex position by the distance

		oZValue	= oPos.z;				// remember which hemisphere the vertex is in
		oPos.z	= oPos.z + 1;				// add 1 to z for the paraboloid projection
		oPos.x	= oPos.x / oPos.z;			// divide x coord by the new z-value
		oPos.y	= oPos.y / oPos.z;			// divide y coord by the new z-value

		const half MAX_VIEW_DIST = 500;
		oPos.z	= L / MAX_VIEW_DIST;			// set a depth value for correct z-buffering
		oPos.w	= 1;					// set w to 1 so there is no w divide

		//---------------------------------------------------------------------
		... Additional code for the lighting in the fragment shader
		... shadowMap matrices, passing texcoords / normals, etc.
	} // VP_Probe_Paraboloid

< Fragment Shader >

	void main(
		// VERTEX SHADER INPUT
		...
			float	iZValue		: TEXCOORD7,
		out	half4	oColor		: COLOR0	// Result
	)
	{
		oColor.a = 0;
		clip( iZValue );	// discards the pixel when iZValue < 0
					// (the wrong hemisphere)

		... Calculate lighting

		// RESULT
		oColor.rgb	= (albedo * directDiffuse + emissive) * fog;
		oColor.a	= 1;
	} // FP_Probe_Paraboloid

//------------------- DPM Usage ---------------------------------------------
< Vertex Shader >

	void main(
			float4		iPos		: POSITION,	// world position
			float3		iNormal		: NORMAL,	// world normal
		out	float4		oPos		: POSITION,
		out	float3		oNormal		: TEXCOORD2,
		out	float3		oCamVec		: TEXCOORD3,
		uniform	float4x4	MVP		= MVP,		// ModelViewProjection
		uniform	half3		cameraPos	= V3_CAM_POS
	)
	{
		oNormal.xyz	= iNormal.xyz;		// Pass world normal
		oCamVec.xyz	= iPos.xyz - cameraPos;
		oPos		= mul( MVP, iPos );
	} // VP_DPM_Test

< Fragment Shader >

	void main(
		out	half4		oColor		: COLOR0,
			float3		iNormal		: TEXCOORD2,
			float3		iCamVec		: TEXCOORD3,
		uniform	sampler2D	frontMap	: TEXUNIT0,
		uniform	sampler2D	backMap		: TEXUNIT1,
		uniform	float3x3	ParaboloidBasis	= TEX_MATRIX1	// modelView used when generating
									// the probe, stored in a texture matrix
	)
	{
		float3 E = normalize( iCamVec );	// normalize input per-vertex eye vector
		float3 R = reflect( E, iNormal );	// calculate the reflection vector

	//!	R = mul( (float3x3)ParaboloidBasis, R );	// transform reflection vector to the map's basis

		// calculate the forward paraboloid map texture coordinates
		float2 front;
		front.x =    (R.x / (2*(1 + R.z))) + 0.5;
		front.y = 1-((R.y / (2*(1 + R.z))) + 0.5);

		// calculate the backward paraboloid map texture coordinates
		float2 back;
		back.x =    (R.x / (2*(1 - R.z))) + 0.5;
		back.y = 1-((R.y / (2*(1 - R.z))) + 0.5);

		float3 forward  = tex2D( frontMap, front );	// sample the front paraboloid map
		float3 backward = tex2D( backMap , back );	// sample the back paraboloid map

		oColor.rgb = max(forward, backward);		// output the max of the two maps
		oColor.a   = 1;
	} // FP_DPM_Test


Thanks, and good luck with your book!
Rick
I think I see two different things in the shader code that could be changed. The first is not a source of error, but could be removed nonetheless. The divide by w after transforming to the paraboloid view space in the map generation vertex shader isn't needed. I had included it in the original article, but it isn't necessary since the matrix multiply doesn't include any projection (i.e. the w values will always be 1 without the divide anyway). So you can remove an unneeded divide from the vertex shader [grin].

The second point, and probably at least one source of error, is that the normal vector in the map accessing vertex shader is not transformed to world space. You are essentially using the object space normal vector of the geometry that is using the DPM instead of the world space vector. Here's the same operation from the original article shader code:

float3 N = normalize(mul(IN.normal, (float3x3)ModelWorld)); // find world space normal

This should at least be one step in the right direction. Try using the matrix multiply as it was originally, and then once it is working you can try to simplify and remove the instruction.

Also, I think the multiple render targets you mention above are a little different from the texture arrays I was talking about before (which use the geometry shader instead of the fragment shader). Even so, I think MRT should be usable as well, but it wouldn't really gain much performance. I would be very interested to hear if you try it out and have some success though!

Hopefully this helps get you up and running!

[Edited by - Jason Z on July 4, 2008 11:17:48 AM]
Removing the division by position.w didn't hurt, so I removed that part. I don't understand the world-normal part though. All the vertex positions and normals are already in world space. Nothing is transformed, rotated or scaled (so far I'm only rendering a static world).

I tried multiplying the normal, but that didn't really work. Probably because I chose the wrong matrix. I'm not 100% sure, but the "ModelView" matrix is not the same as the "ModelWorld" matrix. This gives me the idea I'm using the wrong matrix for the "R = mul( dualParaboloidBasis, normal )" line as well. ModelWorld is not defined in OpenGL if I'm right, so I should construct it myself. Unfortunately, I'm a disaster with matrices, so I need a helping hand on that.
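
If the probe ever did get an orientation, I guess the basis could be built by hand from its axes, something like this (purely illustrative; with my unrotated probes it collapses to the identity, which is why I hope to drop it):

	// Sketch: rows are the probe's right/up/forward axes in world space
	float3 right   = float3( 1, 0, 0 );
	float3 up      = float3( 0, 1, 0 );
	float3 forward = float3( 0, 0, 1 );
	float3x3 basis = float3x3( right, up, forward );
	float3 Rp = mul( basis, R );	// world-space R into the probe's frame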

In the end I'm hoping to skip that part anyway. But yet again, I don't know how to correctly replace it. Just commenting out the "R = mul..." line is not working. Here are some screenshots of what I have so far:
http://img364.imageshack.us/my.php?image=dpm1jh2.jpg
http://img354.imageshack.us/my.php?image=dpm2rq0.jpg

What you see in the front/back maps is correct, although there are a couple of problems. It's not exactly a nice circle. It seems that border polygons are falling away. You see the greenish polygons? These are polygons behind the ceiling/walls/floor. Normally they are culled away (that's why they have the wrong green color), but I disabled face culling. If I enable face culling, I only see the inverted world on the front map.

The cube with the reflections is not exactly correct either. That's because of skipping the matrix, I suppose. If I enable the matrix operation, optionally with the normal transformed as well, the entire cube gets only 1 color.
[edit]
I tried some simple code:
	float3 E = normalize( iCamVec );	// normalize input per-vertex eye vector
	float3 R = reflect( E, iNormal );	// calculate the reflection vector
	if (R.z > 0)
	{
		float2 tx;
		tx.x = 1-((R.x+1)*0.5);		// maybe texture is swapped
		tx.y =    (R.y+1)*0.5;
		oColor.rgb = tex2D( frontMap, tx );
	} else
	{
		float2 tx;
		tx.x = 1-((R.x+1)*0.5);
		tx.y =    (R.y+1)*0.5;
		oColor.rgb = tex2D( backMap , tx );
	}

This seems to work, without any matrices. I still see some wrong parts in the reflection though. That is because the circle is not filled entirely (see the 2 shots I posted). I don't know how to fix that. Maybe my scene is not tessellated enough? I estimate the big walls are subdivided into 1.5 x 1.5 metre parts.

Greetings,
Rick

[Edited by - spek on July 5, 2008 11:09:43 AM]

