Parallax occlusion mapping is a technique that reduces model geometry by encoding surface information in a texture. The surface information that is typically used is a height-map representation of
the replaced geometry. When the model is rendered, the surface details are reconstructed from the texture information in the pixel shader.
I recently read through the GDC06 presentation on parallax occlusion mapping titled Practical
Parallax Occlusion Mapping for Highly Detailed Surface Rendering by Natalya Tatarchuk of ATI Research Inc. In the presentation, an improved version of Parallax Occlusion Mapping is discussed
along with possible optimizations that can be used to accelerate the technique on current and next generation hardware.
Of course, after reading the presentation I had to try to implement the technique for myself to evaluate its performance and better understand its inner workings. This article attempts to present
an easy to understand guide to the theory behind the algorithm as well as my reference implementation of basic parallax occlusion mapping.
My investigation is focused on the surface reconstruction calculations and what parameters come into play when using this technique. I have decided not to discuss any particular lighting model
explicitly since, as you will see, the algorithm is very flexible and can easily be adapted to just about any lighting model that you would like. Natalya Tatarchuk has provided a sample in the
DirectX SDK called "ParallaxOcclusionMapping". This is an excellent resource that can be used to further explore a very full-featured lighting model if you are interested.
The reference implementation is written in DirectX 9 HLSL. The effect file that is developed in this exercise is provided with the article and may be used in whatever manner you desire.
So what exactly is parallax occlusion mapping? First let's look at an image of a polygonal surface that we would like to apply our technique to. Let's assume that this polygonal surface is square,
and has texture coordinates set to include an entire texture. Figure 1 shows this polygonal surface.
Figure 1: Flat polygonal surface
The basic idea behind parallax occlusion mapping is relatively simple. For each polygonal surface, we would like to simulate a complex volumetric shape. This shape will be represented by a
height-map encoded into a texture that is applied to the polygonal surface. Figure 2 shows this idea.
Figure 2: Flat polygonal surface approximating a volumetric shape
We will assume that the height-map values range from 0.0 to 1.0, with 1.0 being at the polygonal surface and 0.0 being at the deepest possible position of the volumetric shape. To be
able to correctly reconstruct the volumetric shape represented by the height map, the viewing direction must be used in conjunction with the height map data to calculate which parts of the surface
would be visible at each screen pixel of the polygonal surface, for the given viewing direction.
This is accomplished by using a simple ray-tracer in the pixel shader. The ray that we will be tracing is formed from the vector from the eye (or camera) location to the current pixel. Imagine
this vector piercing the polygonal surface, and travelling until it hits the bottom of the virtual volume. Figure 3 shows a side profile of these intersections.
Figure 3: View vector intersecting the virtual volume
The line segment from the polygonal surface to the bottom of the virtual volume represents the 'line of sight' for our surface. The task at hand is to figure out the first point on this segment
that intersects with our height-map. That point is what would be visible to the viewer if we were to render a full geometric model of our surface.
Since the point of intersection between our line segment and the height-map surface represents the visible point at that pixel, it also describes the corrected offset texture coordinates that
should be used to look up a color map, normal map, or whatever other textures you use to illuminate the surface. If this correction is carried out on all of the pixels that the polygonal surface is
rendered to, then the overall effect is to reconstruct the volumetric surface – which is what we originally set out to do.
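The search described above can be stated a little more formally. The symbols here are introduced only for illustration and do not appear in the effect file: $\mathbf{u}_0$ is the pixel's original texture coordinate, $\mathbf{d}$ is the total texture-space offset covered by the line of sight as it descends from the top of the volume to the bottom, $h(\cdot)$ is the height-map lookup, and $t \in [0,1]$ moves along the segment while the ray height falls linearly from 1 to 0.

```latex
\[
  \text{ray height}(t) = 1 - t, \qquad
  t^{*} = \min\{\, t \in [0,1] : h(\mathbf{u}_0 + t\,\mathbf{d}) \ge 1 - t \,\}, \qquad
  \mathbf{u}_{\text{corrected}} = \mathbf{u}_0 + t^{*}\,\mathbf{d}
\]
```

The rest of the article is essentially a way of approximating $t^{*}$ with a limited number of height-map samples.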
Implementing Parallax Occlusion Mapping
Now that we have a better understanding of the algorithm of parallax occlusion mapping, it is time to put our newly acquired knowledge to use. First we will look at the required texture data and
how it is formatted. Then we will step through a sample implementation line by line with a thorough explanation of what is being accomplished with each section of code. The sample effect file is
written in HLSL, but the implementation should apply to other shading languages as well.
Before writing the parallax occlusion map effect file, let's look at the texture data that we will be using. The only required data is that we need a height-map of the volumetric surface that we
are trying to simulate. In this example, the height data will be stored in the alpha channel of the regular color texture map, with a value of 0 corresponding to the deepest point, and a value of 1
corresponding to the polygonal surface. Figure 4 shows the alpha channel height-map and the color texture that it will be coupled with.
Figure 4: Texture data used
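As a rough sketch of how this texture might be declared in a Direct3D 9 effect file: the sampler name Sampler matches the one used in the pixel shader snippets, while the texture parameter name ColorHeightMap, the filter and address settings, and the SampleHeight helper are assumptions made for this illustration.

```hlsl
// Hypothetical texture parameter: RGB holds the color, alpha holds the height
// (0 = deepest point of the volume, 1 = the polygonal surface).
texture ColorHeightMap;

sampler2D Sampler = sampler_state
{
    Texture   = <ColorHeightMap>;
    MinFilter = Linear;
    MagFilter = Linear;
    MipFilter = Linear;
    AddressU  = Wrap;
    AddressV  = Wrap;
};

// Illustrative helper: fetch only the height at a texture coordinate.
// It uses tex2D, so it is only suitable outside of dynamic flow control.
float SampleHeight( float2 uv )
{
    return tex2D( Sampler, uv ).a;
}
```

Packing the height into the alpha channel means a single texture fetch returns both the color and the height for a given coordinate.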
With the texture data understood, we will now look into the vertex shader to see how we set up the parallax occlusion mapping pixel shader.
The first step in the vertex shader is to calculate the vector from the vertex to the eye (or camera) position. This is done by transforming the vertex position to world space, and then
subtracting it from the eye position.
float4 VertexPositionWS = mul( float4(IN.position, 1), mW );
float3 P = VertexPositionWS.xyz;
float3 E = EyePositionWS.xyz - P;
Next, we must transform the eye vector and the vertex normal to tangent space. The transformation matrix that we will use is based on the vertex normal, binormal, and tangent vectors.
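float3x3 tangentToWorldSpace;
// Each basis vector becomes one row; only the rotation part of mW is applied.
tangentToWorldSpace[0] = mul( IN.tangent, (float3x3)mW );
tangentToWorldSpace[1] = mul( IN.binormal, (float3x3)mW );
tangentToWorldSpace[2] = mul( IN.normal, (float3x3)mW );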
Each of these vectors is transformed to world space and is then used as a row of the rotation matrix that converts from tangent to world space. Since this is a rotation-only matrix (assuming the
basis vectors are orthonormal and the world matrix contains no scaling), its transpose is also its inverse. This produces the world to tangent space rotation matrix that we need.
float3x3 worldToTangentSpace = transpose(tangentToWorldSpace);
Now the output vertex position and the output texture coordinates are trivially calculated.
OUT.position = mul( float4(IN.position, 1), mWVP );
OUT.texcoord = IN.texcoord;
And finally, we use the world to tangent space rotation matrix to transform the eye vector and vertex normal to tangent space.
OUT.eye = mul( E, worldToTangentSpace );
// The normal is taken to world space first, then into tangent space.
OUT.normal = mul( mul( IN.normal, (float3x3)mW ), worldToTangentSpace );
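Putting the vertex shader steps together, a complete function might look like the sketch below. The parameter names mW, mWVP and EyePositionWS come from the snippets above; their types, the struct names, the vertex semantics and the function name VSMain are assumptions made for this illustration, as is the normalization of the basis vectors.

```hlsl
// Effect parameters named as in the snippets above; the types are assumptions.
float4x4 mW;             // object -> world
float4x4 mWVP;           // object -> clip space
float4   EyePositionWS;  // camera position in world space

// Hypothetical input/output layouts; the semantics are illustrative choices.
struct VS_INPUT
{
    float3 position : POSITION;
    float2 texcoord : TEXCOORD0;
    float3 normal   : NORMAL;
    float3 binormal : BINORMAL;
    float3 tangent  : TANGENT;
};

struct VS_OUTPUT
{
    float4 position : POSITION;
    float2 texcoord : TEXCOORD0;
    float3 eye      : TEXCOORD1;  // tangent space, points from surface toward eye
    float3 normal   : TEXCOORD2;  // tangent space
};

VS_OUTPUT VSMain( VS_INPUT IN )
{
    VS_OUTPUT OUT;

    // Vector from the vertex to the eye, in world space.
    float3 P = mul( float4( IN.position, 1 ), mW ).xyz;
    float3 E = EyePositionWS.xyz - P;

    // Rows are the tangent-space basis vectors expressed in world space.
    float3x3 tangentToWorldSpace;
    tangentToWorldSpace[0] = normalize( mul( IN.tangent,  (float3x3)mW ) );
    tangentToWorldSpace[1] = normalize( mul( IN.binormal, (float3x3)mW ) );
    tangentToWorldSpace[2] = normalize( mul( IN.normal,   (float3x3)mW ) );

    // For an orthonormal basis the transpose is the inverse.
    float3x3 worldToTangentSpace = transpose( tangentToWorldSpace );

    OUT.position = mul( float4( IN.position, 1 ), mWVP );
    OUT.texcoord = IN.texcoord;
    OUT.eye      = mul( E, worldToTangentSpace );
    OUT.normal   = mul( tangentToWorldSpace[2], worldToTangentSpace );
    return OUT;
}
```

With the row-vector convention used here, mul( E, worldToTangentSpace ) evaluates to the dot products of E with the tangent, binormal and normal, which is exactly the tangent-space representation of the eye vector.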
That is all there is for the vertex shader. Now we move on to the pixel shader, which contains the actual parallax occlusion mapping code. The first calculation in the pixel shader is to determine
the maximum parallax offset length that can be allowed. This is calculated in the same way as in standard parallax mapping. The maximum parallax offset is a function of the depth of the surface,
as well as the orientation of the eye vector to the surface. For a further explanation see "Parallax Mapping with Offset Limiting: A Per-Pixel Approximation of Uneven Surfaces" by Terry Welsh.
float fParallaxLimit = length(IN.eye.xy) / IN.eye.z;
fParallaxLimit *= fHeightMapScale;
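To make the geometry behind this limit concrete, here is a short worked form. The symbols $\theta$ and $s$ are introduced only for this illustration: $\theta$ is the angle between the tangent-space eye vector and the surface normal (the tangent-space z axis), and $s$ stands for fHeightMapScale. It assumes that one texture unit corresponds to one unit of surface length.

```latex
\[
  \frac{\lVert \mathbf{E}_{xy} \rVert}{E_z} = \tan\theta
  \qquad\Longrightarrow\qquad
  \text{fParallaxLimit} = s \tan\theta
\]
\[
  s = 0.04:\quad
  \theta = 45^\circ \Rightarrow 0.04 \cdot 1 = 0.04,
  \qquad
  \theta = 75^\circ \Rightarrow 0.04 \cdot 3.732 \approx 0.149
\]
```

The limit grows without bound as $\theta$ approaches 90 degrees, which is why nearly edge on views need special consideration.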
Next we calculate the direction of the offset vector. This is essentially a two dimensional vector that exists in the xy-plane. This must be the case, since the texture coordinates are on the
polygon surface with z = 0 (in tangent space) for the entire surface. The calculation is done by finding the normalized vector in the direction of offset, which is essentially the vector formed from
the x and y components of the eye vector. This direction is then scaled by the maximum parallax offset calculated in the last step.
float2 vOffset = normalize( -IN.eye.xy );
vOffset = vOffset * fParallaxLimit;
Now we must determine how many height-map samples we are going to take while determining where the eye vector intersects it. This is done by using a dot product of the surface normal and the eye
vector as a measure of how 'straight on' the surface is to the viewing direction. First we find the normalized normal and eye vectors.
float3 E = normalize( IN.eye );
float3 N = normalize( IN.normal );
Then the number of samples is determined by lerping between a user specified maximum and minimum number of samples, so that more samples are taken as the view approaches edge on.
int nNumSamples = (int)lerp( nMaxSamples, nMinSamples, dot( E, N ) );
The total height of the simulated volume is 1.0, so the eye vector starts at a height of 1.0 at the top of the volume, where it intersects the polygon surface. As we take each additional
sample, the height of the vector at the point that we are sampling is reduced by the reciprocal of the number of samples. This effectively splits up the 0.0-1.0 height into n chunks where n is the
number of samples. This means that the larger the number of samples, the finer the height variation we can detect.
float fStepSize = 1.0 / (float)nNumSamples;
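The sample count and step size can be wrapped into small helpers, sketched below. The function names and the use of saturate are assumptions made for this illustration; the default values of 8 and 60 are the sample counts quoted for the article's images.

```hlsl
// User parameters; 8 and 60 are the values quoted for the sample images.
int nMinSamples = 8;
int nMaxSamples = 60;

// Returns the number of height-map samples for a tangent-space eye vector and
// normal. More samples are used as the view approaches edge on.
int ComputeNumSamples( float3 eyeTS, float3 normalTS )
{
    float3 E = normalize( eyeTS );
    float3 N = normalize( normalTS );

    // saturate() keeps the blend factor in [0,1] even if interpolation
    // produces a slightly back-facing normal.
    float fFacing = saturate( dot( E, N ) );

    return (int)lerp( (float)nMaxSamples, (float)nMinSamples, fFacing );
}

// Height lost by the ray at each step of the march.
float ComputeStepSize( int nNumSamples )
{
    return 1.0 / (float)nNumSamples;
}
```

For example, with a facing value of 0.5 the blend lands halfway between 60 and 8, giving 34 samples and a step size of 1/34.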
Since we would like to use dynamic branching in our sampling algorithm, we must not use any instructions that require gradient calculations within the dynamic loop section. This means that for our
texture sampling we must use tex2Dgrad instead of a plain tex2D instruction. To use tex2Dgrad, we must manually calculate the texture coordinate gradients in screen space outside of the dynamic loop.
This is done with the ddx and ddy instructions.
float2 dx, dy;
dx = ddx( IN.texcoord );
dy = ddy( IN.texcoord );
Now we initialize the required variables for our dynamic loop. The purpose of the loop is to find the intersection of the eye vector with the height-map as efficiently as possible. So when we find
the intersection, we want to terminate the loop early and save the extra sampling efforts. We start with a comparison height of 1.0 (corresponding to the top of the volume), initial parallax offset
vectors of (0,0), and starting at the 0th sample.
float2 vOffsetStep = fStepSize * vOffset;
float2 vCurrOffset = float2( 0, 0 );
float2 vLastOffset = float2( 0, 0 );
float2 vFinalOffset = float2( 0, 0 );
// Texture samples at the current and previous steps (height is in alpha).
float4 vCurrSample = float4( 0, 0, 0, 0 );
float4 vLastSample = float4( 0, 0, 0, 0 );
float stepHeight = 1.0;
int nCurrSample = 0;
Next comes the dynamic loop itself. For each iteration of the loop, we sample the texture coordinates along our parallax offset vector. For each of these samples, we compare the alpha component
value to the current height of the eye vector. If the eye vector has a larger height value than the height-map, then we have not found the intersection yet. If the eye vector has a smaller height
value than the height-map, then we have found the intersection and it exists somewhere between the current sample and the previous sample.
while ( nCurrSample < nNumSamples )
{
    vCurrSample = tex2Dgrad( Sampler, IN.texcoord + vCurrOffset, dx, dy );
    if ( vCurrSample.a > stepHeight )
    {
        // The ray heights at the previous and current steps are
        // (stepHeight + fStepSize) and stepHeight respectively.
        float Ua = ((stepHeight + fStepSize) - vLastSample.a)
                 / ( fStepSize + (vCurrSample.a - vLastSample.a));
        vFinalOffset = vLastOffset + Ua * vOffsetStep;
        vCurrSample = tex2Dgrad( Sampler, IN.texcoord + vFinalOffset, dx, dy );
        nCurrSample = nNumSamples + 1;   // terminate the loop early
    }
    else
    {
        nCurrSample++;
        stepHeight -= fStepSize;
        vLastOffset = vCurrOffset;
        vCurrOffset += vOffsetStep;
        vLastSample = vCurrSample;
    }
}
Once the intersection samples have been found, we solve for the linearly approximated intersection point between the last two samples. This is done by finding the intersection of the two line
segments formed between the last two samples and the last two eye vector heights. Then a final sample is taken at this interpolated final offset, which is considered the final intersection point.
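The interpolation can be derived in a few lines. The symbols below are shorthand introduced for this illustration: $h_L$ and $h_C$ are the height-map values at the last and current samples, $r_L$ and $r_C = r_L - \Delta$ are the ray heights at those samples, $\Delta$ is the step size, and $s \in [0,1]$ runs from the last sample to the current one.

```latex
\[
  \underbrace{r_L - s\,\Delta}_{\text{ray}}
  \;=\;
  \underbrace{h_L + s\,(h_C - h_L)}_{\text{height-map}}
  \qquad\Longrightarrow\qquad
  s = \frac{r_L - h_L}{\Delta + (h_C - h_L)}
\]
\[
  \text{final offset} = \text{last offset} + s \cdot \text{offset step}
\]
```

Since the last sample was still below the ray ($h_L \le r_L$) and the current sample is above it ($h_C > r_C$), the numerator is non-negative and smaller than the denominator, so $s$ always falls between 0 and 1.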
Now all that is left is to illuminate the pixel based on these new offset texture coordinates. In our example here, we simply return the color that was sampled at this point. In the place of this
diffuse color return value, you could use the offset texture coordinates to sample a normal map, gloss map or whatever to implement your favorite lighting model.
OUT.color = vSampledColor;
Now that we have seen parallax occlusion mapping at work, let's consider some of the parameters that are important to the visual quality and the speed of the algorithm.
The algorithm as presented in the sample effect file requires approximately 11.98 seconds to generate a frame at a screen resolution of 640x480 with the reference rasterizer with the minimum and
maximum number of samples set to 8 and 50, respectively. Of course this will vary by machine, but it will serve as a good metric to base performance characteristics on since we know that the
algorithm is pixel shader bound.
The algorithm is implemented using shader model 3.0 constructs – specifically it uses dynamic branching in the pixel shader to reduce the number of unnecessary loops after the intersection
has already been found. Thus relatively modern hardware is needed to run this effect in hardware. Even with newer hardware, the algorithm is very pixel shader intensive. Each iteration of the dynamic
loop that does not find the intersection requires a texture lookup along with 19 ALU instructions, and the final intersecting iteration requires 2 texture lookups and 21 ALU instructions.
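Because the loop relies on dynamic branching and tex2Dgrad, the shaders have to be compiled against shader model 3.0 profiles. Below is a minimal sketch of the technique declaration in a Direct3D 9 effect file, with placeholder shaders standing in for the real ones. The names VSPlaceholder, PSPlaceholder and the technique name are made up for this illustration.

```hlsl
float4x4 mWVP;  // object -> clip space

// Placeholder vertex shader; the real one is described earlier in the article.
float4 VSPlaceholder( float3 position : POSITION ) : POSITION
{
    return mul( float4( position, 1 ), mWVP );
}

// Placeholder pixel shader; the real one contains the dynamic loop.
float4 PSPlaceholder() : COLOR
{
    return float4( 1, 1, 1, 1 );
}

technique ParallaxOcclusionMappingSketch
{
    pass P0
    {
        // Both stages target shader model 3.0.
        VertexShader = compile vs_3_0 VSPlaceholder();
        PixelShader  = compile ps_3_0 PSPlaceholder();
    }
}
```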
Considering that the sample images were generated with a minimum sample count of 8 and a maximum sample count of 60, you can see that the number of loop iterations performed is going to dominate
the cost of the shader. With this in mind, we should develop some methodology for determining how many samples are required for an acceptable image quality. Figure 5 compares
images generated with 60 and then 40 maximum samples respectively.
Figure 5: A 60-sample image (left) and a 40-sample image (right)
As you can see, there are aliasing artifacts along the top of the 40-sample image where the height map makes any sharp transitions (you may have to zoom in on the image to see them clearly). Even so,
the parts of the image that are not such high frequency still look acceptable. If you will be using low frequency textures, you may be able to significantly reduce your sampling rate without any noticeable loss of image quality.
Another very important parameter that must be taken into consideration is the height-map scale, named fHeightMapScale in the sample effect file. If you imagine a 1-meter
by 1-meter square (in world space coordinates), then the height-map scale is how deep of a simulated volume we are trying to represent. For example, if the height-map scale is 0.04, then our 1x1
square would have a potential depth of 0.04 meters. Figure 6 shows two images with a scale height of 0.04 and 0.08 with the same sampling rates.
Figure 6: A 0.04 height-map scale image (left) and a 0.08 height map scale image (right)
It is easy to see the dramatic amount of occlusion caused by the increased scale height. Also notice toward the top of the image that the aliasing artifacts are back – even though the
sampling rates are the same. With this in mind, you can see that the height scale also determines how 'sharp' the features are with respect to the eye vector. The taller the features are, the harder
it is to detect their intersections with the eye vector. This means that we would need more samples per pixel to obtain similar image quality if the height scale is larger. So a smaller height scale
is "a good thing".
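A rough way to quantify this, using symbols that are not part of the effect file: let $W$ be the height-map width in texels, $s$ the height-map scale, $\theta$ the angle between the eye vector and the surface normal, and $n$ the number of samples. The texture-space distance covered per step is the parallax limit divided by $n$. The viewing angle of 80 degrees below is a made-up value chosen for illustration.

```latex
\[
  \text{texels per step} \approx \frac{W \, s \tan\theta}{n}
\]
\[
  W = 256,\ \theta = 80^\circ,\ n = 60:\qquad
  s = 0.04 \Rightarrow \approx 0.97,
  \qquad
  s = 0.08 \Rightarrow \approx 1.94
\]
```

Once a step spans more than one texel, narrow features can fall between samples, which is consistent with the aliasing returning at the larger scale.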
In addition, we also have to consider how the algorithm will react when viewing polygonal surfaces nearly edge on. Our current algorithm uses a maximum of 60 samples to determine where the
intersections are. This is already a prohibitive number of instructions to run, but the image quality is going to be horrible as well. Here's why. If your height-map is 256x256, and you view our 1m x
1m square edge on, then in the worst case a single screen pixel may be required to test 256 texels before it finds an intersection. We would need more than four times as many
samples as our maximum sampling rate to get an accurate intersection point! Figure 7 shows an edge on image generated with 60 samples and 0.04 height-map scale.
Figure 7: A 60-sample 0.04 height-map scale image from an oblique angle
Mip-mapping would help, but each level of mip-map will reduce the resolution of the height-map and introduce swimming artifacts. Care must be taken to reduce the situations where an object would
be viewed edge on, or to switch to a constant time algorithm like bump mapping at sharp angles.
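One possible way to approach this is sketched below. Note that it fades the parallax effect out as the view becomes oblique rather than switching algorithms outright, so the surface degrades toward plain texture or bump mapping. The function name and both threshold parameters are hypothetical and would need tuning.

```hlsl
// Hypothetical tuning parameters: cosines of the view angles between which
// the parallax effect is faded out.
float fFadeStart = 0.30;  // full effect at or above this value of E.z
float fFadeEnd   = 0.10;  // no parallax offset at or below this value

// Returns a height-map scale that shrinks to zero as the tangent-space eye
// vector approaches the surface plane.
float FadeHeightScale( float3 eyeTS, float fHeightMapScale )
{
    float fCosAngle = normalize( eyeTS ).z;
    float fFade = smoothstep( fFadeEnd, fFadeStart, fCosAngle );
    return fHeightMapScale * fFade;
}
```

The returned value would take the place of fHeightMapScale when the parallax limit is scaled. To actually save time at sharp angles, the sampling loop would also have to be skipped when the faded scale reaches zero.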
The ideal sampling situation would be to have one sample for each texel that the eye vector could possibly pass through during the intersection test. So a straight on view would only require a
single sample, and an edge on view would require as many samples as there are texels in line with the pixel (up to a maximum of the number of texels per edge).
This is actually information that is already available to us in the pixel shader. Our maximum parallax offset vector length, named fParallaxLimit in the pixel shader, is
a measure of the possible intersection test travel in texture units (the xy-plane in tangent space). It is shorter for straight on views and longer for edge on views, which is what we want to base
our number of samples on anyway. For example, if the parallax limit is 0.5 then a 256x256 height-map should sample, at most, 128 texels. This idea has been implemented, and left commented out, in the
pixel shader in the supplied effect file for you to experiment with. This sampling method will provide the best quality results, but will run slower due to the larger number of iterations.
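A minimal sketch of this idea follows. It is not the code from the supplied effect file. The function name, the fHeightMapSize parameter and the nSampleCap limit are assumptions made for this illustration.

```hlsl
// Hypothetical parameters: height-map width in texels and a hard iteration cap.
float fHeightMapSize = 256.0;
int   nSampleCap     = 128;

// One sample per texel that the offset vector can cross, clamped to [1, cap].
// fParallaxLimit is the value after scaling by the height-map scale.
int ComputeNumSamplesFromLimit( float fParallaxLimit )
{
    float fTexels = fParallaxLimit * fHeightMapSize;
    return (int)clamp( ceil( fTexels ), 1.0, (float)nSampleCap );
}
```

With a parallax limit of 0.5 and a 256-texel height-map this yields 128 samples, matching the example above. The cap keeps the iteration count bounded for nearly edge on views, where the limit grows very large.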
Whatever sampling algorithm is used, it should be chosen to provide the minimum number of samples that provides acceptable image quality.
Another consideration should be given to how large an object is going to appear on screen. If you are using parallax occlusion mapping on an object that takes up 80% of the frame buffer's pixels,
then it will be much more prohibitive than an object that is going to take 20% of the screen. So even if your target hardware can't handle full screen parallax occlusion mapping, you could still use
it for smaller objects.
I decided to write this article to provide some insight into the parallax occlusion mapping algorithm. Hopefully it is easy to understand and will provide some help in implementing the basic
algorithm in addition to giving some hints about the performance vs. quality tradeoff that must be made. I think that the next advance in this algorithm is probably going to be making it more
efficient, most likely with either a better sampling rate metric, or with a data structure built into the texture data to accelerate the searching process. If you feel that this document could be
improved or have questions or comments on it please feel free to contact me as 'Jason Z' on the GameDev.net forums or you could also PM me.
About the Author
Jason Zink is an electrical engineer currently working in the automotive industry. He is currently working towards an M.S. in Computer Science, and has received a B.S. in Electrical Engineering.
He has been writing software for about 8 years in various fields such as industrial controls, embedded computing, business applications, automotive communication systems, and most recently game
development with a great interest in graphics programming.