## Introduction

Parallax occlusion mapping is a technique that reduces a geometric model's complexity by encoding surface detail information in a texture. The surface information typically used is a height-map representation of the replaced geometry. When the model is rendered, the surface details are reconstructed in the pixel shader from the height-map texture information.

I recently read through the GDC 2006 presentation on parallax occlusion mapping titled "Practical Parallax Occlusion Mapping for Highly Detailed Surface Rendering" by Natalya Tatarchuk of ATI Research, Inc. The presentation discusses an improved version of parallax occlusion mapping, along with possible optimizations that can be used to accelerate the technique on current and next-generation hardware. Of course, after reading the presentation I had to implement the technique for myself to evaluate its performance and better understand its inner workings.

This chapter attempts to present an easy-to-understand guide to the theory behind the algorithm, as well as to provide a reference implementation of the basic parallax occlusion mapping algorithm. The investigation focuses on the surface reconstruction calculations and the parameters that come into play when using this technique. I have decided to implement a simple Phong lighting model, but as you will see shortly, the algorithm is very flexible and can easily be adapted to just about any lighting model you would like to work with. In addition, a brief discussion of how to light a parallax occlusion mapped surface is also provided.

The reference implementation is written in Direct3D 10 HLSL. A demonstration program is available on the book's website that shows the algorithm in action. The demo program and the associated effect files that have been developed for this chapter are provided with it and may be used in whatever manner you desire.

*This article was originally published on GameDev.net in 2006. It was revised by the original author in 2008 and published in the book Advanced Game Programming: A GameDev.net Collection, one of four books collecting both popular GameDev.net articles and new original content in print format.*

## Algorithm Overview

So what exactly is parallax occlusion mapping? First, let's look at an image of a standard polygonal surface that we would like to apply our technique to. Let's assume that this polygonal surface is a cube, consisting of six faces with two triangles each, for a total of twelve triangles. We will set the texture coordinates of each vertex such that each face of the cube includes an entire copy of the given texture. Figure 1 shows this simple polygonal surface, with normal mapping used to provide simple diffuse lighting.

## Implementing Parallax Occlusion Mapping

Now that we have a better understanding of the parallax occlusion mapping algorithm, it is time to put our newly acquired knowledge to use. First we will look at the required input texture data and how it is formatted. Then we will step through a sample implementation line by line, with a thorough explanation of what is being accomplished in each section of code. The sample effect file is written in Direct3D 10 HLSL, but the implementation should apply to other shading languages as well.

Before writing the parallax occlusion map effect file, let's examine the texture data that we will be using. The standard diffuse color map is provided in the RGB channels of a texture. The only additional data that is required is a height-map of the volumetric surface that we are trying to simulate. In this example, the height data is stored in the alpha channel of a normal map, where a value of 0 (shown in black) corresponds to the deepest point and a value of 1 (shown in white) corresponds to the original polygonal surface. Figure 5 shows the color texture, the alpha channel height-map, and the normal map that it will be coupled with.

The vertex shader begins by computing the world-space vertex position along with the unnormalized eye and light vectors:
```
float3 P = mul( float4( IN.position, 1 ), mW ).xyz;  // world-space position
float3 N = IN.normal;                                // object-space normal
float3 E = P - EyePosition.xyz;                      // eye-to-vertex vector
float3 L = LightPosition.xyz - P;                    // vertex-to-light vector
```
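
Before going further, note that these snippets reference several resources, constants, and interpolants whose declarations are not shown in this chapter. The following is a minimal sketch of what those declarations might look like, inferred from how the code uses them; the struct layouts, semantics, and constant buffer organization are assumptions rather than the demo's actual effect file.
```
// Assumed declarations, reconstructed from usage in the snippets below.
cbuffer PerObject
{
    matrix mW;               // world matrix
    matrix mWVP;             // world-view-projection matrix
    float4 EyePosition;      // world-space eye position
    float4 LightPosition;    // world-space light position
    float  fHeightMapScale;  // depth of the simulated volume
    int    nMinSamples;      // minimum number of ray-march samples
    int    nMaxSamples;      // maximum number of ray-march samples
};

Texture2D    ColorMap;         // RGB: diffuse color
Texture2D    NormalHeightMap;  // RGB: normal, A: height
SamplerState LinearSampler;

struct VS_INPUT
{
    float3 position : POSITION;
    float3 normal   : NORMAL;
    float3 tangent  : TANGENT;
    float3 binormal : BINORMAL;
    float2 texcoord : TEXCOORD0;
};

struct VS_OUTPUT
{
    float4 position : SV_Position;
    float2 texcoord : TEXCOORD0;
    float3 eye      : TEXCOORD1;  // tangent-space eye vector
    float3 normal   : TEXCOORD2;  // tangent-space normal
    float3 light    : TEXCOORD3;  // tangent-space light vector
};
```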

Next, we must transform the eye vector, light direction vector, and the vertex normal to tangent space. The transformation matrix that we will use is based on the vertex normal, binormal, and tangent vectors.
```
// Build the tangent-to-world rotation from the world-space tangent frame.
// Casting mW to float3x3 makes the implicit truncation explicit, so only
// the rotation portion of the world matrix is applied.
float3x3 tangentToWorldSpace;
tangentToWorldSpace[0] = mul( normalize( IN.tangent ),  (float3x3)mW );
tangentToWorldSpace[1] = mul( normalize( IN.binormal ), (float3x3)mW );
tangentToWorldSpace[2] = mul( normalize( IN.normal ),   (float3x3)mW );
```

Each of these vectors is transformed to world space and then used to form a row of the rotation matrix that converts a vector from *tangent* to *world* space. Since this is a rotation-only matrix, its transpose is also its inverse. Transposing it therefore produces the *world* to *tangent* space rotation matrix that we need.

```
float3x3 worldToTangentSpace = transpose(tangentToWorldSpace);
```

Now the output vertex position and the output texture coordinates are trivially calculated.
```
OUT.position = mul( float4(IN.position, 1), mWVP );
OUT.texcoord = IN.texcoord;
```

And finally, we use the world to tangent space rotation matrix to transform the eye vector, light direction vector, and the vertex normal to tangent space.
```
OUT.eye = mul( E, worldToTangentSpace );
OUT.normal = mul( N, worldToTangentSpace );
OUT.light = mul( L, worldToTangentSpace );
```

That is all there is to the vertex shader. Now we move on to the pixel shader, which contains the actual parallax occlusion mapping code. The first calculation in the pixel shader determines the maximum parallax offset length that can be allowed, computed in the same way as in standard parallax mapping. The maximum parallax offset is a function of the depth of the surface (specified here as *fHeightMapScale*), as well as the orientation of the eye vector relative to the surface. For a further explanation, see "Parallax Mapping with Offset Limiting: A Per-Pixel Approximation of Uneven Surfaces" by Terry Welsh.

```
float fParallaxLimit = -length( IN.eye.xy ) / IN.eye.z;
fParallaxLimit *= fHeightMapScale;
```
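
For reference, this relationship follows from similar triangles: as the view ray descends through the full depth of the simulated volume, its horizontal travel is proportional to the ratio of the tangential and vertical components of the tangent-space eye vector (the leading negation in the code accounts for the sign convention of the tangent-space *z* component):

$$
\text{fParallaxLimit} = \frac{\lVert \mathbf{E}_{xy} \rVert}{\lvert E_z \rvert} \cdot \text{fHeightMapScale}
$$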

Next we calculate the direction of the offset vector. This is essentially a two-dimensional vector that exists in the xy-plane of the tangent space. This must be the case, since the texture coordinates lie on the polygon surface, with z = 0 (in tangent space) across the entire surface. The offset direction is found by normalizing the vector formed from the x and y components of the eye vector; this direction is then scaled by the maximum parallax offset calculated in the previous step.
```
float2 vOffsetDir = normalize( IN.eye.xy );
float2 vMaxOffset = vOffsetDir * fParallaxLimit;
```

Then the number of samples is determined by lerping between user-specified minimum and maximum sample counts, based on the angle between the eye vector and the surface normal: a head-on view needs relatively few samples, while a grazing view needs many more to avoid missing an intersection. Here *E* and *N* are the normalized tangent-space eye and normal vectors; since the eye vector points from the eye toward the surface, the absolute value of the dot product keeps the interpolation parameter in the [0,1] range.
```
float3 E = normalize( IN.eye );
float3 N = normalize( IN.normal );
// |dot(E,N)| is near 1 for head-on views (fewer samples) and near 0
// at grazing angles (more samples).
int nNumSamples = (int)lerp( nMaxSamples, nMinSamples, abs( dot( E, N ) ) );
```

Since the total height of the simulated volume is 1.0, starting from the top of the volume where the eye vector intersects the polygon surface gives an initial height of 1.0. With each additional sample, the height of the ray at the sampling point is reduced by the reciprocal of the number of samples. This effectively splits the 0.0–1.0 height range into n slices, where n is the number of samples. This means that the larger the number of samples, the finer the height variation we can detect in the height map.
```
float fStepSize = 1.0 / (float)nNumSamples;
```

Since we would like to use dynamic branching in our sampling algorithm, we must not use any instructions that require gradient calculations within the dynamic loop. This means that for our texture sampling we must use the *SampleGrad* instruction instead of a plain *Sample* instruction. In order to use *SampleGrad*, we must manually calculate the texture coordinate gradients in screen space outside of the dynamic loop. This is done with the intrinsic *ddx* and *ddy* instructions.

```
float2 dx = ddx( IN.texcoord );
float2 dy = ddy( IN.texcoord );
```

Now we initialize the required variables for our dynamic loop. The purpose of the loop is to find the intersection of the eye vector with the height-map as efficiently as possible. So when we find the intersection, we want to terminate the loop early and save any unnecessary texture sampling efforts. We start with a comparison height of 1.0 (corresponding to the top of the virtual volume), initial parallax offset vectors of (0,0), and starting at the 0th sample.
```
float fCurrRayHeight = 1.0;
float2 vCurrOffset = float2( 0, 0 );
float2 vLastOffset = float2( 0, 0 );
float fLastSampledHeight = 1;
float fCurrSampledHeight = 1;
int nCurrSample = 0;
```

Next is the dynamic loop itself. For each iteration of the loop, we sample the texture coordinates along our parallax offset vector. For each of these samples, we compare the alpha component value to the current height of the eye vector. If the eye vector has a larger height value than the height-map, then we have not found the intersection yet. If the eye vector has a smaller height value than the height-map, then we have found the intersection and it exists somewhere between the current sample and the previous sample.
```
while ( nCurrSample < nNumSamples )
{
    // Sample the height-map along the parallax offset vector.
    fCurrSampledHeight = NormalHeightMap.SampleGrad( LinearSampler, IN.texcoord + vCurrOffset, dx, dy ).a;

    if ( fCurrSampledHeight > fCurrRayHeight )
    {
        // The ray has passed below the height-map: intersect the two line
        // segments formed by the last two ray heights and sampled heights.
        float delta1 = fCurrSampledHeight - fCurrRayHeight;
        float delta2 = ( fCurrRayHeight + fStepSize ) - fLastSampledHeight;
        float ratio = delta1 / ( delta1 + delta2 );

        vCurrOffset = ratio * vLastOffset + ( 1.0 - ratio ) * vCurrOffset;

        // Terminate the loop now that the intersection has been found.
        nCurrSample = nNumSamples + 1;
    }
    else
    {
        // No intersection yet: step the ray down one slice and further
        // along the maximum offset direction.
        nCurrSample++;
        fCurrRayHeight -= fStepSize;
        vLastOffset = vCurrOffset;
        vCurrOffset += fStepSize * vMaxOffset;
        fLastSampledHeight = fCurrSampledHeight;
    }
}
```
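
For reference, the `ratio` computed in the intersection branch is exactly the line-segment intersection described below. Write $R$ for `fCurrRayHeight`, $s$ for `fStepSize`, and $H_{\text{last}}$, $H_{\text{curr}}$ for the previous and current height-map samples. Over the final step, parameterized by $u \in [0,1]$, the ray height and the surface height are

$$
r(u) = (R + s) - u\,s, \qquad h(u) = H_{\text{last}} + u\,(H_{\text{curr}} - H_{\text{last}}).
$$

Setting $r(u) = h(u)$ and solving, with $\delta_1 = H_{\text{curr}} - R$ (`delta1`) and $\delta_2 = (R + s) - H_{\text{last}}$ (`delta2`), gives

$$
u = \frac{\delta_2}{\delta_1 + \delta_2}, \qquad 1 - u = \frac{\delta_1}{\delta_1 + \delta_2} = \text{ratio},
$$

so the blend $\text{ratio} \cdot \texttt{vLastOffset} + (1 - \text{ratio}) \cdot \texttt{vCurrOffset}$ is precisely $\operatorname{lerp}(\texttt{vLastOffset}, \texttt{vCurrOffset}, u)$.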

Once the pre- and post-intersection samples have been found, the interpolated offset computed in the loop above gives the linearly approximated intersection point between the last two samples, found by intersecting the two line segments formed by the last two samples and the last two eye vector heights. The final texture samples are then taken at this interpolated offset, which is considered the final intersection point.
```
float2 vFinalCoords = IN.texcoord + vCurrOffset;
float4 vFinalNormal = NormalHeightMap.Sample( LinearSampler, vFinalCoords );
float4 vFinalColor = ColorMap.Sample( LinearSampler, vFinalCoords );

// Expand the final normal vector from [0,1] to [-1,1] range.
vFinalNormal = vFinalNormal * 2.0f - 1.0f;
```

Now all that is left is to illuminate the pixel based on these new offset texture coordinates. In our example, we utilize the normal map's normal vector to calculate diffuse and ambient lighting terms. Since the height-map is stored in the alpha channel of the normal map, we already have the normal map sample available to us. These diffuse and ambient terms are then used to modulate the color map sample from our final intersection point. In place of this simple lighting model, you could use the offset texture coordinates to sample a normal map, gloss map, or whatever other textures are needed to implement your favorite lighting model.
```
// IN.light is the tangent-space light vector computed in the vertex shader.
float3 L = normalize( IN.light );

float3 vAmbient = vFinalColor.rgb * 0.1f;
float3 vDiffuse = vFinalColor.rgb * max( 0.0f, dot( L, vFinalNormal.xyz ) ) * 0.5f;

vFinalColor.rgb = vAmbient + vDiffuse;
OUT.color = vFinalColor;
```
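
The introduction mentioned a Phong lighting model; as an illustration of how the offset coordinates plug into a richer model, a specular term could be layered on top of the ambient and diffuse terms above. This is only a sketch: *fSpecularPower* and the 0.4 weight are assumed values, not constants from the demo's effect file.
```
// Hypothetical Phong specular term (fSpecularPower is an assumed constant).
float3 V = normalize( -IN.eye );                          // tangent-space view vector
float3 R = reflect( -L, normalize( vFinalNormal.xyz ) );  // reflected light direction
float  fSpec = pow( max( 0.0f, dot( R, V ) ), fSpecularPower );
vFinalColor.rgb += fSpec * 0.4f;
```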

Now that we have seen parallax occlusion mapping at work, let's consider some of the parameters that are important to the visual quality and the speed of the algorithm.
## Algorithm Metrics

The algorithm as presented in the demonstration program's effect file runs faster than the 60 Hz refresh rate of my laptop's display (with a GeForce 8600M GT) at a screen resolution of 640x480, with the minimum and maximum number of samples set to 4 and 20, respectively. Of course this will vary by machine, but it serves as a good baseline for performance characteristics, since we know that the algorithm is pixel shader bound.

The algorithm is implemented using shader model 3.0 and later constructs; specifically, it uses dynamic branching in the pixel shader to reduce the number of unnecessary loop iterations after the surface intersection has already been found, so relatively modern hardware is required to run this effect. Even with newer hardware, the algorithm is pixel shader intensive. Each iteration of the dynamic loop that does not find the intersection requires a texture lookup, along with all of the ALU and logical instructions used to test whether the intersection has occurred. Considering that the sample images were generated with a minimum sample count of 4 and a maximum sample count of 20, you can see that the number of times the loop executes to find the intersection is the most performance-critical parameter. With this in mind, we should develop some methodology for determining how many samples are required for acceptable image quality. Figure 6 compares images generated with a maximum of 20 and 6 samples, respectively.

The next parameter to consider is the height-map scale, represented by *fHeightMapScale* in the sample effect file. If you imagine a 1-meter by 1-meter square (in world space coordinates), then the height-map scale is how deep a simulated volume we are trying to represent. For example, if the height-map scale is 0.04, then our 1x1 square would have a potential depth of 0.04 meters. Figure 7 shows two images generated with a scale height of 0.1 and 0.4 at the same sampling rates (20 samples maximum).

The parallax offset limit, stored in *fParallaxLimit* in the pixel shader, is a measure of the possible intersection-test travel in texture units (the xy-plane in tangent space). It is shorter for straight-on views and longer for edge-on views, which is what we want to base our number of samples on anyway. For example, if the parallax limit is 0.5, then a 256x256 height-map should sample, at most, 128 texels. This sampling method will provide the best quality results, but will run slower due to the larger number of iterations; a sketch of this heuristic appears at the end of this section. Whatever sampling algorithm is used, it should be chosen to provide the minimum number of samples that gives acceptable image quality.

Another consideration is how large an object will appear on screen. If you are using parallax occlusion mapping on an object that covers 80% of the frame buffer's pixels, it will be much more expensive than an object that covers only 20% of the screen. So even if your target hardware can't handle full-screen parallax occlusion mapping, you could still use it for smaller objects.
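
Returning to the texel-based heuristic above, the sample count could be derived from the parallax limit and the height-map resolution rather than from the viewing angle. The following is a sketch, where *nTextureSize* is an assumed constant holding the height-map dimensions (e.g. 256), not part of the demo's effect file:
```
// Take roughly one sample per texel crossed by the maximum offset vector.
// With a parallax limit of 0.5 and a 256-texel height-map this yields
// 128 samples, as in the example above.
int nNumSamples = max( nMinSamples, (int)( abs( fParallaxLimit ) * (float)nTextureSize ) );
```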