
A Closer Look At Parallax Occlusion Mapping


Algorithm Metrics

The algorithm as presented in the sample effect file requires approximately 11.98 seconds to generate a frame on the reference rasterizer at a screen resolution of 640x480, with the minimum and maximum number of samples set to 8 and 50, respectively. This will of course vary by machine, but it serves as a useful baseline for performance comparisons since we know that the algorithm is pixel shader bound.

The algorithm is implemented using shader model 3.0 constructs. Specifically, it uses dynamic branching in the pixel shader to skip unnecessary loop iterations once the intersection has been found. Relatively modern hardware is therefore needed to run this effect in hardware. Even with newer hardware, the algorithm is very pixel shader intensive. Each iteration of the dynamic loop that does not find the intersection requires a texture lookup along with 19 ALU instructions, and the final intersecting iteration requires 2 texture lookups and 21 ALU instructions.
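To make these figures concrete, they can be turned into a rough per-pixel instruction count. The following is a minimal sketch in Python, not part of the sample effect file. It assumes the loop runs a given number of iterations and that the last one is the intersecting iteration. The function name `loop_cost` is hypothetical, and instructions outside the loop are ignored.

```python
# Per-iteration costs quoted in the text above.
ALU_PER_MISS = 19   # ALU instructions for an iteration that finds no intersection
TEX_PER_MISS = 1    # texture lookups for such an iteration
ALU_FINAL = 21      # ALU instructions for the final, intersecting iteration
TEX_FINAL = 2       # texture lookups for the final, intersecting iteration


def loop_cost(iterations):
    """Return (alu_instructions, texture_lookups) for one pixel.

    Assumes the loop ends on its last iteration by finding the intersection.
    """
    if iterations < 1:
        raise ValueError("at least one iteration is required")
    misses = iterations - 1
    alu = misses * ALU_PER_MISS + ALU_FINAL
    tex = misses * TEX_PER_MISS + TEX_FINAL
    return alu, tex


for n in (8, 40, 60):
    alu, tex = loop_cost(n)
    print(f"{n} iterations: {alu} ALU instructions, {tex} texture lookups")
```

Under these assumptions, a pixel that needs 60 iterations costs roughly seven times as many ALU instructions as one that needs 8, which is why the sample count dominates the cost of the effect.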

Considering that the sample images were generated with a minimum sample count of 8 and a maximum sample count of 60, you can see that the number of loop iterations performed is, on average, going to be the most performance-intensive part of the algorithm. With this in mind, we should develop some methodology for determining how many samples are required for an acceptable image quality. Figure 5 compares images generated with 60 and 40 maximum samples, respectively.


Figure 5:  A 60-sample image (left) and a 40-sample image (right)

As you can see, there are aliasing artifacts along the top of the 40-sample image wherever the height map makes a sharp transition (you may have to zoom in on the image to see them clearly). Even so, the parts of the image that do not contain such high frequencies still look acceptable. If you will be using low frequency textures, you may be able to significantly reduce your sampling rate without any visual impact.

Another very important parameter that must be taken into consideration is the height-map scale, named fHeightMapScale in the sample effect file. If you imagine a 1-meter by 1-meter square (in world space coordinates), then the height-map scale is the depth of the simulated volume we are trying to represent. For example, if the height-map scale is 0.04, then our 1x1 square would have a potential depth of 0.04 meters. Figure 6 shows two images with height-map scales of 0.04 and 0.08 and the same sampling rates.


Figure 6:  A 0.04 height-map scale image (left) and a 0.08 height-map scale image (right)

It is easy to see the dramatic amount of occlusion caused by the increased height-map scale. Also notice that the aliasing artifacts are back toward the top of the image, even though the sampling rates are the same. With this in mind, you can see that the height-map scale also determines how 'sharp' the features are with respect to the eye vector. The taller the features are, the harder it is to detect their intersections with the eye vector. This means that a larger height-map scale needs more samples per pixel to obtain similar image quality. So a smaller height-map scale is "a good thing".

In addition, we also have to consider how the algorithm will react when viewing polygonal surfaces nearly edge on. Our current algorithm uses a maximum of 60 samples to determine where the intersections are. This is already a prohibitive number of instructions to run, but the image quality is going to be horrible as well. Here's why. If your height-map is 256x256 and you view our 1m x 1m square edge on, then in the worst case a single screen pixel may have to test 256 texels before it finds an intersection. We would need more than 4 times our maximum sample count to get an accurate intersection point! Figure 7 shows an edge on image generated with 60 samples and a 0.04 height-map scale.


Figure 7: A 60-sample 0.04 height-map scale image from an oblique angle
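The edge on worst case can be checked with a line of arithmetic. The sketch below is illustrative only, and the function name `undersampling_factor` is hypothetical. It assumes one sample per texel is needed for an accurate intersection, and it compares the 256 texels a ray may cross with a 60-sample maximum.

```python
def undersampling_factor(texels_along_ray, max_samples):
    """How many texels each sample has to stand in for along the ray."""
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    return texels_along_ray / max_samples


# 256 texels crossed in the worst case, 60 samples available.
factor = undersampling_factor(256, 60)
print(f"each sample covers about {factor:.2f} texels")
```

Any factor above 1 means some texels are skipped, which is where the missed intersections at oblique angles come from.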

Mip-mapping would help, but each level of mip-map will reduce the resolution of the height-map and introduce swimming artifacts. Care must be taken to reduce the situations where an object would be viewed edge on, or to switch to a constant time algorithm like bump mapping at sharp angles.

The ideal sampling situation would be to have one sample for each texel that the eye vector could possibly pass through during the intersection test. So a straight on view would only require a single sample, and an edge on view would require as many samples as there are texels in line with the pixel (up to a maximum of the number of texels per edge).

This is actually information that is already available to us in the pixel shader. Our maximum parallax offset vector length, named fParallaxLimit in the pixel shader, is a measure of the possible intersection test travel in texture units (the xy-plane in tangent space). It is shorter for straight on views and longer for edge on views, which is what we want to base our number of samples on anyway. For example, if the parallax limit is 0.5, then a 256x256 height-map should sample, at most, 128 texels. This idea has been implemented in the pixel shader of the supplied effect file, where it is commented out for you to experiment with. This sampling method will provide the best quality results, but will run slower due to the larger number of iterations.
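The relationship between the parallax limit and the sample count can be sketched outside the shader. The Python below is a rough illustration under stated assumptions, not the code from the effect file. The function names are hypothetical, and the parallax limit is assumed here to be the ratio of the tangent-space eye vector's xy length to its z component, multiplied by the height-map scale. The effect file may compute it differently.

```python
import math


def parallax_limit(eye_tangent, height_map_scale):
    """Assumed form of fParallaxLimit: xy travel per unit of depth, scaled.

    eye_tangent is an (x, y, z) eye vector in tangent space with z != 0.
    """
    x, y, z = eye_tangent
    return math.hypot(x, y) / abs(z) * height_map_scale


def samples_for_limit(limit, texels_per_edge):
    """One sample per texel the ray could cross, clamped to [1, texels_per_edge]."""
    wanted = math.ceil(limit * texels_per_edge)
    return max(1, min(wanted, texels_per_edge))


# The example from the text: a limit of 0.5 on a 256x256 height-map.
print(samples_for_limit(0.5, 256))

# A straight on view has no xy travel, so a single sample is enough.
print(samples_for_limit(parallax_limit((0.0, 0.0, 1.0), 0.04), 256))
```

The clamp to the number of texels per edge follows the ideal sampling rule described above. In a real shader, the upper bound would more likely be whatever iteration count the target hardware can afford.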

Whatever sampling algorithm is used, it should be chosen to provide the minimum number of samples that provides acceptable image quality.

Consideration should also be given to how large an object is going to appear on screen. If you are using parallax occlusion mapping on an object that takes up 80% of the frame buffer's pixels, then it will be much more expensive than an object that takes up 20% of the screen. So even if your target hardware can't handle full screen parallax occlusion mapping, you could still use it for smaller objects.
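Since the algorithm is pixel shader bound, its cost scales roughly with the number of pixels the object covers. The following sketch, using a hypothetical `shaded_pixels` helper and the 640x480 resolution quoted earlier, compares the two coverage figures. It assumes every covered pixel costs about the same, which ignores view-dependent sample counts.

```python
def shaded_pixels(width, height, coverage):
    """Approximate number of frame-buffer pixels covered by the object."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return round(width * height * coverage)


large = shaded_pixels(640, 480, 0.80)
small = shaded_pixels(640, 480, 0.20)
print(f"80% coverage: {large} pixels")
print(f"20% coverage: {small} pixels")
print(f"relative cost: {large / small:.1f}x")
```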

Conclusion

I decided to write this article to provide some insight into the parallax occlusion mapping algorithm. Hopefully it is easy to understand and will provide some help in implementing the basic algorithm, in addition to giving some hints about the performance vs. quality tradeoff that must be made. I think that the next advance in this algorithm is probably going to be making it more efficient, most likely with either a better sampling rate metric or a data structure built into the texture data to accelerate the searching process. If you feel that this document could be improved, or have questions or comments on it, please feel free to contact me as 'Jason Z' on the GameDev.net forums or by PM.

About the Author

Jason Zink is an electrical engineer currently working in the automotive industry. He is working towards an M.S. in Computer Science and has received a B.S. in Electrical Engineering. He has been writing software for about 8 years in various fields such as industrial controls, embedded computing, business applications, automotive communication systems, and most recently game development, with a great interest in graphics programming.




