Graphics and GPU Programming

Introduction
This article describes how to implement a very realistic shadow effect using nVIDIA's Cg programming language and OpenGL.

Although the focus of the article is on Cg, the shadowing algorithm is sufficiently complex to require some detailed explanations at the beginning. Additional shadowing references are listed at the end of the article.

The article will assume that the user has a modicum of familiarity with basic Cg. Here is a good introductory article on Cg that you may wish to peruse first.

The simplest way to render a shadow of a model is to take all the polygons that make that model and project them onto a surface along ray lines starting from the light [1]. Here is what that would look like:

The shadow consists of many polygons patched together. Each such polygon is a projection of a face of the original model, along light rays, onto the wall.

Although this technique is fast and reasonably effective, it has many limitations. The most important limitation is that shadows can only be rendered if they fall on flat surfaces (for example, a wall). If you want to project a shadow onto a sphere or some other non-trivial object, this technique falls short. An immediate corollary of this is that objects cannot self-shadow if they are anything but trivial, flat geometrical shapes.

The shadow volume [2] is a more computationally expensive but far more accurate technique for rendering shadows that are cast on any object or scene, no matter how complex or irregular.

At a high level, a shadow volume is an enclosed area of space that looks somewhat like a cone or a pyramid. The tip of the shadow volume is the light source; the faces of the shadow volume are determined by the outline of the object that is casting the shadow.

Any part of the scene that falls inside the shadow volume is shadowed; any part of the scene that falls outside the shadow volume is lit.

Below is what a shadow volume rendered scene would look like. Notice the self-shadowing of the shield on the body; also note that, while the surrounding scene happens to be a flat polygon, it could be any arbitrary shape:

In order to render the shadow volume, we must first determine the outline of model from the perspective of the light. Intuitively, for a given model, the outline will depend on the position of the light source relative to that object. Think of the model as consisting of many adjacent faces (polygons). Some of these polygons will be visible from the light; some other polygons will not be visible. Any edge that is shared between a visible and an invisible polygon is part of the shadow volume outline:

The visibility test for a face is performed simply by computing the dot product (angle) between the face normal and the light direction to that face: if the dot product is negative, the face is invisible, otherwise the face is visible from the light.

Once we have the complete set of outline segments, for each segment in the outline, extrude (push out) the extremities of that segment along the light direction to produce the shadow volume faces:

Take a moment to think about the diagrams above, about the definition of an outline segment and the definition of a shadow volume face.

At this point we have conceptualized the notion of a shadow volume given a model and a light. We can now think about how to determine what parts of the scene are lit or shadowed relative to this shadow volume. For simplicity, we reduce the problem to two dimensions:

For any point in the scene, if the ray connecting the eye to that point intersects the shadow volume an even number of times, then that point is in the light; otherwise, the point is in the shadow.

The shadow volume algorithm described above requires that the model be a closed object: it can be concave or convex, but it must not have "holes" (missing faces). If there is a missing face in the object, then a shadow volume outline segment might be skipped (since outline segments are defined as a shared edges between two faces).

Z-Pass Algorithm
In order to implement the shadow volume algorithm described above, we make use of the GPU's stencil buffer [4]:

1. Render the scene (walls etc.) with the lights turned off. This renders the color buffer as if it is all in shadow. It also fills the depth buffer with the depth values for all points in the scene.
2. Turn off depth buffer and color buffer writes.
3. Render all the shadow volume faces whose normals face towards the eye (a front face). Increment the stencil buffer bit for each point that passes the depth test. The depth and color buffers are unaffected.
4. Render all the shadow volume faces whose normals face away from the eye (a back face). Decrement the stencil buffer bit for each point that passes the depth test. The depth and color buffers are unaffected
5. Turn on the depth buffer and color buffer writes.
6. Render the scene again with the lights turned on and only where the stencil buffer is zero. This will overwrite all (previously) shadowed points in the scene with lit points wherever the scene point lies outside the shadow volume (the number of intersections from the eye ray is even).
Note that, in steps 3 and 4, the stencil buffer operation was performed only for those points in the shadow volume faces that passed the depth test. For this reason, the algorithm above is also called the "Z-Pass" algorithm.

The steps above might seem unintuitive, especially steps 3 and 4. The key to understanding them lies in thinking about a light ray starting from the eye, going to some point in the scene (whose shadowed or lit status we want to determine). This ray pierces the shadow volume faces: if it pierces both a front and a back face, then it intersects the shadow volume an even number of times, so the point is lit; if it pierces only a front face but not a back face (or vice versa), then it intersects the volume an odd number of times, so the point is in shadow. The stencil buffer operations keep track of these intersections: the "add 1" and "subtract 1" cancel each other out for points that are lit and do not cancel each other out for points that are shadowed.

Z-Fail Algorithm
The Z-Pass algorithm works well so long as the eye is outside the area enclosed by the shadow volume. To see this, imagine moving the eye along the ray in the diagram above until it is inside the shadow volume and immediately in front of the shadowed scene. In this situation, the shadowed scene will be rendered as if it is lit! The reason is that the shadow volume face that used to face the eye was clipped (since it is behind the eye now), so the stencil buffer values are no longer computed correctly [4].

In order to address this shortcoming, we can use a slightly different variant of the shadow volume algorithm called the "Z-Fail" algorithm (also known as "Carmack's Reverse") [5]. The Z-Fail algorithm is identical to the Z-Pass algorithm except for steps 3 and 4, which are different:

1. Render all the shadow volume faces whose normals face away the eye (a back face). Increment the stencil buffer bit for each point that fails the depth test. The depth and color buffers are unaffected.
2. Render all the shadow volume faces whose normals face towards from the eye (a front face). Decrement the stencil buffer bit for each point that fails the depth test. The depth and color buffers are unaffected
Intuitively, rather than computing the lit/shadowed status of points in the scene going from the eye towards the back of the scene (the Z-Pass algorithm), we now compute the status going from the back of the scene towards the eye (the Z-Fail algorithm). This approach is immune to the situation described above where shadow volume faces are clipped by the near view frustum plane. It is, however, still possible that the far view frustum plane can clip shadow volume faces, but this situation is far less likely to occur or can be easily averted by setting the far view frustum plane at "infinity" (this optimization is beyond the scope of this paper, but can be seen in [6] and in this article's accompanying source code).

The Z-Fail algorithm requires that the shadow volume be closed (additional polygons must be drawn to fill in the area defined by the outline segments, both at the "top" of the shadow volume and at the "bottom"). This means that more polygons are drawn on the screen for the Z-Fail algorithm than for the Z-Pass algorithm. For this reason, it is generally a good idea to use the Z-Pass algorithm whenever possible (for speed) and, in situations where the eye might enter the shadow volume area, use the Z-Fail algorithm instead.

Here is what the OpenGL code looks like for the Z-Fail algorithm:

// store current OpenGL state glPushAttrib(GL_DEPTH_BUFFER_BIT | GL_LIGHTING_BIT | GL_STENCIL_BUFFER_BIT); // draw the model with the light disabled glDisable(/* light handle */); model.draw(); glEnable(/* light handle */); // store current OpenGL state glPushAttrib(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_POLYGON_BIT | GL_STENCIL_BUFFER_BIT); glColorMask(0, 0, 0, 0); // do not write to the color buffer glDepthMask(0); // do not write to the depth (Z) buffer glEnable(GL_CULL_FACE); // cull faces (back or front) glEnable(GL_STENCIL_TEST); // enable stencil testing // set the reference stencil value to 0 glStencilFunc(GL_ALWAYS, 0, ~0); // increment the stencil value on Z fail glStencilOp(GL_KEEP, GL_INCR, GL_KEEP); // draw only the back faces of the shadow volume glCullFace(GL_FRONT); model.drawShadowVolume(light); // decrement the stencil value on Z fail glStencilOp(GL_KEEP, GL_DECR, GL_KEEP); // draw only the front faces of the shadow volume glCullFace(GL_BACK); model.drawShadowVolume(light); // restore OpenGL state glPopAttrib(); // re-draw the model with the light enabled only where // it has previously been drawn glDepthFunc(GL_EQUAL); // update the color only where the stencil value is 0 glEnable(GL_STENCIL_TEST); glStencilFunc(GL_EQUAL, 0, ~0); glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP); model.draw(); // restore OpenGL state glPopAttrib();
Cg Implementation

Model and Environment Setup
Cg programs, whether they are vertex or fragment programs, are executed once per graphics primitive (vertex or, respectively, fragment). It is impossible to generate new primitives from within a Cg program!

The shadow volume algorithms described above require that new vertices and faces be generated for the shadow volume faces in addition to the vertices and faces of the original model.

In order to take advantage of the GPU, we would like to avoid having to compute these additional vertices and faces on the CPU: it would be preferable to do all this work directly on the GPU. Since Cg cannot generate new vertices and faces, we must come up with a clever way of generating them on the CPU with minimal amount of work.

It turns out that there is such a way, which is also (fortuitously) ideally suited for the Z-Fail shadow volume algorithm!

Imagine that, for each edge in the original model, we generate a new quadrilateral that lies along that edge and has zero width (depicted below with red and yellow). The quadrilateral has a different normal for every vertex: two of the vertices take their normals from one of the polygons incident on that edge (green vectors), and the other two vertices take their normals from the other polygon (blue vectors). Here is a visual representation:

Any regular polygon that makes up the original model has the same normal vector at all of its vertices (the polygon normal).

In the Cg vertex program, we compute the dot product (angle) between the current vertex normal and the light direction. If this dot product is negative (the normal is facing away from the light), then we extrude the vertex along the light direction; otherwise, we keep the vertex in its specified location.

Since we only extrude those vertices that face away from the light, this means that any edge quadrilateral defined above will be drawn with non-zero width only when one incident polygon faces away from the light and the other incident polygon faces towards the light (this is the definition of an outline segment!):

Furthermore, all regular polygons that face away from the light will be also extruded, thereby closing the shadow volume on the bottom, whereas all other regular polygons that face towards the light will be kept in their specified locations, thereby closing the shadow volume on the top.

The only work that needs to be performed on the CPU is, therefore, a simple insertion of edge quadrilaterals for every edge; all other computations are done on the GPU, just like we wanted!

Cg Source
Here is the Cg vertex program for manipulating the model (and its implicit shadow volume faces):

// the POSITION and COLOR bindings are // (varying) outputs from this Cg program struct t_out { float4 position: POSITION; float4 color: COLOR; }; // the POSITION and NORMAL bindings are // (varying) inputs to this Cg program t_out main(float4 vertexPosition: POSITION, float4 vertexNormal: NORMAL, uniform float4x4 modelViewProj, uniform float4 lightPosition, uniform float4 lightAmbientColor, uniform float4 lightDiffuseColor, uniform float4 eyePosition, uniform float4 ambientColor, uniform float4 diffuseColor, uniform float4 emissiveColor, uniform float specularShininess, uniform float extrusionFactor) { t_out result; // Compute the diffuse term float4 lightDirection = normalize(lightPosition - vertexPosition); float diffuseLight = max(dot(vertexNormal, lightDirection), 0); // Assume specular color is white float4 specularColor = float4(1, 1, 1, 1); float specularLight = 0; // Compute the specular term if (specularShininess > 0) { float4 eyeDirection = normalize(eyePosition - vertexPosition); float4 halfDirection = normalize(lightDirection + eyeDirection); specularLight = pow( max(dot(vertexNormal, halfDirection), 0), specularShininess); if (diffuseLight <= 0) { specularLight = 0; } } result.color = emissiveColor + ambientColor * lightAmbientColor + diffuseLight * diffuseColor * lightDiffuseColor + specularLight * specularColor; float4 position = vertexPosition; if (diffuseLight <= 0) { if (extrusionFactor > 0) { position = vertexPosition + (vertexPosition - lightPosition) * extrusionFactor; // Place the extruded vector at infinity // (homogenous coordinate = 0) position.w = 0; } } result.position = mul(modelViewProj, position); return result; } The relevant parts of the Cg program are in bold:

• Compute the dot product (angle) between the vertex normal and the light direction:

float diffuseLight = max(dot(vertexNormal, lightDirection), 0);
• If the vertex normal faces away from the light (diffuseLight <= 0) and we are rendering the model during the stencil buffer phase of the algorithm (i.e. we want to extrude the shadow volume face vertices: extrusionFactor > 0), then extrude the vertex along the light direction:

position = vertexPosition + (vertexPosition - lightPosition) * extrusionFactor; The remainder of the Cg code computes the color at the current vertex. This is necessary because Cg vertex programs replace both the transformation and the lighting part of the OpenGL pipeline, so we must explicitly compute the color for Gouraud interpolation.

Appendices

Software and Hardware Requirements

Software Environment
The code that accompanies the article has been compiled and tested only on Microsoft Windows 2000 Professional SP4 and Microsoft Visual Studio .NET (version 7.0). The code includes the necessary Visual Studio project files and configuration.

You will need the following 3[sup]rd[/sup] party libraries:

1. OpenGL Utility Toolkit (GLUT) 3.7.6 or better [7]
2. Corona image library 1.0.1 or better [8]
3. nVidia Cg toolkit 1.1 or better

Hardware Environment
The code has been tested on a GeForce4 Ti 4200 video card. The code should run on a GeForce3 (or better) or a Radeon 9500 (or better) card.

In general, the Cg OpenGL code in this article requires a GPU with support for either ARB_vertex_program (GeForce3 (or better) or Radeon 8500 (or better) families) or NV_vertex_program (GeForce3 (or better) family).

Execution
To execute the pre-compiled binaries, you need to use the following command line:

The first parameter specifies the 3D Model to render on the screen, and the second parameter specifies the Cg vertex program to use in order to render the model and its shadow.

Once the application window appears on the screen, you can right-click anywhere in it and display a pop-up menu with three options in it:

1. Control scene: moving the mouse will change the position of the scene.
2. Control model: moving the mouse will change the position of the model.
3. Control light: moving the mouse will change the position of the light.
The mouse movements change the position of the selected elements as follows:

1. Left click + move: quaternion rotation around the origin.
2. CTRL + left click + move: translate along the X- or Y-axes.
3. SHIFT + left click + move: translate along the Z-axis.

Implementation
As mentioned during the article, the accompanying code has a few additional features that go beyond the scope of the article.

The triangle mesh rendered on the screen is a model taken from [6] and converted using Milkshape 3D [9].

The code is documented and automated documentation is produced using Doxygen [10].

References
[1] Jim Blinn, "Me and My (Fake) Shadow," IEEE Computer Graphics and Applications, January 1988. Reprinted in Jim Blinn's Corner: A Trip Down the Graphics Pipeline, 1996.
[2] Frank Crow, "Shadow Algorithms for Computer Graphics," Computer Graphics, 11(2), pp. 442-448, summer 1977.
[3] Paul Diefenbach, Multi-pass Pipeline Rendering: Interaction and Realism through Hardware Provisions, Ph.D. thesis, University of Pennsylvania, tech report MS-CIS-96-26, 1996.
[4] Tim Heidmann, "Real Shadows Real Time", IRIS Universe, Number 18, 1991, pp. 28-31.
[5] John Carmack, unpublished correspondence with Mark Kilgard, early 2000.
[6] Cass Everitt and Mark J. Kilgard, Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering, March 12, 2002
[7] Nate Robbins, Windows port of Mark Kilgard's "OpenGL Utility Toolkit" (GLUT)
[9] chUmbaLum sOft, "Milkshape 3D"
[10] Dimitri van Heesch, "Doxygen"

Report Article

## User Feedback

There are no comments to display.

## Create an account

Register a new account

• 0
• 2
• 1
• 6
• 0

• 9
• 9
• 14
• 12
• 10