
Myopic Rhino

GDNet Emeritus
Everything posted by Myopic Rhino

  1. I'm dreaming too high?

    I don't know of any companies that would even entertain letting you pitch them an idea, unless you're willing to fund it. If they rejected it but then happened to develop something similar, they'd be opening themselves up to a lawsuit.
  2. Clearly, you just need a better CPU
  3. New to the forum, Hello everyone!

  4. Quaternion Powers

    Version 1.2, February 2003

Updates

1.2: A minor correction to the formula for converting from quaternion to axis angle. The scale was missing a square root. Thanks to Shi for pointing that out.

1.0 - 1.1: The norm of a quaternion should be the square root of q.q. The mistake was brought to my attention by several kind readers, and upon checking the definition of the Euclidean properties for complex numbers, I realized the norm property [bquote] || u + v || <= || u || + || v || [/bquote] was violated by the previous definition of the magnitude. The code in the samples is updated as well.

Foreword

To me, the term 'Quaternion' sounds out of this world, like some term from quantum theory about dark matter, having dark secret powers. If you, too, are enthralled by this dark power, this article will bring enlightenment (I hope). The article will show you how to do rotations using quaternions, and bring you closer to understanding quaternions (and their powers). If you spot a mistake, please email me at robin@cyberversion.com. Also, if you intend to put this on your site, please send me a mail. I like to know where this ends up.

Why use Quaternions?

To answer the question, let's first discuss some common orientation implementations.

Euler representation

This is by far the simplest method to implement orientation. For each axis, there is a value specifying the rotation around the axis. Therefore, we have 3 variables [bquote] x, y, z [/bquote] that vary between 0 and 360 degrees (or 0 - 2pi radians). They are the roll, pitch, and yaw (or pitch, roll, and yaw - whatever) representation. Orientation is obtained by multiplying the 3 rotation matrices generated from the 3 angles together (in a specific order that you define).

Note: The rotations are specified with respect to the global coordinate axis frame. This means the first rotation does not change the axis of rotation for the second and third rotations. This causes a situation known as gimbal lock, which I will discuss later. 
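To make the composition order concrete, here is a minimal C++ sketch of the Euler orientation matrix. The types and function names are mine, not from the article's sample code, and I use plain 3x3 row-major matrices here purely for readability:

```cpp
#include <cassert>
#include <cmath>

// A 3x3 rotation matrix, row-major (m[row][col]) for this illustration only.
struct Mat3 { double m[3][3]; };

// Standard matrix product a * b.
Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Elementary rotations (angles in radians), acting on column vectors.
Mat3 rotX(double a) { return {{{1, 0, 0}, {0, std::cos(a), -std::sin(a)}, {0, std::sin(a), std::cos(a)}}}; }
Mat3 rotY(double a) { return {{{std::cos(a), 0, std::sin(a)}, {0, 1, 0}, {-std::sin(a), 0, std::cos(a)}}}; }
Mat3 rotZ(double a) { return {{{std::cos(a), -std::sin(a), 0}, {std::sin(a), std::cos(a), 0}, {0, 0, 1}}}; }

// Euler orientation: one matrix per angle, multiplied in the article's
// fixed order RotX * RotY * RotZ.
Mat3 eulerMatrix(double x, double y, double z) {
    return mul(rotX(x), mul(rotY(y), rotZ(z)));
}
```

Because all three rotations are built around the fixed global axes, the first matrix never changes the axes seen by the other two, which is exactly what makes gimbal lock possible.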
Angle Axis representation

This implementation method is better than the Euler representation, as it avoids the gimbal lock problem. The representation consists of a unit vector representing an arbitrary axis of rotation, and another variable (0 - 360) representing the rotation around that vector: [bquote] (x, y, z), angle [/bquote]

Why are these representations bad?

Gimbal Lock

As rotations in the Euler representation are done with respect to the global axes, a rotation in one axis can 'override' a rotation in another, making you lose a degree of freedom. This is gimbal lock. Say the rotation in the Y axis rotates a vector (parallel to the X axis) so that the vector becomes parallel to the Z axis. Then any rotation in the Z axis will have no effect on the vector. Later, I will show you an example of gimbal lock and how you can use quaternions to overcome it.

Interpolation Problems

Though the angle axis representation does not suffer from gimbal lock, there are problems when you need to interpolate between two rotations. The calculated interpolated orientations may not be smooth, so you will get jerky rotation movements. The Euler representation suffers from this problem as well.

Let's get started

Before we begin, let's establish some assumptions I'll be making. I hate the way many articles leave this important section out, causing a great deal of confusion when it comes to the mathematics.

Coordinate System - This article assumes a right-handed coordinate system, like OpenGL. If you are using a left-handed coordinate system like Direct3D, you may need to transpose the matrices. Note that the Direct3D samples have a quaternion library already, though I recommend you check through their implementation before using it.

Rotation Order - The sequence of rotations in the Euler representation is X, then Y, and then Z. In matrix form: [bquote] RotX * RotY * RotZ (very important) [/bquote]

Matrix - Matrices are in column major format, like they are in OpenGL. 
[bquote]
Example  [ 0  4  8  12
           1  5  9  13
           2  6  10 14
           3  7  11 15 ]
[/bquote]

Vectors and Points - Implemented as a 4x1 matrix, so applying a transformation is of the order

[bquote]
Rotation Matrix * [ vx
                    vy
                    vz
                    1  ]
[/bquote]

This does not imply that I prefer OpenGL over Direct3D. It just happened that I learned OpenGL first, and so my quaternion knowledge was gained in OpenGL.

Note: If you specify rotations in another order, certain quaternion functions will be implemented differently, especially those that deal with the Euler representation.

What is a Quaternion?

A complex number is a number with an imaginary part, defined in terms of the imaginary unit i, where i * i = -1. A quaternion is an extension of the complex numbers. Instead of just i, we have three numbers that are all square roots of -1, denoted by i, j, and k. This means that [bquote] i * i = -1    j * j = -1    k * k = -1 [/bquote] So a quaternion can be represented as [bquote] q = w + xi + yj + zk [/bquote] where w is a real number, and x, y, and z are real numbers as well - the coefficients of the imaginary components i, j, and k. Another common representation is [bquote] q = [ w, v ] [/bquote] where v = (x, y, z) is called the "vector" part and w is called the "scalar" part. Although v is called a vector, don't think of it as a typical 3 dimensional vector. 
It is a vector in 4D space, which is totally unintuitive to visualize.

Identity Quaternions

Unlike vectors, there are two identity quaternions. The multiplication identity quaternion is [bquote] q = [1, (0, 0, 0)] [/bquote] Any quaternion multiplied by this identity quaternion is left unchanged. The addition identity quaternion (which we do not use) is [bquote] q = [0, (0, 0, 0)] [/bquote]

Using quaternions as orientations

The first thing I want to point out is that quaternions are not vectors, so please don't use your preconceived vector mathematics on what I'm going to show. This is going to get very mathematical, so please bear with me. We need to first define the magnitude of a quaternion: [bquote] || q || = Norm(q) = sqrt(w^2 + x^2 + y^2 + z^2) [/bquote] A unit quaternion has the following property: [bquote] w^2 + x^2 + y^2 + z^2 = 1 [/bquote] So to normalize a quaternion q, we do [bquote] q = q / || q || = q / sqrt(w^2 + x^2 + y^2 + z^2) [/bquote]

What is so special about the unit quaternion is that it represents an orientation in 3D space. So you can use a unit quaternion to represent an orientation instead of the two methods discussed previously. To use them as orientations, you will need methods to convert them to and from other representations (e.g. matrices), which will be discussed soon.

Visualizing a unit quaternion

You can visualize unit quaternions as rotations in 4D space, where the (x, y, z) components form the arbitrary axis and w relates to the angle of rotation. All the unit quaternions form a sphere of unit radius in 4D space. Again, this is not very intuitive, but what I'm getting at is that you can get a 180 degree rotation of a quaternion by simply inverting the scalar (w) component.

Note: Only unit quaternions can be used for representing orientations. All discussions from here on will assume unit quaternions.

Conversion from Quaternions

To be able to use quaternions effectively, you will eventually need to convert them to some other representation. 
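Before moving on to conversions, the magnitude and normalization formulas above can be sketched directly in C++. The struct and function names here are mine, not from the article's library:

```cpp
#include <cassert>
#include <cmath>

// A simple quaternion: w is the scalar part, (x, y, z) the vector part.
struct Quat {
    double w, x, y, z;
};

// Magnitude (norm): sqrt(w^2 + x^2 + y^2 + z^2).
double norm(const Quat& q) {
    return std::sqrt(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z);
}

// Scale a quaternion so its norm becomes 1, making it a valid orientation.
Quat normalize(const Quat& q) {
    double n = norm(q);
    return { q.w / n, q.x / n, q.y / n, q.z / n };
}
```

In practice, you re-normalize after a long chain of multiplications, since floating-point error slowly drifts the norm away from 1.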
You cannot interpret keyboard presses as quaternions, can you? Well, not yet.

Quaternion to Matrix

Since OpenGL and Direct3D allow rotations to be specified as matrices, this is probably the most important conversion function, since homogeneous matrices are the standard 3D representation. The rotation matrix equivalent to a quaternion is

[bquote]
Matrix = [ w^2+x^2-y^2-z^2    2xy-2wz            2xz+2wy
           2xy+2wz            w^2-x^2+y^2-z^2    2yz-2wx
           2xz-2wy            2yz+2wx            w^2-x^2-y^2+z^2 ]
[/bquote]

Using the property of unit quaternions that w^2 + x^2 + y^2 + z^2 = 1, we can reduce the matrix to

[bquote]
Matrix = [ 1-2y^2-2z^2    2xy-2wz        2xz+2wy
           2xy+2wz        1-2x^2-2z^2    2yz-2wx
           2xz-2wy        2yz+2wx        1-2x^2-2y^2 ]
[/bquote]

Quaternion to Axis Angle

To change a quaternion to a rotation around an arbitrary axis in 3D space, we do the following: [bquote] 
If the axis of rotation is (ax, ay, az)
and the angle is           theta (radians)
then
    angle = 2 * acos(w)
    ax    = x / scale
    ay    = y / scale
    az    = z / scale
where
    scale = sqrt(x^2 + y^2 + z^2)
[/bquote]

Another variation I have seen uses scale = sin(acos(w)). The two are equivalent for unit quaternions, since sin(acos(w)) = sqrt(1 - w^2) = sqrt(x^2 + y^2 + z^2). Anyway, if the scale is 0, it means there is no rotation, so unless you do something, the axis will be undefined. So whenever the scale is 0, just set the rotation axis to any unit vector and the rotation angle to 0.

A Simple Example

In case you are getting confused about what I'm getting at, I will show you a simple example here. Say the camera orientation is represented as Euler angles. Then, in the rendering loop, we position the camera using [bquote] RotateX * RotateY * RotateZ * Translate [/bquote] where each component is a 4x4 matrix. 
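Stepping back for a moment, both conversions above can be sketched in C++. The function names are mine, not from the article's library; the matrix is written into a column-major 16-element array, as assumed throughout the article:

```cpp
#include <cassert>
#include <cmath>

// Unit quaternion (w, x, y, z) to a column-major 4x4 matrix, using the
// reduced form 1 - 2y^2 - 2z^2, etc. m[col * 4 + row], as in OpenGL.
void quatToMatrix(double w, double x, double y, double z, double m[16]) {
    m[0] = 1 - 2*y*y - 2*z*z; m[4] = 2*x*y - 2*w*z;     m[8]  = 2*x*z + 2*w*y;     m[12] = 0;
    m[1] = 2*x*y + 2*w*z;     m[5] = 1 - 2*x*x - 2*z*z; m[9]  = 2*y*z - 2*w*x;     m[13] = 0;
    m[2] = 2*x*z - 2*w*y;     m[6] = 2*y*z + 2*w*x;     m[10] = 1 - 2*x*x - 2*y*y; m[14] = 0;
    m[3] = 0;                 m[7] = 0;                 m[11] = 0;                 m[15] = 1;
}

// Unit quaternion to axis angle, guarding the zero-rotation case where the
// scale is 0 and the axis would otherwise be undefined.
void quatToAxisAngle(double w, double x, double y, double z,
                     double& ax, double& ay, double& az, double& angle) {
    double scale = std::sqrt(x*x + y*y + z*z);
    if (scale < 1e-12) {
        // No rotation: pick an arbitrary unit axis and a zero angle.
        ax = 1; ay = 0; az = 0; angle = 0;
    } else {
        ax = x / scale; ay = y / scale; az = z / scale;
        angle = 2 * std::acos(w);
    }
}
```

The axis-angle form is what you would hand to glRotatef (after converting the angle to degrees).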
So if we are using a unit quaternion to represent the camera orientation, we have to convert the quaternion to a matrix first: [bquote] Rotate (from Quaternion) * Translate [/bquote]

A more specific example in OpenGL:

Euler:
    glRotatef( angleX, 1, 0, 0 );
    glRotatef( angleY, 0, 1, 0 );
    glRotatef( angleZ, 0, 0, 1 );
    // translate

Quaternion:
    // convert Euler to quaternion
    // convert quaternion to axis angle
    glRotatef( theta, ax, ay, az );
    // translate

The above implementations are equivalent. The point I'm trying to get across is that using quaternions for orientation is the same as using the Euler or Axis Angle representation, and that they can be interchanged through the conversion functions I've described. Note that the above quaternion usage will still incur gimbal lock, just like the Euler method. Of course, you do not yet know how to make the rotation into a quaternion in the first place, but we will get to that shortly.

Note: If you are using Direct3D or OpenGL, you may not have to deal with matrices directly, but matrix concatenation is something the API does for you, so it's worth learning about.

Multiplying Quaternions

Since a unit quaternion represents an orientation in 3D space, the multiplication of two unit quaternions will result in another unit quaternion that represents the combined rotation. Amazing, but it's true. Given two unit quaternions [bquote] Q1 = (w1, x1, y1, z1);  Q2 = (w2, x2, y2, z2); [/bquote] a combined rotation of the two unit quaternions is achieved by [bquote] Q1 * Q2 = ( w1*w2 - v1.v2, w1*v2 + w2*v1 + v1 x v2 ) [/bquote] where [bquote] v1 = (x1, y1, z1)  v2 = (x2, y2, z2) [/bquote] and . and x are the standard vector dot and cross products. 
However, an optimization can be made by expanding the terms to produce

[bquote]
w = w1w2 - x1x2 - y1y2 - z1z2
x = w1x2 + x1w2 + y1z2 - z1y2
y = w1y2 + y1w2 + z1x2 - x1z2
z = w1z2 + z1w2 + x1y2 - y1x2
[/bquote]

Of course, the resultant unit quaternion can be converted to other representations, just like the two original unit quaternions. This is the real beauty of quaternions - multiplying two unit quaternions in 4D space avoids gimbal lock because the unit quaternions lie on a sphere. Be aware that the order of multiplication is important. Quaternion multiplication is not commutative, meaning [bquote] q1 * q2 does not equal q2 * q1 [/bquote]

Note: Both quaternions must refer to the same coordinate axes. I made the mistake of combining two quaternions from different coordinate axes, and I had a very hard time wondering why the resulting quaternion failed only at certain angles.

Conversion To Quaternions

Now we learn how to convert other representations to quaternions. Although I do not use all of these conversions in the sample program, there are times when you'll need them, such as when you want to use quaternion orientation for more advanced things like inverse kinematics. 
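Before moving on, the expanded multiplication formula above translates directly into code. This is a sketch with my own struct and names, not the article's library:

```cpp
#include <cassert>

// Quaternion as (w, x, y, z); the struct is illustrative only.
struct Quat { double w, x, y, z; };

// Combined rotation q1 * q2 using the expanded formula.
// Order matters: quaternion multiplication is not commutative.
Quat mul(const Quat& q1, const Quat& q2) {
    return { q1.w*q2.w - q1.x*q2.x - q1.y*q2.y - q1.z*q2.z,
             q1.w*q2.x + q1.x*q2.w + q1.y*q2.z - q1.z*q2.y,
             q1.w*q2.y + q1.y*q2.w + q1.z*q2.x - q1.x*q2.z,
             q1.w*q2.z + q1.z*q2.w + q1.x*q2.y - q1.y*q2.x };
}
```

A quick sanity check of the non-commutativity: with the pure quaternions i = (0, 1, 0, 0) and j = (0, 0, 1, 0), mul(i, j) gives k while mul(j, i) gives -k.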
Axis Angle to Quaternion

A rotation around an arbitrary axis in 3D space can be converted to a quaternion as follows:

[bquote]
If the axis of rotation is (ax, ay, az) - must be a unit vector
and the angle is           theta (radians)

    w = cos(theta/2)
    x = ax * sin(theta/2)
    y = ay * sin(theta/2)
    z = az * sin(theta/2)
[/bquote]

The axis must first be normalized. If the axis is a zero vector (meaning there is no rotation), the quaternion should be set to the multiplication identity quaternion.

Euler to Quaternion

Converting from Euler angles to a quaternion is slightly more tricky, as the order of operations must be correct. You can convert the Euler angles to three independent quaternions by setting the arbitrary axis to each coordinate axis in turn, and then multiply the three quaternions together to obtain the final quaternion. So if you have three Euler angles (a, b, c), you can form three independent quaternions

[bquote]
Qx = [ cos(a/2), (sin(a/2), 0, 0) ]
Qy = [ cos(b/2), (0, sin(b/2), 0) ]
Qz = [ cos(c/2), (0, 0, sin(c/2)) ]
[/bquote]

and the final quaternion is obtained by Qx * Qy * Qz.

Demo - Avoiding Gimbal Lock

Finally, we've reached what you've all been waiting for: "How can quaternions avoid gimbal lock?" The basic idea is:

1. Use a quaternion to represent the rotation.
2. Generate a temporary quaternion for the change from the current orientation to the new orientation.
3. Post-multiply the temporary quaternion with the orientation quaternion. This results in a new orientation that combines both rotations. 
4. Convert the quaternion to a matrix and use matrix multiplication as normal.

Firstly, I want to make a disclaimer regarding the sample code. The code is ugly and very poorly organized. But do remember, this is just a cut-down version of the program I used when I was testing quaternions, and I'm not getting paid for this.

There are two executable samples included. The first program, CameraEuler.exe, is an example of a camera implementation using Euler angles. The main concern should be the Main_Loop function in main.cpp. The main things to take note of (in the while loop) are:

- There are 3 angles I keep track of for rotation around the X, Y, and Z axes.
- With every key press, I adjust the corresponding rotation variable.
- In the while loop, I translate and then just convert the 3 Euler angles to rotation matrices and multiply them into the final transformation matrix.

Use the up/down keys to rotate around the X axis, left/right to rotate around the Y axis, and Insert/PageUp to rotate around the Z axis. This program suffers from gimbal lock. If you want to see it in action, rotate the camera so that the yaw is 90 degrees. Then try rotating in the X and Z directions. See what happens.

Now for the quaternion solution. The program is CameraQuat.exe, and it is a slight modification of the previous program. The main points to take note of (in the while loop) are:

- The orientation of the camera is a quaternion.
- There are 3 angles corresponding to the keypresses. Note the angles are meant to be an on/off switch (not accumulative); I reset them inside the while loop. Of course this is not the best way to do it, but as I said, it was a quick job.
- I convert the 3 angles to a temporary quaternion.
- I multiply the temporary quaternion into the camera quaternion to obtain the combined orientation. Note the order of multiplication.
- The camera rotation is then converted to the Axis Angle representation for transforming the final matrix. 
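The quaternion camera loop just described can be sketched independently of any API. All names here are illustrative, not from the sample code, and I assume the post-multiplication order camera * temp described above:

```cpp
#include <cassert>
#include <cmath>

// Quaternion as (w, x, y, z).
struct Quat { double w, x, y, z; };

// q1 * q2 using the expanded multiplication formula from the article.
Quat mul(const Quat& a, const Quat& b) {
    return { a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
             a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
             a.w*b.y + a.y*b.w + a.z*b.x - a.x*b.z,
             a.w*b.z + a.z*b.w + a.x*b.y - a.y*b.x };
}

// Build a quaternion from the small per-keypress Euler angles (radians),
// as Qx * Qy * Qz.
Quat fromEuler(double a, double b, double c) {
    Quat qx = { std::cos(a/2), std::sin(a/2), 0, 0 };
    Quat qy = { std::cos(b/2), 0, std::sin(b/2), 0 };
    Quat qz = { std::cos(c/2), 0, 0, std::sin(c/2) };
    return mul(mul(qx, qy), qz);
}

// One frame of camera update: post-multiply the temporary quaternion into
// the camera orientation. The demo then converts the result to axis angle
// for glRotatef.
Quat updateCamera(const Quat& camera, double dx, double dy, double dz) {
    return mul(camera, fromEuler(dx, dy, dz));
}
```

Because each keypress contributes a small relative rotation that is composed in quaternion space, the second and third axes never collapse onto the first, which is why this version does not exhibit gimbal lock.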
When a key is pressed, I generate a temporary quaternion corresponding to that key for a small rotation around that particular axis. I then multiply the temporary quaternion into the camera quaternion. This concatenation of rotations in 4D space avoids gimbal lock. Try it and see for yourself.

The camera quaternion has to be changed into either a matrix or an equivalent representation so that you can concatenate it into the final transformation matrix. You have to do this for every quaternion you use, as 4D space and 3D space just don't mix. In the case of OpenGL, I just changed the quaternion to an Axis Angle representation and let the API do the rest.

Although I did not use the global Euler angles for rotations in the second program, I have left them there as a guide for you to see the corresponding Euler rotations from the first program. Note that the Euler angles will be incorrect if you rotate around more than one axis (because they count keypresses rather than being extracted from the camera quaternion). They are just a reference for you to see that when you rotate the yaw to 90 degrees after the program starts, the gimbal lock problem is no more.

Note: I don't recommend you use my math library as it is. Understand the quaternion and write your own. For your information, I am going to throw all of that away and rewrite it too. It is just too messy and ugly for my taste.

What I did not show

If you noticed, I did not show how to convert from a quaternion to Euler angles. That's because I have yet to find a conversion that works perfectly. The only way I know is to obtain a matrix from the quaternion and try to extract the Euler angles from the matrix. However, as the Euler-to-matrix conversion is a many-to-one relationship (due to sin and cos), I do not know how to get the reverse even using atan2. If anyone knows how to extract Euler angles from a matrix accurately, please do share it with me.

The other thing I did not show is the conversion of a matrix to a quaternion. 
I didn't think I needed this conversion, as you can convert the Euler and Axis Angle representations to quaternions directly, without needing to go through a matrix.

More you can do - SLERP

If you think you are a quaternion master, think again. There is still more to learn about them. Remember I said something about why the Axis Angle representation is bad? Does the word 'interpolation' ring a bell?

I don't have the time to write about interpolation using quaternions; this article has taken much longer than I had anticipated. But I can give you the basic idea of SLERP (Spherical Linear Interpolation), which is generating a series of quaternions between two quaternion orientations (which you specify). The series of quaternions will produce smooth motion between the first and last quaternion (something which neither the Euler nor the Axis Angle representation can achieve consistently).

Final Words

I hope this article has cleared up some of the mystery behind quaternion theory. A final word of caution again: don't multiply two quaternions from different coordinate frames. Nothing but pain and hair loss will result from it. With your new-found powers, I bid thee farewell. Take care... and watch your back.
  5. Author's note: Keep in mind that although this article is being published on GameDev.net in April 2003, it was written in April 2002, and some things have changed in the past year, notably the release of the OpenGL 1.4 specification and the standardization of pixel and vertex shaders. I've chosen not to update the article to reflect these changes because I wanted to keep the text consistent with what was published in the book, and because they really make no difference as far as the purpose of the article is concerned. This article originally appeared in the book Game Programming Tricks of the Trade, 2002, Premier Press. Many members of the GameDev.net community, including several GDNet staff members, contributed to the book, so you're encouraged to check it out.

Trick #10 from Game Programming Tricks of the Trade, Premier Press

Once you've been programming with OpenGL for Windows for a while, you'll probably notice something: the headers and libraries you're using are old. Dig around in the gl.h header, and you'll see this:

#define GL_VERSION_1_1 1

This means that you're using OpenGL 1.1, which was released in 1996. In the world of graphics, that's ancient! If you've been paying attention, you know that the current OpenGL specification is at 1.3 (at least at the time of this writing). OpenGL 1.4 should be released later this year, with 2.0 following soon after. Obviously, you need to update your OpenGL headers and libraries to something more recent. As it turns out, the most recent headers and libraries for Windows correspond to... OpenGL 1.1. That's right: the files you already have are the most recent ones available.

This, of course, presents a problem. Although you can do some impressive things with OpenGL 1.1, to take full advantage of modern consumer graphics hardware, you're going to need functionality available through more recent versions, as well as features available through extensions (but we'll get to that in a bit). 
The question, then, is how to access newer features when your headers and libraries are stuck at OpenGL 1.1. The purpose of this article is to answer that question.

What You Will Learn

In this article, I will:

- Explain in greater detail why you need to take some extra steps to use anything beyond OpenGL 1.1.
- Explain OpenGL's extension mechanism, and how it can be used to access OpenGL 1.2 and 1.3 functionality.
- Give you an overview of the new options available in OpenGL 1.2 and 1.3, as well as a look at some of the most useful extensions.
- Give you some tips for using extensions while ensuring that your game will run well on a wide range of systems.
- Provide a demo showing you how to use the techniques described.

The Problem

Headers and libraries. As I mentioned in the introduction, the latest version of the OpenGL headers and libraries available from Microsoft corresponds to version 1.1. If you look around on the Internet, you may come across another OpenGL implementation for Windows, created by Silicon Graphics. SGI's implementation also corresponds to OpenGL 1.1. Unfortunately, this implementation is no longer supported by SGI. In addition, the Microsoft implementation is based upon it, so you really gain nothing by using it.

Where does that leave us? Well, there is reason to hope that someone will release up-to-date libraries. Although, to my knowledge, no one has committed to doing so, several parties have discussed it. Microsoft is the obvious candidate, and despite years of promising and not delivering, it appears that they have taken an interest in the recently proposed OpenGL 2.0. Whether or not that interest will lead to action remains to be seen, but given the large number of graphics workstations running Windows NT and Windows 2000, it's not beyond the realm of possibility. Besides Microsoft, there has apparently been discussion among the members of OpenGL's Architectural Review Board (ARB) about providing their own implementation of the headers and libraries. 
At present, though, this is still in the discussion stage, so it may be a while before we see anything come of it.

The runtime. Most versions of Windows (the first release of Windows 95 being the exception) come with a 1.1 runtime. Fortunately, this isn't really as important as the other elements. All the runtime does is guarantee a baseline level of functionality and allow you to interface with the ICD.

The ICD. This is the one area where you're okay. Most hardware vendors (including NVIDIA and ATI) have been keeping up with the latest OpenGL standard. For them to be able to advertise that their drivers are compliant with the OpenGL 1.3 standard, they have to support everything included in the 1.3 specification (though not necessarily in hardware). The cool thing about this is that the ICD contains the code to do everything in newer versions of OpenGL, and we can take advantage of that.

The important thing to note here is that although the available headers and libraries don't directly allow you to access newer OpenGL features, the features do exist in the video card drivers. You just need to find a way to access those features in your code. We do that by using OpenGL's extension mechanism.

OpenGL Extensions

As you're aware, the graphics industry has been moving at an alarmingly rapid pace for many years now. Today, consumer-level video cards include features that were only available on professional video cards (costing thousands of dollars) a few years ago. Any viable graphics API has to take these advances into account and provide some means to keep up with them. OpenGL does this through extensions. If a graphics vendor adds a new hardware feature that they want OpenGL programmers to be able to take advantage of, they simply need to add support for it in their ICD, and then provide developers with documentation about how to use the extension. This is oversimplifying a bit, but it's close enough for our purposes. 
As an OpenGL programmer, you can then access the extension through a common interface shared by all extensions. You'll learn how to do that in the "Using Extensions" section, but for now, let's look at how extensions are identified and what they consist of.

Extension Names

Every OpenGL extension has a name by which it can be precisely and uniquely identified. This is important, because hardware vendors frequently introduce extensions with similar functionality but very different semantics and usage. You need to be able to distinguish between them. For example, both NVIDIA and ATI provide extensions for programmable vertex and pixel shaders, but they bear little resemblance to each other. So if you wanted to use pixel shaders in your program, it wouldn't be enough to find out whether the hardware supported pixel shaders. You'd have to be able to specifically ask whether NVIDIA's or ATI's version is supported, and handle each appropriately.

All OpenGL extensions use the following naming convention:

PREFIX_extension_name

The "PREFIX" is there to help avoid naming conflicts. It also helps identify the developer of the extension or, as in the case of EXT and ARB, its level of promotion. Table 1 lists most of the prefixes currently in use. The "extension_name" identifies the extension. Note that the name cannot contain any spaces. 
Some example extension names are ARB_multitexture, EXT_bgra, NV_vertex_program, and ATI_fragment_shader.

Table 1 - OpenGL Extension Prefixes

Prefix   Meaning/Vendor
ARB      Extension approved by OpenGL's Architectural Review Board (first introduced with OpenGL 1.2)
EXT      Extension agreed upon by more than one OpenGL vendor
3DFX     3dfx Interactive
APPLE    Apple Computer
ATI      ATI Technologies
ATIX     ATI Technologies (experimental)
HP       Hewlett-Packard
INTEL    Intel Corporation
IBM      International Business Machines
KTX      Kinetix
NV       NVIDIA Corporation
MESA     http://www.mesa3d.org
OML      OpenML
SGI      Silicon Graphics
SGIS     Silicon Graphics (specialized)
SGIX     Silicon Graphics (experimental)
SUN      Sun Microsystems
SUNX     Sun Microsystems (experimental)
WIN      Microsoft

What an Extension Includes

You now know what an extension is and how extensions are named. Next, let's turn our attention to the relevant components of an extension. There are four parts of an extension that you need to deal with.

Name Strings

Each extension defines a name string, which you can use to determine whether or not the OpenGL implementation supports it. By passing GL_EXTENSIONS to the glGetString() function, you can get a space-delimited buffer containing all the extension name strings supported by the implementation. Name strings are generally the name of the extension preceded by another prefix. For core OpenGL name strings, this is always GL_ (e.g. GL_EXT_texture_compression). When the name string is tied to a particular windowing system, the prefix will reflect which system that is (e.g. Win32 uses WGL_).

Functions

Many (but not all) extensions introduce one or more new functions to OpenGL. To use these functions, you'll have to obtain their entry points, which requires that you know the name of the function. This process is described in detail in the "Using Extensions" section. 
The functions defined by the extension follow the naming convention used by the rest of OpenGL, namely glFunctionName(), with the addition of a suffix using the same letters as the extension name's prefix. For example, the NV_fence extension includes the functions glGenFencesNV(), glSetFenceNV(), glTestFenceNV(), and so on.

Enumerants

An extension may define one or more enumerants. In some extensions, these enumerants are intended for use in the new functions defined by the extension (which may be able to use existing enumerants as well). In other cases, they are intended for use in standard OpenGL functions, thereby adding new options to them. For example, the ARB_texture_env_add extension defines a new enumerant, GL_ADD. This enumerant can be passed as the params parameter of the various glTexEnv() functions when the pname parameter is GL_TEXTURE_ENV_MODE.

The new enumerants follow the normal OpenGL naming convention (i.e. GL_WHATEVER), except that they are suffixed by the letters used in the extension name's prefix, such as GL_VERTEX_SOURCE_ATI. Using new enumerants is much simpler than using new functions. Usually, you will just need to include a header defining the enumerant, which you can get from your hardware vendor or from SGI. Alternatively, you can define the enumerant yourself if you know the integer value it uses. This value can be obtained from the extension's documentation.

Dependencies

Very few extensions stand completely alone. Some require the presence of other extensions, while others take this a step further and modify or extend the usage of other extensions. When you begin using a new extension, be sure to read the specification and understand the extension's dependencies.

Speaking of documentation, you're probably wondering where you can get it, so let's talk about that next. 
Extension Documentation

Although vendors may (and usually do) provide documentation for their extensions in many forms, there is one piece of documentation that is absolutely essential: the specification. These are generally written as plain text files, and include a broad range of information about the extension, such as its name, version, number, dependencies, new functions and enumerants, issues, and modifications/additions to the OpenGL specification. The specifications are intended for use by developers of OpenGL hardware or ICDs, and as such, are of limited use to game developers. They'll tell you what the extension does, but not why you'd want to use it, or how to use it. For that reason, I'm not going to go over the details of the specification format. If you're interested, Mark Kilgard has written an excellent article about it which you can read at OpenGL.org. [[alink='ref']1[/alink]]

As new extensions are released, their specifications are listed in the OpenGL Extension Registry, which you can find at the following URL:

http://oss.sgi.com/p...ample/registry/

This registry is updated regularly, so it's a great way to keep up with the newest additions to OpenGL. For more detailed descriptions of new extensions, your best bet is the websites of the leading hardware vendors. In particular, NVIDIA [[alink='ref']2[/alink]] and ATI [[alink='ref']3[/alink]] both provide a wealth of information, including white papers, Power Point presentations, and demos.

Using Extensions

Finally, it's time to learn what you need to do to use an extension. In general, there are only a couple of steps you need to take:

determine whether or not the extension is supported
obtain the entry point for any of the extension's functions that you want to use
define any enumerants you're going to use

Let's look at each of these steps in greater detail.
[bquote]Caution: Before checking for extension availability and obtaining pointers to functions, you MUST have a current rendering context. In addition, the entry points are specific to each rendering context, so if you're using more than one, you'll have to obtain a separate entry point for each.[/bquote]

Querying the Name String

In order to find out whether or not a specific extension is available, first get the list of all the name strings supported by the OpenGL implementation. To do this, you just need to call glGetString() using GL_EXTENSIONS, like so:

char* extensionsList = (char*) glGetString(GL_EXTENSIONS);

After this executes, extensionsList points to a null-terminated buffer containing the name strings of all the extensions available to you. These name strings are separated by spaces, including a space after the last name string.

[bquote]I'm casting the value returned by glGetString() because the function actually returns an array of unsigned chars. Since most of the string manipulation functions I'll be using require signed chars, I do the cast once now instead of doing it many times later.[/bquote]

To find out whether or not the extension you're looking for is supported, you'll need to search this buffer to see if it includes the extension's name string. I'm not going to go into great detail about how to parse the buffer, since there are many ways to do so, and it's something that at this stage in your programming career, you should be able to do without much effort. One thing you need to watch out for, though, is accidentally matching a substring. For example, if you're trying to use the EXT_texture_env extension, and the implementation doesn't support it, but it does support EXT_texture_env_dot3, then calling something like:

strstr(extensionsList, "GL_EXT_texture_env");

is going to give you positive results, making you think that the EXT_texture_env extension is supported, when it's really not.
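One straightforward way to do a token-by-token check is sketched below. This is a standalone illustration that takes the extensions buffer as a parameter (the demo's actual CheckExtension() may be structured differently); only a match that is bounded by spaces or the ends of the buffer counts:

```c
#include <string.h>

/* Returns 1 if extensionName appears as a complete, space-delimited
   token in extensionsList, 0 otherwise.  This avoids the substring
   problem described above: "GL_EXT_texture_env" will not match
   "GL_EXT_texture_env_dot3". */
int CheckExtensionIn(const char* extensionsList, const char* extensionName)
{
    size_t nameLen = strlen(extensionName);
    const char* p = extensionsList;

    while ((p = strstr(p, extensionName)) != NULL)
    {
        /* The match must start at the beginning of the buffer or
           right after a space... */
        int startsOk = (p == extensionsList) || (p[-1] == ' ');
        /* ...and must be followed by a space or the terminating NUL. */
        int endsOk = (p[nameLen] == ' ') || (p[nameLen] == '\0');

        if (startsOk && endsOk)
            return 1;

        p += nameLen;  /* false match; keep searching */
    }
    return 0;
}
```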
The CheckExtension() function in the demo program included with this article shows one way to avoid this problem.

Obtaining the Function's Entry Point

Because of the way Microsoft handles its OpenGL implementation, calling a new function provided by an extension requires that you request a function pointer to the entry point from the ICD. This isn't as bad as it sounds. First of all, you need to declare a function pointer. If you've worked with function pointers before, you know that they can be pretty ugly. If not, here's an example:

void (APIENTRY * pglCopyTexSubImage3DEXT) (GLenum, GLint, GLint, GLint, GLint, GLint, GLint, GLsizei, GLsizei) = NULL;

[bquote]Update 4/24/03: For the book, and initially here, I used the function name (i.e. glCopyTexSubImage3DEXT) as the pointer name. A reader pointed out to me that on a number of operating systems (e.g. Linux) this can cause serious problems, so it should be avoided. Thanks, Ian![/bquote]

Now that we have the function pointer, we can attempt to assign an entry point to it. This is done using the function wglGetProcAddress():

PROC wglGetProcAddress(LPCSTR lpszProcName);

The only parameter is the name of the function you want to get the address of. The return value is the entry point of the function if it exists, or NULL otherwise. Since the value returned is essentially a generic pointer, you need to cast it to the appropriate function pointer type. Let's look at an example, using the function pointer we declared above:

pglCopyTexSubImage3DEXT = (void (APIENTRY *) (GLenum, GLint, GLint, GLint, GLint, GLint, GLint, GLsizei, GLsizei)) wglGetProcAddress("glCopyTexSubImage3DEXT");

And you thought the function pointer declaration was ugly. You can make life easier on yourself by using typedefs. In fact, you can obtain a header called "glext.h" which contains typedefs for most of the extensions out there.
This header can usually be obtained from your favorite hardware vendor (for example, NVIDIA includes it in their OpenGL SDK), or from SGI at the following URL:

http://oss.sgi.com/p...ple/ABI/glext.h

Using this header, the code above becomes:

PFNGLCOPYTEXSUBIMAGE3DEXTPROC pglCopyTexSubImage3DEXT = NULL;

pglCopyTexSubImage3DEXT = (PFNGLCOPYTEXSUBIMAGE3DEXTPROC)
    wglGetProcAddress("glCopyTexSubImage3DEXT");

Isn't that a lot better? As long as wglGetProcAddress() doesn't return NULL, you can then freely use the function pointer as if it were a normal OpenGL function.

Declaring Enumerants

To use new enumerants defined by an extension, all you have to do is define the enumerant to be the appropriate integer value. You can find this value in the extension specification. For example, the specification for EXT_texture_lod_bias says that GL_TEXTURE_LOD_BIAS_EXT should have a value of 0x8501, so somewhere, probably in a header (or possibly even in gl.h), you'd have the following:

#define GL_TEXTURE_LOD_BIAS_EXT 0x8501

Rather than defining all these values yourself, you can use the glext.h header, mentioned in the last section, since it contains all of them for you. Most OpenGL programmers I know use this header, so don't hesitate to use it yourself and save some typing time.

Win32 Specifics

In addition to the standard extensions that have been covered so far, there are some extensions that are specific to the Windows system. These extensions provide additions that are very specific to the windowing system and the way it interacts with OpenGL, such as additional options related to pixel formats. These extensions are easily identified by their use of "WGL" instead of "GL" in their names. The name strings for these extensions normally aren't included in the buffer returned by glGetString(GL_EXTENSIONS), although a few are. To get all of the Windows-specific extensions, you'll have to use another function, wglGetExtensionsStringARB().
As the ARB suffix indicates, it's an extension itself (ARB_extensions_string), so you'll have to get the address of it yourself using wglGetProcAddress(). Note that for some reason, some ICDs identify this as wglGetExtensionsStringEXT() instead, so if you fail to get a pointer to one, try the other. The format of this function is as follows:

const char* wglGetExtensionsStringARB(HDC hdc);

[bquote]Caution: Normally, it's good practice to check for an extension by examining the buffer returned by glGetString() before trying to obtain function entry points. However, it's not strictly necessary to do so. If you try to get the entry point for a non-existent function, wglGetProcAddress() will return NULL, and you can simply test for that. The reason I'm mentioning this is because to use wglGetExtensionsStringARB(), that's exactly what you have to do. It appears that with most ICDs, the name string for this extension, WGL_ARB_extensions_string, doesn't appear in the buffer returned by glGetString(). Instead, it is included in the buffer returned by wglGetExtensionsStringARB()! Go figure.[/bquote]

Its sole parameter is the handle to your device context. The function returns a buffer similar to that returned by glGetString(GL_EXTENSIONS), with the only difference being that it only contains the names of WGL extensions.

[bquote]Some WGL extension string names included in the buffer returned by wglGetExtensionsStringARB() may also appear in the buffer returned by glGetString(). This is due to the fact that those extensions existed before the creation of the ARB_extensions_string extension, and so their name strings appear in both places to avoid breaking existing software.[/bquote]

Just as there is a glext.h header for core OpenGL extensions, so is there a wglext.h for WGL extensions.
You can find it at the following link:

http://oss.sgi.com/p...le/ABI/wglext.h

Extensions and OpenGL 1.2 and 1.3, and the Future

Back at the beginning of this article, I said that OpenGL 1.2 and 1.3 features can be accessed using the extensions mechanism, which I've spent the last several pages explaining. The question, then, is how you go about doing that. The answer, as you may have guessed, is to treat 1.2 and 1.3 features as extensions. When it comes right down to it, that's really what they are, since nearly every feature that has been added to OpenGL originated as an extension. The only real difference between 1.2 and 1.3 features and "normal" extensions is that the former tend to be more widely supported in hardware, because, after all, they are part of the standard.

[bquote]Sometimes, an extension that has been added to the OpenGL 1.2 or 1.3 core specification will undergo slight changes, causing the semantics and/or behavior to be somewhat different from what is documented in the extension's specification. You should check the latest OpenGL specification to find out about these changes.[/bquote]

The next update to OpenGL will probably be 1.4. It will most likely continue the trend of promoting successful extensions to become part of the standard, and you should be able to continue to use the extension mechanism to access those features. After that, OpenGL 2.0 will hopefully make its appearance, introducing some radical changes to the standard. Once 2.0 is released, new headers and libraries may be released as well, possibly provided by the ARB members. These will make it easier to use new features.

What You Get

As you can see, using OpenGL 1.2 and 1.3, and extensions in general, isn't a terribly difficult process, but it does take some extra effort. You may be wondering what you can gain by using them, so let's take a closer look at them.
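Since promoted features are guaranteed to be present in any implementation that reports a high enough version, it can be useful to parse the GL_VERSION string (e.g. "1.3.0 NVIDIA 28.32") before falling back to the extension check. A minimal sketch, with a helper name of my own choosing:

```c
#include <stdio.h>

/* Parses a GL_VERSION-style string ("major.minor[.release] vendor info")
   and reports whether it meets the requested version.  The version
   number always comes first in the string, so sscanf is sufficient. */
int VersionAtLeast(const char* versionString, int major, int minor)
{
    int glMajor = 0, glMinor = 0;

    if (sscanf(versionString, "%d.%d", &glMajor, &glMinor) != 2)
        return 0;  /* malformed string: assume the feature is absent */

    return (glMajor > major) || (glMajor == major && glMinor >= minor);
}
```

In a real program you would pass the result of glGetString(GL_VERSION) to a function like this before deciding whether to load, say, glDrawRangeElements() directly or via the EXT_draw_range_elements name.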
The following sections list the features added by OpenGL 1.2 and 1.3, as well as some of the more useful extensions currently available. With each feature, I've included the extension you can use to access it.

OpenGL 1.2

3D Textures allow you to do some really cool volumetric effects. Unfortunately, they require a significant amount of memory. To give you an idea, a single 256x256x256 16 bit texture will use 32 MB! For this reason, hardware support for them is relatively limited, and because they are also slower than 2D textures, they may not always provide the best solution. They can, however, be useful if used judiciously. 3D textures correspond to the EXT_texture3D extension.

BGRA Pixel Formats make it easier to work with file formats which use blue-green-red color component ordering rather than red-green-blue. Bitmaps and Targas are two examples that fall in this category. BGRA pixel formats correspond to the EXT_bgra extension.

Packed Pixel Formats provide support for packed pixels in host memory, allowing you to completely represent a pixel using a single unsigned byte, short, or int. Packed pixel formats correspond to the EXT_packed_pixels extension, with some additions for reversed component order.

Normally, since texture mapping happens after lighting, modulating a texture with a lit surface will "wash out" specular highlights. To help avoid this effect, the Separate Specular Color feature has been added. This causes OpenGL to track the specular color separately and apply it after texture mapping. Separate specular color corresponds to the EXT_separate_specular_color extension.

Texture Coordinate Edge Clamping addresses a problem with filtering at the edges of textures. When you select GL_CLAMP as your texture wrap mode and use a linear filtering mode, the border will get sampled along with edge texels. Texture coordinate edge clamping causes only the texels which are actually part of the texture to be sampled.
This corresponds to the SGIS_texture_edge_clamp extension (which normally shows up as EXT_texture_edge_clamp in the GL_EXTENSIONS string).

Normal Rescaling allows you to automatically scale normals by a value you specify, which can be faster than renormalization in some cases, although it requires uniform scaling to be useful. This corresponds to the EXT_rescale_normal extension.

Texture LOD Control allows you to specify certain parameters related to the texture level of detail used in mipmapping to avoid popping in certain situations. It can also be used to increase texture transfer performance, since the extension can be used to upload only the mipmap levels visible in the current frame, instead of uploading the entire mipmap hierarchy. This matches the SGIS_texture_lod extension.

The Draw Element Range feature adds a new function to be used with vertex arrays. glDrawRangeElements() is similar to glDrawElements(), but it lets you indicate the range of indices within the arrays that you are using, allowing the hardware to process the data more efficiently. This corresponds to the EXT_draw_range_elements extension.

The Imaging Subset is not fully present in all OpenGL implementations, since it's primarily intended for image processing applications. It's actually a collection of several extensions. The following are the ones that may be of interest to game developers.

EXT_blend_color allows you to specify a constant color which is used to define blend weighting factors.

SGI_color_matrix introduces a new matrix stack to the pixel pipeline, causing the RGBA components of each pixel to be multiplied by a 4x4 matrix.

EXT_blend_subtract gives you two ways to use the difference between two blended surfaces (rather than the sum).

EXT_blend_minmax lets you keep either the minimum or maximum color components of the source and destination colors.
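To make the packed pixel idea from the 1.2 feature list concrete, here is how a single RGBA pixel maps onto the GL_UNSIGNED_SHORT_5_5_5_1 layout, where red occupies the top five bits and alpha the lowest bit. The helper name is my own; this is an illustration of the bit layout, not code from any SDK:

```c
/* Packs 5-bit red, green, and blue components plus a 1-bit alpha into
   one 16-bit pixel using the GL_UNSIGNED_SHORT_5_5_5_1 component order:
   red in bits 15-11, green in 10-6, blue in 5-1, alpha in bit 0. */
unsigned short PackRGBA5551(unsigned r, unsigned g, unsigned b, unsigned a)
{
    return (unsigned short)(((r & 0x1F) << 11) |
                            ((g & 0x1F) << 6)  |
                            ((b & 0x1F) << 1)  |
                            (a & 0x01));
}
```

An array of such values can be handed to glTexImage2D() with a type of GL_UNSIGNED_SHORT_5_5_5_1 when the packed pixels feature is available.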
OpenGL 1.3

The Multitexturing extension was promoted to ARB status with OpenGL 1.2.1 (the only real change in that release), and in 1.3, it was made part of the standard. Multitexturing allows you to apply more than one texture to a surface in a single pass, which is useful in many things, such as lightmapping and detail texturing. It was promoted from the ARB_multitexture extension.

Texture Compression allows you to either provide OpenGL with precompressed data for your textures, or to have the driver compress the data for you. The advantage in doing so is that you save both texture memory and bandwidth, thereby improving performance. Compressed textures were promoted from the ARB_texture_compression extension.

Cube Map Textures provide a new type of texture consisting of six two-dimensional textures in the shape of a cube. Texture coordinates act like a vector from the center of the cube, indicating which face and which texels to use. Cube mapping is useful in environment mapping and texture-based diffuse lighting. It is also important for pixel-perfect dot3 bumpmapping, as a normalization lookup for interpolated fragment normals. It was promoted from the ARB_texture_cube_map extension.

Multisampling allows for automatic antialiasing by sampling all geometry several times for each pixel. When it's supported, an extra buffer is created which contains color, depth, and stencil values. Multisampling is, of course, expensive, and you need to be sure to request a rendering context that supports it. It was promoted from the ARB_multisample extension.

The Texture Add Environment Mode adds a new enumerant which can be passed to glTexEnv(). It causes the texture to be additively combined with the incoming fragment. This was promoted from the ARB_texture_env_add extension.

Texture Combine Environment Modes add a lot of new options for the way textures are combined.
In addition to the texture color and the incoming fragment, you can also include a constant texture color and the results of the previous texture environment stage as parameters. These parameters can be combined using passthrough, multiplication, addition, biased addition, subtraction, and linear interpolation. You can select combiner operations for the RGB and alpha components separately. You can also scale the final result. As you can see, this addition gives you a great deal of flexibility. Texture combine environment modes were promoted from the ARB_texture_env_combine extension.

The Texture Dot3 Environment Mode adds a new enumerant to the texture combine environment modes. The dot3 environment mode allows you to take the dot product of two specified components and place the results in the RGB or RGBA components of the output color. This can be used for per-pixel lighting or bump mapping. The dot3 environment mode was promoted from the ARB_texture_env_dot3 extension.

Texture Border Clamp is similar to texture edge clamp, except that it causes texture coordinates that straddle the edge to sample from border texels only, rather than from edge texels. This was promoted from the ARB_texture_border_clamp extension.

Transpose Matrices allow you to pass row-major matrices to OpenGL, which normally uses column-major matrices. This is useful not only because it is how C stores two-dimensional arrays, but because it is how Direct3D stores matrices, which saves conversion work when you're writing a rendering engine that uses both APIs. This addition only adds to the interface; it does not change the way OpenGL works internally. Transpose matrices were promoted from the ARB_transpose_matrix extension.

Useful Extensions

At the time of writing, there are 269 extensions listed in the Extension Registry. Even if I focused on the ones actually being used, I couldn't hope to cover them all, even briefly.
Instead, I'll focus on a few that seem to be the most important for use in games.

Programmable Vertex and Pixel Shaders

It's generally agreed that shaders are the future of graphics, so let's start with them. First of all, the terms "vertex shader" and "pixel shader" are in common usage because of the attention they received with the launch of DirectX 8. However, the OpenGL extensions that you use for them have different names. On NVIDIA cards, vertex shaders are called vertex programs, which are available through the NV_vertex_program extension. Pixel shaders are called register combiners, and are available through the NV_register_combiners and NV_texture_shader extensions. On ATI cards, vertex shaders are still called vertex shaders, and are available through the EXT_vertex_shader extension. Pixel shaders are called fragment shaders, and are available through the ATI_fragment_shader extension.

If you're unfamiliar with shaders, then a quick overview is in order. Vertex shaders allow you to customize the geometry transformation pipeline. Pixel shaders work later in the pipeline, and allow you to control how the final pixel color is determined. Together, the two provide incredible functionality. I recommend that you download NVIDIA's Effects Browser to see examples of the things you can do with shaders.

Using shaders can be somewhat problematic right now due to the fact that NVIDIA and ATI both handle them very differently. If you want your game to take advantage of shaders, you'll have to write a lot of special case code to use each vendor's method. At the ARB's last several meetings, this has been a major discussion point. There is a great deal of pressure to create a common shader interface. In fact, it is at the core of 3D Labs' OpenGL 2.0 proposal. Hopefully, the 1.4 specification will address this issue, but the ARB seems to be split as to whether a common shader interface should be a necessary component of 1.4.
Compiled Vertex Arrays

The EXT_compiled_vertex_array extension adds two functions which allow you to lock and unlock your vertex arrays. When the vertex arrays are locked, OpenGL assumes that their contents will not be changed. This allows OpenGL to make certain optimizations, such as caching the results of vertex transformation. This is especially useful if your data contains large numbers of shared vertices, or if you are using multipass rendering. When a vertex needs to be transformed, the cache is checked to see if the results of the transformation are already available. If they are, the cached results are used instead of recalculating the transformation.

The benefits gained by using CVAs depend on the data set, the video card, and the drivers. Although you generally won't see a decrease in performance when using CVAs, it's quite possible that you won't see much of an increase either. In any case, the fact that they are fairly widely supported makes them worth looking into.

WGL Extensions

There are a number of extensions available that add to the way Windows interfaces with OpenGL. Here are some of the main ones.

ARB_pixel_format augments the standard pixel format functions (i.e. DescribePixelFormat, ChoosePixelFormat, SetPixelFormat, and GetPixelFormat), giving you more control over which pixel format is used. The functions allow you to query individual pixel format attributes, and allow for the addition of new attributes that are not included in the pixel format descriptor structure. Many other WGL extensions are dependent on this extension.

ARB_pbuffer adds pixel buffers, which are off-screen (non-visible) rendering buffers. On most cards, these buffers are in video memory, and the operation is hardware accelerated. They are often useful for creating dynamic textures, especially when used with the render texture extension.

ARB_render_texture depends on the pbuffer extension.
It is specifically designed to provide buffers that can be rendered to and be used as texture data. These buffers are the perfect solution for dynamic texturing.

ARB_buffer_region allows you to save portions of the color, depth, or stencil buffers to either system or video memory. This region can then be quickly restored to the OpenGL window.

Fences and Ranges

NVIDIA has created two extensions, NV_fence and NV_vertex_array_range, that can make video cards based on NVIDIA chipsets use vertex data much more efficiently than they normally would. On NVIDIA hardware, the vertex array range extension is currently the fastest way to transfer data from the application to the GPU. Its speed comes from the fact that it allows the developer to allocate and access memory that normally can only be accessed by the GPU.

Although not directly related to the vertex array range extension, the fence extension can help make it even more efficient. When a fence is added to the OpenGL command stream, it can then be queried at any time. Usually, it is queried to determine whether or not it has been completed yet. In addition, you can force the application to wait for the fence to be completed. Fences can be used with vertex array range when there is not enough memory to hold all of your vertex data at once. In this situation, you can fill up available memory, insert a fence, and when the fence has completed, repeat the process.

Shadows

There are two extensions, SGIX_shadow and SGIX_depth_texture, which work together to allow for hardware-accelerated shadow mapping techniques. The main reason I mention these is that there are currently proposals in place to promote these extensions to ARB status. In addition, NVIDIA is recommending that they be included in the OpenGL 1.4 core specification. Because they may change somewhat if they are promoted, I won't go into detail about how these extensions work.
They may prove to be a very attractive alternative to the stencil shadow techniques presently in use.

Writing Well-Behaved Programs Using Extensions

Something you need to be very aware of when using any extension is that it is highly likely that someone will run your program on a system that does not support that extension. It's your responsibility to make sure that when this happens, your program behaves intelligently, rather than crashing or rendering garbage to the screen. In this section, you'll learn several methods to help ensure that your program gets the best possible results on all systems. The focus is on two areas: how to select which extensions to use, and how to respond when an extension you're using isn't supported.

Choosing Extensions

The most important thing you can do to ensure that your program runs on as many systems as possible is to choose your extensions wisely. The following are some factors you should consider.

Do you really need the extension?

A quick look at the Extension Registry will reveal that there are a lot of different extensions available, and new ones are being introduced on a regular basis. It's tempting to try many of them out just to see what they do. If you're coding a demo, there's nothing wrong with this, but if you're creating a game that will be distributed to a lot of people, you need to ask yourself whether or not the extension is really needed. Does it make your game run faster? Does it make your game use less video memory? Does it improve the visual quality of your game? Will using it reduce your development time? If the answer to any of these is yes, then the extension is probably a good candidate for inclusion in your product. On the other hand, if it offers no significant benefit, you may want to avoid it altogether.

What level of promotion is the extension at?

Extensions with higher promotion levels tend to be more widely supported.
Any former extension that has been made part of the core 1.2 or 1.3 specification will be supported in compliant implementations, so they are the safest to use (1.2 more so than 1.3, since it's been around for longer). ARB-approved extensions (the ones that use the ARB prefix) aren't required to be supported in compliant implementations, but they are expected to be widely supported, so they're the next safest. Extensions using the EXT prefix are supported by two or more hardware vendors, and are thus moderately safe to use. Finally, vendor-specific extensions are the most dangerous. Using them generally requires that you write a lot of special case code. However, they often offer significant benefits, so they should not be ignored. You just have to be especially careful when using them.

[bquote]There are times when a vendor-specific extension can be completely replaced by an EXT or ARB extension. In this case, the latter should always be favored.[/bquote]

Who is your target audience?

If your target audience is hardcore gamers, you can expect that they are going to have newer hardware that will support many, if not all, of the latest extensions, so you can feel safer using them. Moreover, they will probably expect you to use the latest extensions; they want your game to take advantage of all those features they paid so much money for! If, on the other hand, you're targeting casual game players, you'll probably want to use very few extensions, if any.

When will your game be done?

As mentioned earlier, the graphics industry moves at an extremely quick pace. An extension that is only supported on cutting-edge cards today may enjoy widespread support in two years. Then again, it may become entirely obsolete, either because it is something that consumers don't want, or because it gets replaced by another extension. If your ship date is far enough in the future, you may be able to risk using brand new extensions to enhance your game's graphics.
On the other hand, if your game is close to shipping, or if you don't want to risk possible rewrites later on, you're better off sticking with extensions that are already well-supported.

What To Do When an Extension Isn't Supported

First of all, let's make one thing very clear. Before you use any extension, you need to check to see if it is supported on the user's system. If it's not, you need to do something about it. What that "something" is depends on a number of things, as we'll discuss here, but you really need to have some kind of contingency plan. I've seen OpenGL code that just assumes that the needed extensions will be there. This can lead to blank screens, unexpected rendering effects, and even crashes. Here are some of the possible methods you can use when you find that an extension isn't supported.

Don't Use the Extension

If the extension is non-critical, or if there is simply no alternate way to accomplish the same thing, you may be able to get away with just not using it at all. For example, compiled vertex arrays (EXT_compiled_vertex_array) offer potential speed enhancements when using vertex arrays. The speed gains usually aren't big enough to make or break your program, though, so if they aren't supported, you can use a flag or other means to tell your program to not attempt to use them.

Try Similar Extensions

Because of the way that extensions evolve, it's possible that the extension you're trying to use is present under an older name (for example, most ARB extensions used to be EXT extensions, and vendor-specific extensions before that). Or, if you're using a vendor-specific extension, there may be extensions from other vendors that do close to the same thing. The biggest drawback to this solution is that it requires a lot of special case code.

Find an Alternate Way

Many extensions were introduced as more efficient ways to do things which could already be done using only core OpenGL features.
If you're willing to put in the effort, you can deal with the absence of these extensions by doing things the "old way". For instance, most things that can be done with multitexturing can be done using multipass rendering and alpha blending. Besides the additional code you have to add to handle this, your game will run slower because it has to make multiple passes through the geometry. That's better than not being able to run the game at all, and arguably better than simply dumping multitexturing and sacrificing visual quality.

Exit Gracefully

In some cases, you may decide that an extension is essential to your program, possibly because there is no other way to do the things you want to do, or because providing a backup plan would require more time and effort than you're willing to invest. When this happens, you should cause your program to exit normally, with a message telling the user what they need to be able to play your game. Note that if you choose to go this route, you should make sure that the hardware requirements listed on the product clearly state what is needed, or your customers will hate you.

The Demo

I've created a simple demo (see attached resource file) to show you some extensions in action. As you can see from Figure 1, the demo itself is fairly simple, nothing more than a light moving above a textured surface, casting a light on it using a lightmap. The demo isn't interactive at all. I kept it simple because I wanted to be able to focus on the extension mechanism.

Figure 1: Basic lightmapping

The demo uses seven different extensions. Some of them aren't strictly necessary, but I wanted to include enough to get the point across. Table 2 lists all of the extensions in use, and how they are used.
Table 2 - Extensions used in the demo

ARB_multitexture - The floor in this demo is a single quad with two textures applied to it: one for the bricks, and the other for the lightmap, which is updated with the light's position. The textures are combined using modulation.

EXT_point_parameters - When used, this extension causes point primitives to change size depending on their distance from the eye. You can set attenuation factors to determine how much the size changes, as well as define maximum and minimum sizes, and even specify that the points become partially transparent if they fall below a certain threshold. The yellow light in the demo takes advantage of this extension. The effect is subtle, but you should be able to notice it changing size.

EXT_swap_control - Most OpenGL drivers allow the user to specify whether or not screen redraws should wait for the monitor's vertical refresh, or vertical sync. If this is enabled, your game's framerate will be limited to whatever the monitor refresh rate is set to. This extension allows you to programmatically disable vsync to avoid this limitation.

EXT_bgra - Since the demo uses Targas for textures, using this extension allows it to use their data directly without having to swap the red and blue components before creating the textures.

ARB_texture_compression - Since the demo only uses two textures, it won't gain much by using texture compression, but since it's easy, I used it anyway. I allow the drivers to compress the data for me, rather than doing so myself beforehand.

EXT_texture_edge_clamp - Again, this extension wasn't strictly necessary, but the demo shows how easy it is to use.

SGIS_generate_mipmap - GLU provides a function, gluBuild2DMipmaps, that allows you to specify just the base level of a mipmap chain and automatically generates the other levels for you. This extension performs essentially the same function, with a couple of exceptions. One, it is a little more efficient.
Two, it will cause all of the mipmap levels to be regenerated automatically whenever you change the base level. This can be useful when using dynamic textures.

The full source code to the demo is included on the CD, but there are a couple of functions that I want to look at. The first is InitializeExtensions(). This function is called at startup, right after the rendering context is created. It verifies that the extensions used are supported, and gets the function entry points that are needed.

bool InitializeExtensions()
{
  if (CheckExtension("GL_ARB_multitexture"))
  {
    glMultiTexCoord2f = (PFNGLMULTITEXCOORD2FARBPROC)
      wglGetProcAddress("glMultiTexCoord2fARB");
    glActiveTexture = (PFNGLACTIVETEXTUREARBPROC)
      wglGetProcAddress("glActiveTextureARB");
    glClientActiveTexture = (PFNGLCLIENTACTIVETEXTUREARBPROC)
      wglGetProcAddress("glClientActiveTextureARB");
  }
  else
  {
    MessageBox(g_hwnd, "This program requires multitexturing, which "
               "is not supported by your hardware", "ERROR", MB_OK);
    return false;
  }

  if (CheckExtension("GL_EXT_point_parameters"))
  {
    glPointParameterfvEXT = (PFNGLPOINTPARAMETERFVEXTPROC)
      wglGetProcAddress("glPointParameterfvEXT");
  }

  if (CheckExtension("WGL_EXT_swap_control"))
  {
    wglSwapIntervalEXT = (PFNWGLSWAPINTERVALEXTPROC)
      wglGetProcAddress("wglSwapIntervalEXT");
  }

  if (!CheckExtension("GL_EXT_bgra"))
  {
    MessageBox(g_hwnd, "This program requires the BGRA pixel storage "
               "format, which is not supported by your hardware", "ERROR", MB_OK);
    return false;
  }

  g_useTextureCompression = CheckExtension("GL_ARB_texture_compression");
  g_useEdgeClamp = CheckExtension("GL_EXT_texture_edge_clamp");
  g_useSGISMipmapGeneration = CheckExtension("GL_SGIS_generate_mipmap");

  return true;
}

As you can see, there are two extensions that the demo requires: multitexturing and BGRA pixel formats. Although I could have provided alternate ways to do both of these things, doing so would have unnecessarily complicated the program.
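The CheckExtension() helper used above isn't listed here, but whatever it looks like, it has to match whole tokens in the space-separated string returned by glGetString(GL_EXTENSIONS): a naive strstr() would let GL_EXT_texture match GL_EXT_texture3D. Here is a minimal sketch of that matching; the name IsExtensionSupported and the idea of passing the extension string in as a parameter (so it can be exercised without a GL context) are mine, not the demo's.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if 'name' appears as a complete,
   space-delimited token within 'extensions', 0 otherwise. */
int IsExtensionSupported(const char *extensions, const char *name)
{
    size_t len = strlen(name);
    const char *p = extensions;

    if (len == 0 || strchr(name, ' ') != NULL)
        return 0;   /* extension names never contain spaces */

    while ((p = strstr(p, name)) != NULL)
    {
        /* Match whole tokens only, so that looking up GL_EXT_texture
           does not accidentally match GL_EXT_texture3D. */
        if ((p == extensions || p[-1] == ' ') &&
            (p[len] == ' ' || p[len] == '\0'))
            return 1;
        p += len;
    }
    return 0;
}
```

A real CheckExtension() would simply call this with the string obtained from glGetString(GL_EXTENSIONS) at startup.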
If you're new to OpenGL or have only ever needed the functionality offered in OpenGL 1.1, you may be confused about what the problem is, so let's clarify. To develop for a given version of OpenGL on Windows, you need three things. First, you need a set of libraries (e.g. opengl32.lib and possibly others such as glu32.lib) and headers (e.g. gl.h, and so on) corresponding to the version you'd like to use. These headers and libraries contain the OpenGL functions, constants, and other things you need to be able to compile and link an OpenGL application. Second, the system you intend to run the application on needs to have an OpenGL dynamic link library (OpenGL32.dll), or OpenGL runtime library. The runtime needs to be for either the same or a more recent version of OpenGL as the headers and libraries you're using. Ideally, you will also have a third component, called an Installable Client Driver (ICD). An ICD is provided by the video card drivers to allow for hardware acceleration of OpenGL features, as well as possible enhancements provided by the graphics vendor. So, let's look at these three things and see why you have to jump through a few hoops to use anything newer than OpenGL 1.1:
6. So, What Exactly Is OpenAL?

OpenAL is a (smallish) API that is designed to aid cross-platform development, specifically in the area of sound and music. It is designed to be very simple, at the expense of (personally speaking) some functionality; e.g. panning is not supported (though some people would claim that today's 3D games would rarely, if ever, use features like that), though through clever coding it is possible to simulate all the missing features. Its style is influenced by OpenGL, in regards to function/variable naming, type definitions etc., so any seasoned OpenGL coder will easily be able to find their way around, but even to novices it is simple and easy to understand (even the docs are helpful!). Okay, now you know a little about it, let's see how to use it!

Where Can I Get My Hands On It?

Of course, the first thing you need to do is actually get the OpenAL SDK. You can do this by going to www.openAL.org. It's only about 7MB, so it shouldn't take that long to do. When you've installed it (for the sake of this article, I installed it at C:\oalsdk), have a quick look at the docs and the examples to see how simple it is to use.

Initialization

What we need to do first is to set up OpenAL so we can actually use its feature set. This is extremely simple to do:

// Init openAL
alutInit(0, NULL);
// Clear Error Code (so we can catch any new errors)
alGetError();

That's it, now we are ready to use it!

Sources, Buffers and Listeners

Before we carry on, I'll explain the main concept behind OpenAL. The API uses 3 main components: sources, buffers and listeners. Here is a quick explanation of each.

Sources: A source in OpenAL is exactly what it sounds like: a source of a sound in the world. A source is linked to one or more buffers, and then asked to play them. Sources have position and orientation in the world, as well as velocity, for doppler effects.

Listener: A listener is (simply put) the ears of the world.
It is generally set at the location of the player, so that all sounds will be relative to the person playing. This could be the actual camera location, or some other position entirely, whatever you want...

Buffers: Buffers are exactly like any other kind of sound buffers you may have used. Sound is loaded into the buffer, and then the sound in the buffer is played. The thing that differs from normal is that no functions directly call the buffer. Instead, buffers are linked to a source (or multiple sources), and then the sources themselves play, playing the sound from the buffers in the order that they have been queued.

Okay, now that we know about the things that make OpenAL what it is, how do you use them? Well, read on...

How Do We Create What We Need?

Buffers

The first things we should create are buffers, so we can actually store the sound we want to play. This is done with one simple call:

ALuint buffers[NUM_BUFFERS];

// Create the buffers
alGenBuffers(NUM_BUFFERS, buffers);
if ((error = alGetError()) != AL_NO_ERROR)
{
    printf("alGenBuffers : %d", error);
    return 0;
}

All this does is create however many buffers you wish, ready to be filled with sound. The error checking is there for completeness. The buffers do not need to be created at the start of the program; they can be created at any time. However, it's (in my opinion) neater if you create everything you need when you start. Of course, this isn't always possible, so create them in exactly the same way whenever you need to. Now, let's load in the sound(s) we want to use:

ALenum format;
ALsizei size;
ALsizei freq;
ALboolean loop;
ALvoid* data;

alutLoadWAVFile("exciting_sound.wav", &format, &data, &size, &freq, &loop);
if ((error = alGetError()) != AL_NO_ERROR)
{
    printf("alutLoadWAVFile exciting_sound.wav : %d", error);
    // Delete Buffers
    alDeleteBuffers(NUM_BUFFERS, buffers);
    return 0;
}

This loads in the file we have requested and fills in the given variables with its data. Again, the error check is there for completeness.
Note though the call to alDeleteBuffers(...). Even though the load has failed, we need to clean up after ourselves, unless of course we want memory leaks ;)... There is also a function (alutLoadWAVMemory(...)) that lets you load the data from memory. Take a peek at the documentation for notes on that one. Now that we have loaded in the sound, we need to fill up one of the buffers with the data:

alBufferData(buffers[0], format, data, size, freq);
if ((error = alGetError()) != AL_NO_ERROR)
{
    printf("alBufferData buffer 0 : %d", error);
    // Delete buffers
    alDeleteBuffers(NUM_BUFFERS, buffers);
    return 0;
}

This takes the data we received from alutLoadWAVFile(...) and puts it into the buffer, ready for it to be assigned to a waiting source and played. Again, notice how we delete the buffers if we fail. Now that the buffer is filled with data, we still have the sound data in memory as well, taking up valuable space. Get rid of it with:

alutUnloadWAV(format, data, size, freq);
if ((error = alGetError()) != AL_NO_ERROR)
{
    printf("alutUnloadWAV : %d", error);
    // Delete buffers
    alDeleteBuffers(NUM_BUFFERS, buffers);
    return 0;
}

This just deletes the no-longer-needed sound data from memory, as the data we actually need is stored in the buffer!

Sources

Okay, so we have loaded the sound into the buffer, but as I said earlier, we can't play it until we have attached it to a source. Here's how to do that!

ALuint source[NUM_SOURCES];

// Generate the sources
alGenSources(NUM_SOURCES, source);
if ((error = alGetError()) != AL_NO_ERROR)
{
    printf("alGenSources : %d", error);
    return 0;
}

The above piece of code creates all the sources we need. This is exactly the same procedure as alGenBuffers(...), and the same rules apply. It is common to have far more sources than buffers, as a buffer can be attached to more than one source at the same time. Now that the source is created, we need to attach a buffer to it, so we can play it... Attaching buffers to sources is again very easy.
There are two ways to do it: one way is to attach a single buffer to the source, whereas the other is to queue multiple buffers to the source and play them in turn. For the sake of this article, I'm just going to show you the first way.

alSourcei(source[0], AL_BUFFER, buffers[0]);
if ((error = alGetError()) != AL_NO_ERROR)
{
    printf("alSourcei : %d", error);
    return 0;
}

This takes the buffer and attaches it to the source, so that when we play the source this buffer is used. The function alSourcei(...) is actually used in more than one place (in that it is a generic set-integer-property function for sources). I won't really go into this here, maybe another day. Now that we have set the source and buffer, we need to position the source so that we can actually simulate 3D sound. The properties we will need to set are position, orientation and velocity. These are pretty self-explanatory, but I'll mention velocity a little. Velocity is used to simulate a doppler effect. Personally, I can't stand doppler effects, so I set the array to 0's, but if you want the effect, it will be calculated with respect to the listener's position (that's coming up next!). Okay, to set these values, you use one simple function:

ALvoid alSourcefv(ALuint source, ALenum pname, ALfloat *values);

This function is similar to the alSourcei(...) function, except that it is a generic function to set an array of floats as a source property, where ALfloat *values will be an array of 3 ALfloats. So, to set our position, velocity and orientation, we call:

alSourcefv(source[0], AL_POSITION, sourcePos);
alSourcefv(source[0], AL_VELOCITY, sourceVel);
alSourcefv(source[0], AL_DIRECTION, sourceOri);

These properties default themselves to 0 if you do not set them, although it is more than likely that you will. I haven't included it here, but it is best to check for errors in the standard way, because it's a good idea to catch them as they happen.
Okay, that is a source set up; now we need one more thing: the listener.

Listeners

As I said before, the listener is the ears of the world, and as such, you can only have one of them. It wouldn't make sense to have more than one, as OpenAL wouldn't know which one to make the sounds relevant to. Because there is only ever one, we do not need to create it, as it comes ready-made. All we have to do is set where it is, the direction it is facing and its velocity (again, for doppler effects). This is done in exactly the same way as sources, with the one function:

ALvoid alListenerfv(ALenum pname, ALfloat *values);

This works in exactly the same way as alSourcefv(...) and alSourcei(...), except it works on the listener. As I said, the listener comes ready-made; you do not have to specify it. To set the position, orientation and velocity, call:

alListenerfv(AL_POSITION, listenerPos);
alListenerfv(AL_VELOCITY, listenerVel);
alListenerfv(AL_ORIENTATION, listenerOri);

Again, it will be a good idea to check for errors. One thing to note here is that in this case, orientation is an array of 6 ALfloats. The first three define its look-at direction, while the second three define the 'up' vector. Once this is done, everything is finally set up (it wasn't that hard, was it?). Now we can get down to the thing we are all here for: playing the sound at last.

The Hills Are Alive, With The Sound of Music (!)

alSourcePlay(source[0]);

Yup, that's it. That's all that is required to play your sample. OpenAL will calculate the volume based on the distance and the orientation. If you specified a velocity for either sources or listeners, you'll also get doppler effects. Now that it's playing, we might need it to stop playing...

alSourceStop(source[0]);

Or reset it to the start of the sample...

alSourceRewind(source[0]);

Or pause it...

alSourcePause(source[0]);

To unpause it, just call alSourcePlay(...) on the sample. I've rushed through those, but they really are that simple.
If you have a look at the OpenAL Specification and Reference document, on page 32, it will tell you exactly what actions will happen if you call these functions on a source, depending on its state. It's simple enough, but I'm not going to go into it now. That's all you need to do to get OpenAL up and running, but what about shutting it down?

Shutting It All Down

Shutting down is even easier than it was to set everything up. We just need to get rid of our sources and buffers, and close OpenAL.

alDeleteSources(NUM_SOURCES, source);
alDeleteBuffers(NUM_BUFFERS, buffers);

This will delete all the sources in that array of sources. Make sure you have deleted all of them before you call the next line.

alutExit();

And that's it. OpenAL is now shut down, and you can carry on with whatever you want to do next. As this is the first article I have written, I hope it is of some use to at least one person. There's lots more to OpenAL, but this simple how-to should easily get you set up so you can look at the more complicated features (if there are any) of the API.

Lee Winder
Leewinder@hotmail.com
7. Introduction

WHOA! What do you know, I'm finally doing a tutorial on an actual programming topic. I think the temperature in Hell must have dropped below 32. The source code examples in this tutorial will be both in Pascal and C (wherever I can translate it). Since the techniques are cross-platform, I won't be showing code on how to blit the images to the screen. All of the tiles I use are 40x40, and all of the calculations are based on that. Depending on the size of bitmap you use, you may have to scale up or down.

What the Hell are Isometric and Hexagonal Maps for?

Isometric maps are maps that use rhombuses instead of squares or rectangles. (A rhombus is a four-sided figure with all sides the same length, but not necessarily 90 degrees at the corners. Yes, a square is technically a rhombus.) Isometric maps give sort of an illusion of being 3D, without actually being so. Sid Meier's Civilization II uses isometric maps. Here is an isometric tile: (This tile is actually 40x40, but the rhombus only takes up the bottom 40x21.)

Hexagonal maps are maps that use hexagons (6-sided figures) instead of squares or rectangles. Hexagonal maps are used mostly for overhead-view strategy games (the use of these dates back to Avalon Hill and other strategy game companies). Here is a hexagonal tile:

Forcing Isometric and Hexagonal Maps onto a Rectangular Grid

Okay, you can make chessboard-like maps all day; you just have to use a 2D array. Spiffy. But isometric and hexagonal maps don't work that way. Every other line is offset. We can still put iso and hex maps into a 2D array, but the WAY in which we map them is different. Here's an IsoMap:

Here's a HexMap:

As demonstrated in the above pictures, for odd Y values, you shift the line over by half of a tile (20 pixels, in my case). (The white spaces are border tiles not on the map. Usually, you would fill these with black.)
Plotting the Iso/Hex Tiles on the Map

Since both iso and hex tiles are contained in overlapping rectangles, you MUST USE BITMASKS! My iso tiles, and the iso bitmask:

My hex tiles, and the hex bitmask:

{The brief review of bitmasking: you blit the bitmask using AND, then blit the tile using OR.}

Pixel Coordinates of Iso/Hex Tiles

When calculating X,Y coordinates for rectangular tiles, you use the following calculations:

PlotX=MapX*Width
PlotY=MapY*Height

For iso/hex maps, it's a little trickier, since the bounding rectangles overlap.

Iso maps:

{(MapY AND 1) tells us if MapY is odd or even, and shifts the tile to the right if it is odd}
PlotX=MapX*Width+(MapY AND 1)*(Width/2)
PlotY=MapY*HeightOverLapping-YOffset

Important: Width should always be an even number, or you wind up with a black zigzag line between rows of tiles.

{This assumes you have shaped your rhombus like mine, with one pixel}
{at the left and right corners, and two at the top and bottom.}
HeightOverLapping=(Height of Rhombus)/2+1

{to make the first row flush with the top of the map}
YOffset=Height-(Height of Rhombus)

Hex maps:

PlotX=MapX*Width+(MapY AND 1)*(Width/2)
PlotY=MapY*HeightOverLapping

{Assuming your hexagon looks like mine}
HeightOverLapping=(Height of Hexagon)*0.75

Moving Around in Iso/Hex Maps

In rectangular maps, movement from square to square is easy. Just add/subtract 1 to X and/or Y, and you have made the move. Iso and hex maps make THAT more difficult as well. Because every other line is offset, there are different calculations depending on whether whatever is moving is on an even row or an odd row.
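Backing up for a moment, the plotting formulas from the last section translate directly into C. This sketch uses the article's 40x40 tiles and 40x21 rhombus, and assumes (my assumption, not stated above) a hexagon the full 40 pixels tall; adjust the metrics to match your own artwork.

```c
/* Tile metrics: 40x40 bitmaps, a 40x21 rhombus, a 40-pixel-tall hexagon. */
#define TILE_WIDTH      40
#define TILE_HEIGHT     40
#define RHOMBUS_HEIGHT  21
#define HEX_HEIGHT      40

#define ISO_ROW_STEP  (RHOMBUS_HEIGHT / 2 + 1)        /* HeightOverLapping */
#define ISO_Y_OFFSET  (TILE_HEIGHT - RHOMBUS_HEIGHT)  /* YOffset           */
#define HEX_ROW_STEP  (HEX_HEIGHT * 3 / 4)            /* HeightOverLapping */

void IsoPlot(int mapx, int mapy, int *plotx, int *ploty)
{
    /* (mapy & 1) shifts odd rows right by half a tile. */
    *plotx = mapx * TILE_WIDTH + (mapy & 1) * (TILE_WIDTH / 2);
    *ploty = mapy * ISO_ROW_STEP - ISO_Y_OFFSET;
}

void HexPlot(int mapx, int mapy, int *plotx, int *ploty)
{
    *plotx = mapx * TILE_WIDTH + (mapy & 1) * (TILE_WIDTH / 2);
    *ploty = mapy * HEX_ROW_STEP;
}
```

With these numbers, IsoPlot gives a row step of 11 pixels and HexPlot a row step of 30, matching the overlap formulas above.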
Isometric Directions:

For coding purposes, we will give names to these directions:

{Pascal}
Const
  IsoEast=0; IsoSouthEast=1; IsoSouth=2; IsoSouthWest=3;
  IsoWest=4; IsoNorthWest=5; IsoNorth=6; IsoNorthEast=7;

{C}
#define ISOEAST 0
#define ISOSOUTHEAST 1
#define ISOSOUTH 2
#define ISOSOUTHWEST 3
#define ISOWEST 4
#define ISONORTHWEST 5
#define ISONORTH 6
#define ISONORTHEAST 7

Hexagonal Directions:

The names for these directions:

{Pascal}
Const
  HexEast=0; HexSouthEast=1; HexSouthWest=2;
  HexWest=3; HexNorthWest=4; HexNorthEast=5;

{C}
#define HEXEAST 0
#define HEXSOUTHEAST 1
#define HEXSOUTHWEST 2
#define HEXWEST 3
#define HEXNORTHWEST 4
#define HEXNORTHEAST 5

Here is a table of DX (change in X) and DY (change in Y) for each direction on the iso and hex maps, divided into two lists: one for even Y values, and one for odd Y values. As you can see, DeltaY is the same no matter what row you are on. Only DeltaX changes. Also, for the cardinal directions (North, East, South, and West), DeltaX is the same no matter what row you are on. It's only diagonal movement that is tricky. So now, let's build a few functions: IsoDeltaX, IsoDeltaY, HexDeltaX, and HexDeltaY.

{Pascal}
{To find out how we should modify X in order to move in a given direction.}
{Dir is direction of intended movement, and OddRow is whether or not the}
{current Y position is odd or even. You can feed it the expression ((Y and 1)=1)}
Function IsoDeltaX(Dir:byte;OddRow:boolean):integer;
Var
  Temp:integer;
Begin
  Temp:=0; {The default change in X is 0.
We'll only modify it if we have to}
  Case Dir of
    IsoEast: Temp:=1;
    IsoWest: Temp:=-1;
    {IsoNorth and IsoSouth leave X unchanged in this scheme}
    IsoSouthEast, IsoNorthEast: If OddRow then Temp:=1; {If not OddRow, then leave as 0}
    IsoSouthWest, IsoNorthWest: If Not OddRow then Temp:=-1; {If OddRow, then leave as 0}
  End;
  IsoDeltaX:=Temp; {Return the value}
End;

{To find out how we should modify Y in order to move in a given direction.}
{Dir is the direction of intended movement}
Function IsoDeltaY(Dir:byte):integer;
Var
  Temp:integer;
Begin
  Temp:=0; {Default value of 0. We will change it only if we have to}
  Case Dir of
    IsoNorth: Temp:=-2;
    IsoSouth: Temp:=2;
    IsoNorthWest, IsoNorthEast: Temp:=-1;
    IsoSouthWest, IsoSouthEast: Temp:=1;
  End;
  IsoDeltaY:=Temp; {Return the value}
End;

Function HexDeltaX(Dir:byte;OddRow:boolean):integer;
Var
  Temp:integer;
Begin
  Temp:=0;
  Case Dir of
    HexEast: Temp:=1;
    HexWest: Temp:=-1;
    HexSouthEast, HexNorthEast: If OddRow then Temp:=1;
    HexSouthWest, HexNorthWest: If Not OddRow then Temp:=-1;
  End;
  HexDeltaX:=Temp;
End;

Function HexDeltaY(Dir:byte):integer;
Var
  Temp:integer;
Begin
  Temp:=0;
  Case Dir of
    HexNorthWest, HexNorthEast: Temp:=-1;
    HexSouthWest, HexSouthEast: Temp:=1;
  End;
  HexDeltaY:=Temp;
End;

{C}
int isodeltax(unsigned char dir, BOOL oddrow)
{
  int temp=0;
  switch(dir)
  {
    case ISOEAST: temp=1; break;
    case ISOWEST: temp=-1; break;
    case ISOSOUTHEAST: case ISONORTHEAST: if (oddrow==TRUE) temp=1; break;
    case ISOSOUTHWEST: case ISONORTHWEST: if (oddrow==FALSE) temp=-1; break;
  }
  return(temp);
}

int isodeltay(unsigned char dir)
{
  int temp=0;
  switch(dir)
  {
    case ISONORTH: temp=-2; break;
    case ISOSOUTH: temp=2; break;
    case ISOSOUTHEAST: case ISOSOUTHWEST: temp=1; break;
    case ISONORTHEAST: case ISONORTHWEST: temp=-1; break;
  }
  return(temp);
}

int hexdeltax(unsigned char dir, BOOL oddrow)
{
  int temp=0;
  switch(dir)
  {
    case HEXEAST: temp=1; break;
    case HEXWEST: temp=-1; break;
    case HEXSOUTHEAST: case HEXNORTHEAST: if (oddrow==TRUE) temp=1; break;
    case HEXSOUTHWEST: case
HEXNORTHWEST: if (oddrow==FALSE) temp=-1; break;
  }
  return(temp);
}

int hexdeltay(unsigned char dir)
{
  int temp=0;
  switch(dir)
  {
    case HEXSOUTHEAST: case HEXSOUTHWEST: temp=1; break;
    case HEXNORTHEAST: case HEXNORTHWEST: temp=-1; break;
  }
  return(temp);
}

Facing and Turning

In some games, like strategy games, as well as others, the direction that something on a tile is facing is just as important as what tile it is on (for things like arc of fire, etc.). Keeping track of facing is no big deal. It's just a byte (char) that keeps track of the unit's direction (0 to 7 for iso, 0 to 5 for hex). For turning the unit, we may want to have a function or two, as well as some turning constants. In iso, we turn in increments of 45 degrees; in hex, we turn in increments of 60.

{Pascal}
Const
  {Iso turning constants}
  IsoTurnNone=0; IsoTurnRight45=1; IsoTurnRight90=2; IsoTurnRight135=3;
  IsoTurnAround=4; IsoTurnLeft135=5; IsoTurnLeft90=6; IsoTurnLeft45=7;

  {Hex turning constants}
  HexTurnNone=0; HexTurnRight60=1; HexTurnRight120=2;
  HexTurnAround=3; HexTurnLeft120=4; HexTurnLeft60=5;

Function IsoTurn(Dir,Turn:byte):byte;
Begin
  IsoTurn:=(Dir+Turn) AND 7;
End;

Function HexTurn(Dir,Turn:byte):byte;
Begin
  HexTurn:=(Dir+Turn) MOD 6;
End;

{C}
/*Iso turn constants*/
#define ISOTURNNONE 0
#define ISOTURNRIGHT45 1
#define ISOTURNRIGHT90 2
#define ISOTURNRIGHT135 3
#define ISOTURNAROUND 4
#define ISOTURNLEFT135 5
#define ISOTURNLEFT90 6
#define ISOTURNLEFT45 7

/*Hex turn constants*/
#define HEXTURNNONE 0
#define HEXTURNRIGHT60 1
#define HEXTURNRIGHT120 2
#define HEXTURNAROUND 3
#define HEXTURNLEFT120 4
#define HEXTURNLEFT60 5

unsigned char isoturn(unsigned char dir, unsigned char turn)
{
  return((dir+turn) & 7);
}

unsigned char hexturn(unsigned char dir, unsigned char turn)
{
  return((dir+turn) % 6);
}

Mouse Matters

Another major difficulty of iso/hex mapping is the mouse cursor. This was one of my difficulties for a long time. Then, I took a look at one of the GIFs that shipped with Civilization II.
It had a little picture, kind of like this:

AHA! I said. Then I understood. We don't have to do bizarre calculations in order to figure out what tile we're on! We just divide the screen (or map) into little rectangles like the one above, figure out where in a given rectangle our mouse is, and find the color on the picture above that corresponds! This will allow us to figure out which tile our mouse is hovering over. (After stumbling onto this epiphany, I promptly smacked myself in the forehead and said "DUH!") I call the above picture the isometric MouseMap. Here's how to use it. (For hex maps, use the same algorithm, but with the following MouseMap:)

First step: Find out what region of the map the mouse is in.

RegionX=int(MouseX/MouseMapWidth)
RegionY=int(MouseY/MouseMapHeight)*2 {The multiplying by two is very important}

Second step: Find out WHERE in the MouseMap our mouse is, by finding MouseMapX and MouseMapY.

MouseMapX=MouseX MOD MouseMapWidth
MouseMapY=MouseY MOD MouseMapHeight

Third step: Determine the color in the MouseMap at (MouseMapX,MouseMapY).

Fourth step: Find RegionDX and RegionDY in the following table.

Fifth step: Use RegionX, RegionY, RegionDX, and RegionDY to find TileX and TileY.

TileX=RegionX+RegionDX
TileY=RegionY+RegionDY

Next Time: I will discuss putting objects onto our iso/hex tiles, with a minimum of muss and fuss, and proper screen updating, so you don't have to draw the whole map every time.
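To close out the mouse discussion, the five lookup steps above can be sketched in C. The mouse-map bitmap is stubbed with a geometric diamond test, and the RegionDX/RegionDY walk values are my reading of the usual table for this region scheme (they follow from the even-row movement deltas, since RegionY is always even here); take the real values from your own mouse-map table.

```c
#include <stdlib.h>

/* Hypothetical mouse-map metrics: one region per 40x20 rectangle. */
#define MOUSEMAP_W 40
#define MOUSEMAP_H 20

typedef enum { MM_CENTER, MM_NW, MM_NE, MM_SW, MM_SE } MouseMapColor;

/* Step 3 stand-in: a real implementation reads the pixel color of the
   mouse-map bitmap at (x, y); here a centred diamond plays that role. */
static MouseMapColor LookupMouseMap(int x, int y)
{
    /* Inside the diamond when |dx|/(W/2) + |dy|/(H/2) < 1. */
    if (abs(x - MOUSEMAP_W / 2) * MOUSEMAP_H +
        abs(y - MOUSEMAP_H / 2) * MOUSEMAP_W < MOUSEMAP_W * MOUSEMAP_H / 2)
        return MM_CENTER;
    if (x < MOUSEMAP_W / 2)
        return (y < MOUSEMAP_H / 2) ? MM_NW : MM_SW;
    return (y < MOUSEMAP_H / 2) ? MM_NE : MM_SE;
}

void MouseToTile(int mousex, int mousey, int *tilex, int *tiley)
{
    /* Step 1: which region the mouse is in (the *2 is very important). */
    int regionx = mousex / MOUSEMAP_W;
    int regiony = (mousey / MOUSEMAP_H) * 2;

    /* Step 2: where inside the mouse map we are. */
    int mmx = mousex % MOUSEMAP_W;
    int mmy = mousey % MOUSEMAP_H;

    /* Steps 3 and 4: color lookup, then RegionDX/RegionDY per color. */
    static const int dx[] = { 0, -1, 0, -1, 0 };  /* CENTER, NW, NE, SW, SE */
    static const int dy[] = { 0, -1, -1, 1, 1 };
    MouseMapColor c = LookupMouseMap(mmx, mmy);

    /* Step 5: combine into the tile coordinates. */
    *tilex = regionx + dx[c];
    *tiley = regiony + dy[c];
}
```

Corner hits can produce tile coordinates of -1, which correspond to the border tiles mentioned earlier; clamp or reject them as your map requires.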
8. Global illumination (GI) is a term used in computer graphics to refer to all lighting phenomena caused by interaction between surfaces (light rebounding off them, refracting, or getting blocked): for example, color bleeding, caustics, and shadows. Many times the term GI is used to refer only to color bleeding and realistic ambient lighting. Direct illumination - light that comes directly from a light source - is easily computed in real-time with today's hardware, but we can't say the same about GI, because we need to gather information about nearby surfaces for every surface in the scene, and the complexity of this quickly gets out of control. However, there are some approximations to GI that are easier to manage. When light travels through a scene, rebounding off surfaces, there are some places that have a smaller chance of getting hit with light: corners, tight gaps between objects, creases, etc. This results in those areas being darker than their surroundings. This effect is called ambient occlusion (AO), and the usual method to simulate this darkening of certain areas of the scene involves testing, for each surface, how much it is "occluded" or "blocked from light" by other surfaces. Calculating this is faster than trying to account for all global lighting effects, but most existing AO algorithms still can't run in real-time. Real-time AO was out of reach until Screen Space Ambient Occlusion (SSAO) appeared. SSAO is a method to approximate ambient occlusion in screen space. It was first used in games by Crytek, in their "Crysis" franchise, and has been used in many other games since. In this article I will explain a simple and concise SSAO method that achieves better quality than the traditional implementation.
[center][url="http://www.flickr.com/photos/gamedevnet/4639143267/"][img]http://farm5.static.flickr.com/4006/4639143267_9a3ba682db.jpg[/img][/url] [i]The SSAO in Crysis[/i] [/center] Prerequisites The original implementation by Crytek had a depth buffer as input and worked roughly like this: for each pixel in the depth buffer, sample a few points in 3D around it, project them back to screen space, and compare the depth of each sample to the depth at that position in the depth buffer to determine whether the sample is in front of a surface (no occlusion) or behind one (it hits an occluding object). An occlusion buffer is generated by averaging the distances of occluded samples to the depth buffer. However, this approach has some problems (such as self-occlusion and haloing) that I will illustrate later. The algorithm I describe here does all calculations in 2D; no projection is needed. It uses per-pixel position and normal buffers, so if you're using a deferred renderer you have half of the work done already. If you're not, you can try to reconstruct position from depth, or you can store per-pixel position directly in a floating-point buffer. I recommend the latter if this is your first time implementing SSAO, as I will not discuss position reconstruction from depth here. Either way, for the rest of the article I'll assume you have both buffers available. Positions and normals need to be in view space. What we are going to do in this article is exactly this: [b]take the position and normal buffers, and generate a one-component-per-pixel occlusion buffer[/b]. How to use this occlusion information is up to you; the usual way is to subtract it from the ambient lighting in your scene, but you can also use it in more convoluted or strange ways for NPR (non-photorealistic) rendering if you wish. Algorithm Given any pixel in the scene, it is possible to calculate its ambient occlusion by treating all neighboring pixels as small spheres, and adding together their contributions.
To simplify things, we will work with points instead of spheres: [b]occluders will be just points with no orientation, and the occludee (the pixel which receives occlusion) will be a pair formed by the pixel's position and its normal[/b]. Then, the occlusion contribution of each occluder depends on two factors: [list] [*]Distance "d" to the occludee. [*]Angle between the occludee's normal "N" and the vector "V" between occluder and occludee. [/list] With these two factors in mind, a simple formula to calculate occlusion is: [b]Occlusion = max( 0.0, dot( N, V ) ) * ( 1.0 / ( 1.0 + d ) )[/b] The first term, max( 0.0, dot( N, V ) ), works based on the intuitive idea that points directly above the occludee contribute more than points near it but not quite on top. The purpose of the second term, ( 1.0 / ( 1.0 + d ) ), is to attenuate the effect linearly with distance. You could choose to use quadratic attenuation or any other function; it's just a matter of taste. [center][url="http://www.flickr.com/photos/gamedevnet/4639752338/"][img]http://farm5.static.flickr.com/4026/4639752338_7a574740e9.jpg[/img][/url] [/center] The algorithm is very easy: sample a few neighbors around the current pixel and accumulate their occlusion contribution using the formula above. To gather occlusion, I use 4 samples ((1,0), (-1,0), (0,1), (0,-1)) rotated at 45 and 90 degrees, and reflected using a random normal texture. Some tricks can be applied to accelerate the calculations: you can use half-sized position and normal buffers, or you can apply a bilateral blur to the resulting SSAO buffer to hide sampling artifacts if you wish. Note that these two techniques can be applied to any SSAO algorithm.
This is the HLSL pixel shader code for the effect, which has to be applied to a full-screen quad:

[code]sampler g_buffer_norm;
sampler g_buffer_pos;
sampler g_random;
float2 g_screen_size;
float random_size;
float g_sample_rad;
float g_intensity;
float g_scale;
float g_bias;

struct PS_INPUT
{
    float2 uv : TEXCOORD0;
};

struct PS_OUTPUT
{
    float4 color : COLOR0;
};

float3 getPosition(in float2 uv)
{
    return tex2D(g_buffer_pos, uv).xyz;
}

float3 getNormal(in float2 uv)
{
    return normalize(tex2D(g_buffer_norm, uv).xyz * 2.0f - 1.0f);
}

float2 getRandom(in float2 uv)
{
    return normalize(tex2D(g_random, g_screen_size * uv / random_size).xy * 2.0f - 1.0f);
}

float doAmbientOcclusion(in float2 tcoord, in float2 uv, in float3 p, in float3 cnorm)
{
    float3 diff = getPosition(tcoord + uv) - p;
    const float3 v = normalize(diff);
    const float d = length(diff) * g_scale;
    return max(0.0, dot(cnorm, v) - g_bias) * (1.0 / (1.0 + d)) * g_intensity;
}

PS_OUTPUT main(PS_INPUT i)
{
    PS_OUTPUT o = (PS_OUTPUT)0;
    o.color.rgb = 1.0f;

    const float2 vec[4] = { float2(1,0), float2(-1,0),
                            float2(0,1), float2(0,-1) };

    float3 p = getPosition(i.uv);
    float3 n = getNormal(i.uv);
    float2 rand = getRandom(i.uv);

    float ao = 0.0f;
    float rad = g_sample_rad / p.z;

    //**SSAO Calculation**//
    int iterations = 4;
    for (int j = 0; j < iterations; ++j)
    {
        float2 coord1 = reflect(vec[j], rand) * rad;
        float2 coord2 = float2(coord1.x*0.707 - coord1.y*0.707,
                               coord1.x*0.707 + coord1.y*0.707);

        ao += doAmbientOcclusion(i.uv, coord1*0.25, p, n);
        ao += doAmbientOcclusion(i.uv, coord2*0.5,  p, n);
        ao += doAmbientOcclusion(i.uv, coord1*0.75, p, n);
        ao += doAmbientOcclusion(i.uv, coord2,      p, n);
    }
    ao /= (float)iterations * 4.0;
    //**END**//

    // Do stuff here with your occlusion value "ao": modulate ambient lighting,
    // write it to a buffer for later use, etc.
    return o;
}[/code]

The concept is very similar to the image-space approach presented in "Hardware Accelerated Ambient Occlusion Techniques on GPUs" [1], the main differences being the sampling pattern and the AO function.
It can also be understood as an image-space version of "Dynamic Ambient Occlusion and Indirect Lighting" [2].

Some details worth mentioning about the code:
[list]
[*]The radius is divided by p.z to scale it with the distance to the camera. If you bypass this division, all pixels on screen will use the same sampling radius and the output will lose the perspective illusion.
[*]Inside the for loop, coord1 holds the original sampling coordinates, at 90° intervals, and coord2 holds the same coordinates rotated 45°.
[*]The random texture contains randomized normal vectors, so it is your average normal map. This is the random normal texture I use: [url="http://www.flickr.com/photos/gamedevnet/4639143323/"][img]http://farm5.static.flickr.com/4003/4639143323_c6bb4a75e3_t.jpg[/img][/url] It is tiled across the screen and sampled for each pixel, using these texture coordinates: [size=3][font=Courier New]g_screen_size * uv / random_size [/font][/size] where "g_screen_size" contains the width and height of the screen in pixels and "random_size" is the size of the random texture (the one I use is 64x64). The normal obtained by sampling the texture is used to reflect the sampling vector inside the for loop, yielding a different sampling pattern for each pixel on the screen (see "interleaved sampling" in the references section).
[/list]
In the end, the shader reduces to iterating through some occluders, invoking our AO function for each of them, and accumulating the results. There are four artist-tweakable variables:
[list]
[*]g_scale: scales the distance between occluders and occludee.
[*]g_bias: controls the width of the occlusion cone considered by the occludee.
[*]g_sample_rad: the sampling radius.
[*]g_intensity: the AO intensity.
[/list]
Once you tweak the values a bit and see how the AO reacts to them, it becomes very intuitive to achieve the effect you want.
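To see where the per-pixel variation comes from, the two operations the loop applies to the base directions can be sketched in Python; the helper names `reflect` and `rot45` are mine, with `reflect` mirroring the HLSL intrinsic:

```python
import math

def reflect(v, n):
    """HLSL-style reflect(): v - 2 * dot(v, n) * n, with n unit length."""
    d = v[0] * n[0] + v[1] * n[1]
    return (v[0] - 2.0 * d * n[0], v[1] - 2.0 * d * n[1])

def rot45(v):
    """Rotate a 2D vector by 45 degrees; cos(45°) = sin(45°) ≈ 0.707,
    which is where the shader's 0.707 constants come from."""
    c = math.sqrt(0.5)
    return (v[0] * c - v[1] * c, v[0] * c + v[1] * c)

# The shader's four base directions, reflected about one example
# "random normal" as fetched from the tiled noise texture:
base = [(1, 0), (-1, 0), (0, 1), (0, -1)]
rand = (math.sqrt(0.5), math.sqrt(0.5))
coord1 = [reflect(v, rand) for v in base]  # per-pixel randomized pattern
coord2 = [rot45(v) for v in coord1]        # same pattern, rotated 45°
```

Because both operations only rotate or mirror the directions, every sample keeps unit length; only the orientation of the pattern changes from pixel to pixel.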
Results

[center] [url="http://www.flickr.com/photos/gamedevnet/4639143365/"][img]http://farm5.static.flickr.com/4045/4639143365_eb4136e969.jpg[/img][/url]
[i]a) raw output, 1 pass, 16 samples; b) raw output, 1 pass, 8 samples; c) directional light only; d) directional light - AO, 2 passes, 16 samples each.[/i][/center]

[left]As you can see, the code is short and simple, and the results show no self-occlusion and little to no haloing. These are the two main problems of all SSAO algorithms that use only the depth buffer as input; you can see them in these images:[/left]

[center][url="http://www.flickr.com/photos/gamedevnet/4639143389/"][img]http://farm5.static.flickr.com/4054/4639143389_42b13c5ef6.jpg[/img][/url] [url="http://www.flickr.com/photos/gamedevnet/4639143415/"][img]http://farm5.static.flickr.com/4030/4639143415_444cde1085.jpg[/img][/url][/center]

Self-occlusion appears because the traditional algorithm samples inside a sphere around each pixel, so on non-occluded planar surfaces at least half of the samples are marked as 'occluded'. This gives the overall occlusion a grayish tone. Haloing causes soft white edges around objects, because in those areas self-occlusion does not take place. Getting rid of self-occlusion therefore helps a great deal in hiding the halos.

The resulting occlusion from this method is also surprisingly consistent when moving the camera around. If you go for quality instead of speed, you can use two or more passes of the algorithm (duplicate the for loop in the code) with different radii: one to capture more global AO and another to bring out small crevices. With lighting and/or textures applied, the sampling artifacts become less apparent, so an extra blurring pass is usually unnecessary.

Taking it further

I have described a down-to-earth, simple SSAO implementation that suits games very well.
However, it is easy to extend it to take into account hidden surfaces that face away from the camera, obtaining better quality. Usually this would require three buffers: two position/depth buffers (front and back faces) and one normal buffer. But you can do it with only two: store the depth of front faces and back faces in the red and green channels of one buffer, respectively, then reconstruct a position from each. This way you have one buffer for positions and a second buffer for normals.

These are the results when taking 16 samples for each position buffer:

[center][url="http://www.flickr.com/photos/gamedevnet/4639752478/"][img]http://farm5.static.flickr.com/4004/4639752478_0645735a87.jpg[/img][/url]
[i]left: front faces occlusion, right: back faces occlusion[/i] [/center]

To implement it, just add extra calls to "doAmbientOcclusion()" inside the sampling loop that sample the back-face position buffer when searching for occluders. As you can see, the back faces contribute very little, and they require doubling the number of samples, almost doubling the render time. You could of course take fewer samples for back faces, but it is still not very practical.

This is the extra code that needs to be added. Inside the for loop, add these calls:

[code]ao += doAmbientOcclusionBack(i.uv, coord1*(0.25+0.125), p, n);
ao += doAmbientOcclusionBack(i.uv, coord2*(0.5+0.125), p, n);
ao += doAmbientOcclusionBack(i.uv, coord1*(0.75+0.125), p, n);
ao += doAmbientOcclusionBack(i.uv, coord2*1.125, p, n);[/code]

Add these two functions to the shader:

[code]float3 getPositionBack(in float2 uv)
{
    return tex2D(g_buffer_posb, uv).xyz;
}

float doAmbientOcclusionBack(in float2 tcoord, in float2 uv, in float3 p, in float3 cnorm)
{
    float3 diff = getPositionBack(tcoord + uv) - p;
    const float3 v = normalize(diff);
    const float d = length(diff) * g_scale;
    return max(0.0, dot(cnorm, v) - g_bias) * (1.0 / (1.0 + d));
}[/code]

Finally, add a sampler named "g_buffer_posb" containing the positions of back faces.
(Draw the scene with front-face culling enabled to generate it.)

Another small change that can be made, this time to improve speed instead of quality, is adding a simple LOD (level of detail) system to the shader. Replace the fixed iteration count with this:

[size=3][font=Courier New]int iterations = lerp(6.0, 2.0, p.z/g_far_clip); [/font][/size]

The variable "g_far_clip" is the distance of the far clipping plane, which must be passed to the shader. The number of iterations applied to each pixel now depends on its distance to the camera, so distant pixels perform a coarser sampling, improving performance with no noticeable quality loss. I have not used this in the performance measurements below, however.

Conclusion and Performance Measurements

As I said at the beginning of the article, this method is very well suited for games using deferred lighting pipelines, because it requires two buffers that are usually already available. It is straightforward to implement, and the quality is very good. It solves the self-occlusion issue and reduces haloing, but apart from that it has the same limitations as other screen-space ambient occlusion techniques:

Disadvantages:
[list]
[*]Does not take into account hidden geometry (especially geometry outside the frustum).
[*]The performance is very dependent on the sampling radius and the distance to the camera, since objects near the front plane of the frustum use bigger radii than those far away.
[*]The output is noisy.
[/list]

Speed-wise, a 16-sample implementation is roughly equal to a 4x4 Gaussian blur, since it fetches only 1 texture per sample and the AO function is very simple, but in practice it is a bit slower.
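Going back to the LOD tweak above (int iterations = lerp(6.0, 2.0, p.z/g_far_clip)), it can be sanity-checked on the CPU; this Python sketch uses my own helper names, with `lerp` matching HLSL's and the int() truncation mirroring the assignment to an int in the shader:

```python
def lerp(a, b, t):
    """HLSL-style linear interpolation: a + (b - a) * t."""
    return a + (b - a) * t

def lod_iterations(z, g_far_clip):
    """Iteration count for a pixel at view depth z: nearby pixels
    run 6 loop iterations, pixels at the far plane run only 2."""
    return int(lerp(6.0, 2.0, z / g_far_clip))
```

Since each loop iteration takes 4 samples, a pixel near the camera uses up to 24 samples while a pixel at the far plane uses only 8.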
Here's a table showing the measured speed in a scene with the Hebe model at 900x650, with no blur applied, on an Nvidia 8800GT:

[code]Settings                          FPS    SSAO time (ms)
High   (32 samples front/back)    150    3.3
Medium (16 samples front)         290    0.27
Low    (8 samples front)          310    0.08[/code]

[left]In these last screenshots you can see how the algorithm looks when applied to different models. At the highest quality (32 samples, front and back faces, very big radius, 3x3 bilateral blur):[/left]

[center][url="http://www.flickr.com/photos/gamedevnet/4639752508/"][img]http://farm4.static.flickr.com/3043/4639752508_642aafb156.jpg[/img][/url][/center]

[left]At the lowest quality (8 samples, front faces only, no blur, small radius):[/left]

[center][url="http://www.flickr.com/photos/gamedevnet/4639143469/"][img]http://farm5.static.flickr.com/4060/4639143469_479dd85cb2.jpg[/img][/url][/center]

It is also useful to consider how this technique compares to ray-traced AO. The purpose of this comparison is to see whether the method would converge to real AO when using enough samples.

[center][url="http://www.flickr.com/photos/gamedevnet/4639143501/"][img]http://farm4.static.flickr.com/3386/4639143501_af7880788e.jpg[/img][/url]
[i]Left: the SSAO presented here, 48 samples per pixel (32 for front faces and 16 for back faces), no blur. Right: ray-traced AO in Mental Ray. 32 samples, spread = 2.0, maxdistance = 1.0, falloff = 1.0.[/i] [/center]

One last word of advice: don't expect to plug the shader into your pipeline and get a realistic look automatically. Although this implementation has a good performance/quality ratio, SSAO is a time-consuming effect, and you should tweak it carefully to suit your needs and obtain the best performance possible: add or remove samples, add a bilateral blur on top, change the intensity, etc. You should also consider whether SSAO is the way to go for you.
Unless you have lots of dynamic objects in your scene, you may not need SSAO at all; maybe light maps are enough for your purpose, as they can provide better quality for static scenes. I hope you will benefit in some way from this method. All code included in this article is made available under the [url="http://www.opensource.org/licenses/mit-license.php"]MIT license[/url].

[size=3][b]References[/b][/size]

[1] Hardware Accelerated Ambient Occlusion Techniques on GPUs (Perumaal Shanmugam)
[2] Dynamic Ambient Occlusion and Indirect Lighting (Michael Bunnell)
[3] Image-Based Proxy Accumulation for Real-Time Soft Global Illumination (Peter-Pike Sloan, Naga K. Govindaraju, Derek Nowrouzezahrai, John Snyder)
[4] Interleaved Sampling (Alexander Keller, Wolfgang Heidrich)

[center][url="http://www.flickr.com/photos/gamedevnet/4639143627/"][img]http://farm5.static.flickr.com/4060/4639143627_b4bba7bbee.jpg[/img][/url]
[i]Crytek's Sponza rendered at 1024x768, 175 fps with a directional light.[/i][/center]

[center][url="http://www.flickr.com/photos/gamedevnet/4639752926/"][img]http://farm5.static.flickr.com/4047/4639752926_1741d420fe.jpg[/img][/url]
[i]The same scene rendered at 1024x768, 110 fps using SSAO medium settings: 16 samples, front faces, no blur. Ambient lighting has been multiplied by (1.0-AO).[/i][/center]

The Sponza model was downloaded from [url="http://www.crytek.com/downloads/technology/"]Crytek's website.[/url]
  9. Happy Father's Day to both of my dads! See you both in a few weeks.
  10. Happy half-birthday to my best friend and love of my life!
  11. On the train, I overheard a group of people talking about the NSA getting phone records from Verizon. One woman stated that once her contract is up, she's canceling her service with Verizon. Another said that we shouldn't worry about it because NPR said it isn't a big deal. /facepalm. This is why we can't have nice things.
  12. I think I just found my birth mother...
  13. Dave Foley tonight :D
  14. I just received a robo-spam (i.e. our sites cover similar topics) from a speech pathology site because GameDev.net has a thread about LISP...
  15. Talking to marketing people is making me feel stabby
  16. Can't sleep, clown will eat me
  17. Thanks for all of the birthday wishes! I had one of the weirdest birthdays ever, but it dampened the blow of turning 40, and I had a good time. Plus, I got my suitcase back :)
  18. Useless coworkers are useless
  19. Just tuned in to the debates to see the "analysts" demonizing Ron Paul for his views on Iran. What a trio of loathsome trolls.
  20. Me: Are you mad at me? Tyler: That depends. What did you actually do? Me: I just summoned Cthulhu. Tyler: ...
  21. Nate: "How do you spell everything?" Evan: "That's going to take a while, Nate"
  22. Should I be concerned by a resume from a guy with an interest in "human-computer interaction" and experience working with "vibro-tactile interfaces"?
  23. I'm watching Conan get his ordination as a minister from the same website I got mine. Awesome! :)