<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
	<title>Graphics Programming and Theory - Articles</title>
	<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/</link>
	<pubDate>Sun, 26 May 2013 03:34:56 +0000</pubDate>
	<ttl>86400</ttl>
	<description>Resources that relate to computer graphics, both the creation of and the discussion of</description>
	<item>
		<title>Dynamic 2D Soft Shadows</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/dynamic-2d-soft-shadows-r3065</link>
		<description><![CDATA[The aim of this document is to describe an accurate method of generating soft shadows in a 2D environment. The shadows are created on the fly via geometry as opposed to traditional 2D shadow methods, which typically use shadow sprites or other image based methods. This method takes advantage of several features available in the core OpenGL API, but is by no means restricted to this platform. Direct3D would also be a suitable rendering API, and the concepts and reasoning behind the various rendering stages should hopefully be clear enough that a reader can make the conversion without too much hassle.<br /><br /><p class="message note">

<strong>Note:</strong>&nbsp;&nbsp;<br />This article was originally published to GameDev.net back in 2004. It was revised by the original author in 2008 and published in the book <a href='http://www.amazon.com/Advanced-Game-Programming-GameDev-net-Collection/dp/1598638068/ref=pd_sim_b_2' class='bbc_url' title='External link' rel='nofollow external'>Advanced Game Programming: A GameDev.net Collection</a>, which is one of four books collecting both popular GameDev.net articles and new original content in print format.<br />
				
</p><br /><h1>Overview</h1><br />We will start by defining a few terms that we will use frequently, along with a brief explanation of the phenomena that we are attempting to reproduce on our digital canvas.<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[a3aad1602ad94df9c8924c518f692e6b]' id='ipb-attach-url-15117-0-17541400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15117" title="01Overview.gif - Size: 34.11K, Downloads: 30"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-209764-0-33532200-1366374373_thumb.gif" id='ipb-attach-img-15117-0-17541400-1369539296' style='width:480;height:368' class='attach' width="480" height="368" alt="Attached Image: 01Overview.gif" /></a><br />Image 1: Overview of terms<br /></p><br /><h2>Light source</h2><br />An obvious place to start – in this implementation we will discuss a point light source, although extending the method to include directional lights would be easy, as would adding ambient lighting to the system. We use a point light source with a user-defined radius to generate the soft shadows accurately.<br /><br /><h2>Shadow caster</h2><br />A shadow caster is any object that blocks the light emitted from the source. In this article we present implementation details for using convex hulls as shadow casters. Convex hulls have several useful properties, and provide a flexible primitive from which to construct more complex objects. Details of the hulls are discussed shortly.<br /><br /><h2>Light range</h2><br />In reality light intensity over a distance is subject to the inverse square relationship, and so can never really reach zero. In games, however, a linear light fall off often looks as good or better, depending on the circumstances. 
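<br /><br />As a rough illustration of the difference between the two curves, the sketch below compares them for a single light. The Falloff class and its method names are assumptions made for this example and are not part of the article's source, and the inverse-square variant is biased by one so that it stays finite at the centre of the light. A linear fall off reaches exactly zero at the light range, while the inverse-square curve only approaches it:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
// Hypothetical helper, not from the article's source: compares the two
// attenuation curves discussed above.
final class Falloff
{
  // Linear fall off: full intensity at the centre, zero at the light range
  // and beyond.
  static float linear(float intensity, float distance, float range)
  {
    return intensity * Math.max(0f, 1f - distance / range);
  }

  // Inverse-square style fall off, biased by one (an assumption of this
  // sketch) so that it equals the full intensity at distance zero. It only
  // approaches zero, so it never ends cleanly at a light range.
  static float inverseSquare(float intensity, float distance)
  {
    return intensity / (1f + distance * distance);
  }
}
</pre><br />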
In the image above, a linear fall off in the intensity is used, dropping to zero at the edge of the light range.<br /><br /><h2>Umbra</h2><br />The umbra region of a shadow is the area completely hidden from the light source, and as such is a single colour (the image above shows the umbra region in black since there is no other light source to illuminate this region).<br /><br /><h2>Penumbra</h2><br />The penumbra region of a shadow is what creates the soft edges. It is cast in any area that is partially hidden from the light: neither in full view nor totally hidden. The size and shape of the penumbra region are related to the light's position and physical size (not its range).<br /><br /><h1>Core Classes</h1><br />First we'll have a look at a couple of classes that are at the core of the system – the Light and ConvexHull classes.<br /><br />Light: The light class is fairly self-explanatory, holding all the information needed to represent a light source in our world.<br /><br />Contains: <br /><ul class='bbc'><li>Position and depth. Fairly obvious, these give the light's location in the world. Although the system is 2D, we still use a depth for correctly defining which objects to draw in front of which others. Since we'll be using 3D hardware to get nice fast rendering we'll take advantage of the depth buffer for this.</li><li>Physical size and range. Both stored as a simple radial distance, these control how the light influences its surroundings.</li><li>Colour and intensity. Lights have a colour value stored in the standard RGB form, and an intensity value which is the intensity at the centre of the light.</li></ul>ConvexHull: The convex hull is our primitive shape from which we will construct our world. By using these primitives we are able to construct more complex geometry.<br /><br />Contains:<br /><ul class='bbc'><li>List of points. A simple list is maintained of all the points that make up the edges of the hull. 
The list is calculated from a collection of points, with the gift-wrapping algorithm used to discard unneeded points. The gift-wrapping method is useful since the output geometry typically has a low number of edges. You may want to look into the QuickHull method as an alternative.</li><li>Depth. As for the light, a single depth value is used for proper display of overlapping objects.</li><li>Shadow depth offset. The importance of this is described later.</li><li>Centre position. The centre of the hull is approximated by averaging all the points on the edge of the hull. While not an exact geometric centre it is close enough for our needs.</li><li>Vertex data. Other data associated with the vertex positions. Currently this is only per-vertex colours, but texture coordinates could be added without requiring any major changes.</li></ul><h1>Rendering Overview</h1><br />The basic rendering process for a single frame looks like:<br /><br /><ol class='bbc'><li>Clear screen, initialise camera matrix </li><li>Fill z buffer with all visible objects. </li><li>For every light: <ol class='bbc'><li>Clear alpha buffer </li><li>Load alpha buffer with light intensity </li><li>Mask away shadow regions with shadow geometry </li><li>Render geometry with full detail (colours, textures etc.) modulated by the light intensity. </li></ol></li></ol><br />The essential point from the above is that a rendering pass is performed for every visible light, during which the alpha buffer is used to accumulate the light's intensity. Once the final intensity values for the light have been created in the alpha buffer, we render all the geometry modulated by the values in the alpha buffer.<br /><br /><h1>Simple Light Attenuation</h1><br />First we'll set up the foundation for the lighting – converting the above pseudo code into actual code but without the shadow generation for now.<br /><br /><pre class='prettyprint lang-auto linenums:0'>
public void render(Scene scene, GLDrawable canvas)
{
&nbsp;&nbsp;GL gl = canvas.getGL();
&nbsp;&nbsp;gl.glDepthMask(true);
&nbsp;&nbsp;gl.glClearDepth(1f);
&nbsp;&nbsp;gl.glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
&nbsp;&nbsp;gl.glClear(GL.GL_COLOR_BUFFER_BIT |
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; GL.GL_DEPTH_BUFFER_BIT |
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; GL.GL_STENCIL_BUFFER_BIT);
&nbsp;&nbsp;gl.glMatrixMode(GL.GL_PROJECTION);
&nbsp;&nbsp;gl.glLoadIdentity();
&nbsp;&nbsp;gl.glMatrixMode(GL.GL_MODELVIEW);
&nbsp;&nbsp;gl.glLoadIdentity();
&nbsp;&nbsp;gl.glMatrixMode(GL.GL_TEXTURE);
&nbsp;&nbsp;gl.glLoadIdentity();
&nbsp;&nbsp;gl.glDisable(GL.GL_CULL_FACE);
&nbsp;&nbsp;findVisibleLights(scene);
&nbsp;&nbsp;Camera activeCamera = scene.getActiveCamera();
&nbsp;&nbsp;activeCamera.preRender(canvas);
&nbsp;&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;// First we need to fill the z-buffer
&nbsp;&nbsp;&nbsp;&nbsp;findVisibleObjects(scene, null);
&nbsp;&nbsp;&nbsp;&nbsp;fillZBuffer(canvas);
&nbsp;&nbsp;&nbsp;&nbsp;// For every light
&nbsp;&nbsp;&nbsp;&nbsp;for (int lightIndex=0; lightIndex&lt;visibleLights.size(); lightIndex++)
&nbsp;&nbsp;&nbsp;&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Light currentLight = (Light)visibleLights.get(lightIndex);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// Clear current alpha
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;clearFramebufferAlpha(scene, currentLight, canvas);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// Load new alpha
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;writeFramebufferAlpha(currentLight, canvas);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// Mask off shadow regions
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mergeShadowHulls(scene, currentLight, canvas);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// Draw geometry pass
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;drawGeometryPass(currentLight, canvas);
&nbsp;&nbsp;&nbsp;&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;// Emissive / self illumination pass
&nbsp;&nbsp;&nbsp;&nbsp;// ..
&nbsp;&nbsp;&nbsp;&nbsp;// Wireframe editor handles
&nbsp;&nbsp;&nbsp;&nbsp;drawEditControls(canvas);
&nbsp;&nbsp;}
&nbsp;&nbsp;activeCamera.postRender(canvas);
}
</pre><br />Note that the code here is written in Java, using the JOGL set of bindings to OpenGL. For C++ programmers you simply have to remember that primitives such as int, float, boolean etc. are always passed by value, and objects are always handled through references, much like passing a pointer. OpenGL commands and enumerations are scoped to a GL object, which leads to the slightly extended notation from the straight C style.<br /><br />First we reset the GL state ready for the next frame, collect all the lights that we will need to render this frame and retrieve the currently active camera from the scene. Camera.preRender() and .postRender() are used to set the modelview and projection matrices to those needed for the view position.<br /><br />Once this initialisation is complete we need to fill the z-buffer for the whole scene. Although not discussed here, this would be the perfect place to take advantage of your favourite type of spatial tree. A quad-tree or AABB-tree would make a good choice for inclusion within the scene, and would be used for all testing of objects against the view frustum. To fill the depth buffer we simply enable z-buffer reading and writing, but with colour writing disabled to leave the colour buffer untouched. This creates a perfect depth buffer for us to use and stops later stages blending pixels hidden from view. It is worth noting that by enabling colour writing an ambient lighting pass can be added here to do both jobs at the same time. From this point onwards we can disable depth writing as it no longer needs to be updated.<br /><br />Now we perform a rendering pass for every light.<br /><br />First the alpha buffer is cleared in preparation for its use. This is simply a full screen quad drawn without blending, depth testing or colour writing to reset the alpha channel in the framebuffer to 0f. 
Since we don't want to disturb the current camera matrices that have been set up, we create this quad by using the current camera position to determine the quad's coordinates.<br /><br />Next we need to load the light's intensity into the alpha buffer. This does not need any blending, but depth testing is enabled this time to allow lights to be restricted to illuminating only the objects beneath them. Again colour writing is left disabled since we are not ready to render any visible geometry yet. The following function is used to create the geometry for a single light:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
public void renderLightAlpha(float intensity, GLDrawable canvas)
{
&nbsp;&nbsp;assert (intensity &gt; 0f &amp;&amp; intensity &lt;= 1f);
&nbsp;&nbsp;GL gl = canvas.getGL();
&nbsp;&nbsp;int numSubdivisions = 32;
&nbsp;&nbsp;gl.glBegin(GL.GL_TRIANGLE_FAN);
&nbsp;&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;gl.glColor4f(0f, 0f, 0f, intensity);
&nbsp;&nbsp;&nbsp;&nbsp;gl.glVertex3f(center.x, center.y, depth);
&nbsp;&nbsp;&nbsp;&nbsp;// Set edge colour for rest of shape
&nbsp;&nbsp;&nbsp;&nbsp;gl.glColor4f(0f, 0f, 0f, 0f);
&nbsp;&nbsp;&nbsp;&nbsp;for (float angle=0; angle&lt;=Math.PI*2;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; angle+=((Math.PI*2)/numSubdivisions) )
&nbsp;&nbsp;&nbsp;&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;gl.glVertex3f( radius*(float)Math.cos(angle) + center.x,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; radius*(float)Math.sin(angle) + center.y, depth); 
&nbsp;&nbsp;&nbsp;&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;gl.glVertex3f(center.x+radius, center.y, depth);
&nbsp;&nbsp;}
&nbsp;&nbsp;gl.glEnd();
}
</pre><br />What happens is we create a triangle fan rooted at the centre position of the light, then loop around in a circle creating additional vertices as we go. The alpha value of the centre point is our light intensity, fading linearly to zero on the edges of the circle. This creates the smooth light fall off seen in the first image. If other methods of light attenuation are needed, they can be generated here. An interesting alternative would be to use an alpha texture instead of vertex colours; a 1D texture could happily represent a non-linear set of light intensities. Other unusual effects could be achieved by animating the texture coordinates over a 2D texture, such as flickering candlelight or a pulsing light source.<br /><br />So now we have our light intensity values in the alpha buffer, we will skip the generation of shadow hulls for the moment and move on to getting our level geometry up on the screen.<br /><br />The geometry pass is where we really start to see things coming together, using the results we have carefully composed in the alpha of the framebuffer. First we need to make sure we have depth testing enabled (using less-than-or-equals as before), and then enable and set up our blending equation correctly. <br /><br /><pre class='prettyprint lang-auto linenums:0'>
&nbsp;&nbsp;gl.glEnable(GL.GL_BLEND);
&nbsp;&nbsp;gl.glBlendFunc(GL.GL_DST_ALPHA, GL.GL_ONE);
</pre><br />Simple, yes? What we're doing here is multiplying our incoming fragments (from the geometry we're about to draw) by the alpha values already sitting in the framebuffer. This means that where the alpha is 1 the geometry is drawn at full intensity, and where it is 0 the framebuffer is left unchanged. This is then added to the current framebuffer colour multiplied by one. This addition to the existing colour means we slowly accumulate our results from previous passes. With our blend equation set up, we simply render our geometry as normal, using whatever vertex colours and textures take our fancy.<br /><br />If you take another look at our render() function near the top, you'll see we've almost finished composing our frame. Once we've looped over all the lights we've practically finished, but we'll insert a couple of extra stages. First an emissive or self illumination pass – this is discussed near the end of the article. After this is a simple wireframe rendering which draws object outlines, as seen in the first image.<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[a3aad1602ad94df9c8924c518f692e6b]' id='ipb-attach-url-15118-0-17557600-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15118" title="02BasicPP.gif - Size: 39.2K, Downloads: 63"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-209764-0-65048500-1366374723_thumb.gif" id='ipb-attach-img-15118-0-17557600-1369539296' style='width:480;height:341' class='attach' width="480" height="341" alt="Attached Image: 02BasicPP.gif" /></a><br />Image 2: Per pixel lighting with intensities accumulated in the alpha buffer.<br /></p><br /><h1>Coloured Lighting</h1><br />What was once seen as 'the next big thing' in the <i>Quake 2</i> and <i>Unreal</i> era, coloured lighting is pretty much standard by now, and a powerful tool for level designers to add atmosphere to a scene. 
Now since we've already got our light intensity ready and waiting for our geometry in the alpha buffer, all we need to do is modulate the geometry colour by the current light colour while drawing. That's a whole lot of multiplication if we want to do it ourselves, but on TnL hardware we can get it practically for free with a simple trick. We enable lighting while drawing our geometry, yet define no normals since we have no need of them. Instead we just enable a single light and set its ambient colour to the colour of our current light. The graphics card will calculate the effect of the light colour on our geometry for us and we need barely lift a finger. Note that because we're accumulating light intensities over multiple lights in the framebuffer we get accurate over-brightening effects when lights overlap, and multiple coloured lights will merge and produce white illumination of our objects.<br /><br /><h1>Hard-edged Shadow Casting</h1><br />Now we have our lights correctly illuminating their surroundings we can start thinking about correctly limiting their light to add shadows into the scene. First we will cast hard-edged shadows from shadow casters and then extend this to cover soft-edged shadows with correct umbra and penumbra. This is done in the function we previously skipped, mergeShadowHulls().<br /><br />You will remember that at this point in the rendering we have the light intensity stored in the alpha buffer. Now what we will do is create geometry to represent the shadow from each shadow caster, then merge this into the alpha buffer. This is done inside the ConvexHull class.<br /><br /><h1>Finding the boundary points</h1><br />Our first step is to determine which points our shadow should be cast from. The list of points that make up the ConvexHull is looped through, and each edge is classified in regard to the light position. 
In pseudo code:<br /><ul class='bbc'><li>For every edge: <ul class='bbc'><li>Find normal for edge </li><li>Classify edge as front facing or back facing </li><li>Determine whether either of the edge's points is a boundary point. </li></ul></li></ul>The normal for the edge is found as:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
&nbsp;&nbsp;// Outward normal of an edge on an anti-clockwise hull: (dy, -dx)
&nbsp;&nbsp;float nx = currentPoint.y - prevPoint.y;
&nbsp;&nbsp;float ny = prevPoint.x - currentPoint.x;
</pre><br />Then the dot product of this normal and the vector to the light is taken. If this is greater than zero, the edge is front facing. Once an edge has been classified, it is compared against the previous edge. If one is front facing and the other back facing, then the shared vertex is a boundary point. As we walk around the edge of the hull (in an anti-clockwise direction) the boundary point from light to shadow is the start shadow point. The boundary from shadow to light is the end shadow point.<br /><br /><h1>Creating the Shadow Geometry</h1><br />Once we have these positions, we can generate our shadow geometry. Since we are only generating hard-edged shadows at the moment, we will be ignoring the physical size of our light source. Image 3 shows how the shadow geometry is built.<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[a3aad1602ad94df9c8924c518f692e6b]' id='ipb-attach-url-15119-0-17573500-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15119" title="03HullGeneration.gif - Size: 13.19K, Downloads: 31"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-209764-0-48440800-1366374857.gif" id='ipb-attach-img-15119-0-17573500-1369539296' style='width:452;height:393' class='attach' width="452" height="393" alt="Attached Image: 03HullGeneration.gif" /></a><br />Image 3: Hard-edged shadow generation<br /></p><br />As shown in the image, the shadow geometry is a single triangle strip projected outwards from the back facing edges of the shadow caster. We start at the first boundary point (marked with a red cross) and work our way anti-clockwise. The second point is found by finding the vector from the light to the point, and using this to project the point away from the light. A projection scale amount is used to ensure that the edges of the shadow geometry are always off screen. 
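<br /><br />The boundary search and the projection step described above can be sketched roughly as follows. This is a minimal illustration rather than the article's actual ConvexHull code: the HullShadow and Vec2 classes and all method names are assumptions, and the hull is assumed to be wound anti-clockwise in a y-up coordinate system.<br /><br /><pre class='prettyprint lang-auto linenums:0'>
// Sketch only: HullShadow, Vec2 and the method names are assumptions made
// for illustration, not the article's actual ConvexHull code.
final class HullShadow
{
  static final class Vec2
  {
    final float x, y;
    Vec2(float x, float y) { this.x = x; this.y = y; }
  }

  // True when the edge prev-to-cur of an anti-clockwise hull faces the light
  static boolean isFrontFacing(Vec2 prev, Vec2 cur, Vec2 light)
  {
    // Outward edge normal: the edge direction rotated a quarter turn clockwise
    float nx = cur.y - prev.y;
    float ny = prev.x - cur.x;
    // Dot product with the vector from the edge to the light
    return nx * (light.x - cur.x) + ny * (light.y - cur.y) &gt; 0f;
  }

  // Returns { start, end } hull indices of the shadow boundary points;
  // both stay at -1 when no boundary exists (light inside the hull)
  static int[] findBoundaryPoints(Vec2[] hull, Vec2 light)
  {
    int start = -1;
    int end = -1;
    int n = hull.length;
    for (int i = 0; i &lt; n; i++)
    {
      Vec2 prev = hull[(i + n - 1) % n];
      Vec2 next = hull[(i + 1) % n];
      boolean incoming = isFrontFacing(prev, hull[i], light); // edge ending here
      boolean outgoing = isFrontFacing(hull[i], next, light); // edge starting here
      if (incoming &amp;&amp; !outgoing) start = i; // light to shadow
      if (!incoming &amp;&amp; outgoing) end = i;   // shadow to light
    }
    return new int[] { start, end };
  }

  // Pushes a hull point away from the light along the light-to-point vector
  static Vec2 project(Vec2 point, Vec2 light, float scale)
  {
    return new Vec2(point.x + (point.x - light.x) * scale,
                    point.y + (point.y - light.y) * scale);
  }
}
</pre><br />Walking anti-clockwise from the start index to the end index visits the back facing edges, and each visited point paired with its projected copy gives the vertices of the triangle strip. The scale argument of project() plays the role of the projection scale amount. 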
For now we can simply set the projection scale to a sufficiently large number, but later it will be advantageous to calculate this every frame depending on how far zoomed in or out the camera is.<br /><br />We render the shadow geometry with depth testing enabled to properly layer the shadow between various other objects in the world, but with colour writing disabled, so only the alpha in the framebuffer is changed. You may remember that the final geometry pass is modulated (multiplied) by the existing alpha values, which means we need to set the shadow to have an alpha value of zero. Because the framebuffer will clamp the values to between zero and one, overlapping shadows will not make an affected area too dark but instead merge correctly.<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[a3aad1602ad94df9c8924c518f692e6b]' id='ipb-attach-url-15120-0-17590800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15120" title="04HardShadows.gif - Size: 22.31K, Downloads: 50"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-209764-0-44303600-1366374898.gif" id='ipb-attach-img-15120-0-17590800-1369539296' style='width:420;height:384' class='attach' width="420" height="384" alt="Attached Image: 04HardShadows.gif" /></a><br />Image 4: Hard-edged shadows<br /></p><br />Notice in image 4 how the shadow from the diamond correctly obscures the rightmost object, and that their shadows are correctly merged where they overlap.<br /><br /><h1>Soft-Edged Shadow Casting</h1><br />Now we can properly construct hard-edged shadows it is time to extend this to cover soft shadows – note that we cannot simply add faded edges to the existing shadow geometry, since this would result in inaccurate penumbra and umbra regions. 
First we start by defining a physical radius for the light source to generate the correct penumbra regions, then we need to create the penumbra geometry and modify the creation of the umbra region that we used for the hard-edged shadows.<br /><br /><h1>Shadow Fins</h1><br />Each penumbra region will be created by one or more shadow fins via the ConvexHull and ShadowFin classes.<br /><br />ShadowFin: An object to encompass all or part of a penumbra region. <br /><br />Contains: <br /><ul class='bbc'><li>Root position. This is the position from which the fin protrudes. </li><li>Penumbra vector. This is a vector from the root position which lies on the outer edge of the fin (the highest light intensity). </li><li>Umbra vector. This vector from the root position lies on the inner edge of the fin (lowest light intensity). </li><li>Penumbra and umbra intensities. These are the light intensities of their relative edges for the fin. If the fin makes up an entire penumbra region these are one and zero respectively. </li></ul>We start at the first boundary point, and create a ShadowFin from this point. The root position becomes the boundary point, and the penumbra and umbra intensities are initially one and zero. The difficult part of the fin – the penumbra and umbra vectors – is calculated by the getUmbraVector and getPenumbraVector methods within our Light object. 
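<br /><br />The data listed above could be held in a class along these lines. This is only a sketch of the fin's contents: the field names are assumptions, and the vectors are stored as plain floats to keep the example self-contained.<br /><br /><pre class='prettyprint lang-auto linenums:0'>
// Sketch of the data a fin carries; the field names are assumptions and
// not taken from the article's source.
final class ShadowFin
{
  // Position the fin protrudes from (a boundary point on the hull)
  float rootX, rootY;
  // Direction of the outer edge, where the light intensity is highest
  float penumbraX, penumbraY;
  // Direction of the inner edge, where the light intensity is lowest
  float umbraX, umbraY;
  // Edge intensities; one and zero when the fin spans a whole penumbra
  float penumbraIntensity = 1f;
  float umbraIntensity = 0f;

  ShadowFin(float rootX, float rootY)
  {
    this.rootX = rootX;
    this.rootY = rootY;
  }
}
</pre>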
<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[a3aad1602ad94df9c8924c518f692e6b]' id='ipb-attach-url-15121-0-17606100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15121" title="05FinGeneration.gif - Size: 6.74K, Downloads: 41"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-209764-0-94168500-1366375080_thumb.gif" id='ipb-attach-img-15121-0-17606100-1369539296' style='width:480;height:249' class='attach' width="480" height="249" alt="Attached Image: 05FinGeneration.gif" /></a><br />Image 5: Shadow fin generation<br /></p><br />If we look at the vector that lies along the outer penumbra edge we can imagine it as the vector from the light through the boundary point (C, the centre vector) displaced by the light radius. So we must find this displacement.<br /><br />First we note that the displacement is at right angles to the centre vector. So we take C and find this perpendicular vector in the same way we did to find the normals for the hull edges. Now although looking at the diagram we know which way we want this to point, when we're dealing with boundary points and light positions at all sorts of funny angles to each other we may end up with it pointing in the opposite direction to that which we expect. To solve this we find the vector from the centre of the hull to the boundary point (between the two Xs in the image), and take the dot product of this and the perpendicular vector. If this is less than zero, our vector is pointing in the wrong direction and we invert it.<br /><br />Armed with this new vector we normalise it and the centre vector, then add them together and we've found our crucial outer penumbra vector. Finding the inner vector requires we repeat the process but this time we invert the logic for the dot product test to displace the centre vector in the opposite direction. 
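<br /><br />One way to sketch this calculation is shown below. It is an interpretation rather than the article's getPenumbraVector and getUmbraVector code: the PenumbraVectors class and its parameters are assumptions, and the unit perpendicular is scaled by the light radius and added to the unnormalised centre vector, so that the fin widens with a larger light or a closer caster. That scaling is this sketch's reading of 'displaced by the light radius' and differs from adding two normalised vectors.<br /><br /><pre class='prettyprint lang-auto linenums:0'>
// Sketch only: the class, method and parameter names are assumptions.
final class PenumbraVectors
{
  // Returns { x, y } of the fin edge direction at a boundary point.
  // outer = true gives the penumbra (outer) edge, false the umbra (inner) edge.
  // Assumes the boundary point does not coincide with the light position.
  static float[] edgeVector(float lightX, float lightY, float radius,
                            float boundaryX, float boundaryY,
                            float hullCentreX, float hullCentreY,
                            boolean outer)
  {
    // Centre vector C: from the light through the boundary point
    float cx = boundaryX - lightX;
    float cy = boundaryY - lightY;

    // Perpendicular to C, normalised
    float px = cy;
    float py = -cx;
    float length = (float) Math.sqrt(px * px + py * py);
    px /= length;
    py /= length;

    // Vector from the centre of the hull to the boundary point
    float hx = boundaryX - hullCentreX;
    float hy = boundaryY - hullCentreY;
    boolean pointsAway = px * hx + py * hy &gt;= 0f;

    // The outer edge is displaced away from the hull, the inner edge
    // towards it, so flip the perpendicular when it points the wrong way
    if (pointsAway != outer)
    {
      px = -px;
      py = -py;
    }

    // Displace the centre vector by the light radius (assumption, see above)
    return new float[] { cx + px * radius, cy + py * radius };
  }
}
</pre><br />For a light of radius 1 at the origin, a boundary point at (0, 1) and a hull centred at (1, 1), this gives an outer edge direction of (-1, 1) and an inner edge direction of (1, 1), which are the lines from the two extremes of the light through the boundary point.<br /><br />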
We now have a fully calculated shadow fin to send to our renderer!<br /><br /><h2>Non-Linear Shading</h2><br />Although we have all the numbers we need to render our shadow fin, we'll soon hit a major snag – we can't use regular old vertex colours this time to write to the alpha buffer. We need the inner edge of the penumbra to be zero intensity (zero alpha) and our outer edge to be fully bright (alpha of one). While you can probably visualise that easily, getting our graphics card to actually render a triangle with the colours like this just isn't possible. Try it yourself if you're not sure; you'll soon see how it's the root point that causes the problems – it lies on both edges, so needs to be 'both' zero and one at the same time.<br /><br />The solution to this (and indeed most cases when you need non-linear shading) is to abandon vertex colours for the fins and instead use a texture to hold the information. Below is a copy of the texture I used.<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[a3aad1602ad94df9c8924c518f692e6b]' id='ipb-attach-url-15122-0-17621300-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15122" title="06PenumbraTexture.png - Size: 10.04K, Downloads: 37"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-209764-0-94622100-1366375148.png" id='ipb-attach-img-15122-0-17621300-1369539296' style='width:256;height:256' class='attach' width="256" height="256" alt="Attached Image: 06PenumbraTexture.png" /></a><br />Image 6 : Penumbra texture<br /></p><br />You can clearly see how the shadow fin will be rooted at the bottom left, with the two edges running vertically and diagonally to the top edge. Since we don't want texels from the right edge bleeding into the left we set the texture wrapping mode to clamp to the edge values (using glTexParameteri and GL_CLAMP_TO_EDGE). 
The bottom right half of the texture is unused, although if you really wanted to you could pack something else in here just as long as you're careful not to bleed over the edge.<br /><br />So we load this texture and bind it for use before drawing our shadow fins, and set the vertex colour to white to leave the texture unchanged by it. Other than that, rendering the fins is no different from the shadow hull. The only other thing we need to watch out for is how far back we project our points by the umbra/penumbra vectors, as the limited resolution of our penumbra texture will show if these are moved too far away. Ideally they will be projected to just off screen.<br /><br /><h2>Modifying the umbra generation</h2><br />Now we've got the fins drawn, we can fill in the umbra between them. This is done in almost exactly the same way as with hard shadows, except we must use the fins' inner edges to start and finish from instead of directly projecting away from the centre of the light source. As we move across the back of the shadow caster, we perform a weighted average between these two edge vectors to properly fill in the umbra region. When done correctly we see no gaps between the fins and the umbra geometry, giving us one consistent, accurate shadow cast into the alpha buffer.<br /><br /><h1>Making it robust</h1><br /><h2>Self-Intersection</h2><br />With this much implemented, the shadows will be looking quite nice – when static – however problems will become apparent when moving the light sources around. 
The most glaring is that of self-intersection.<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[a3aad1602ad94df9c8924c518f692e6b]' id='ipb-attach-url-15123-0-17636400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15123" title="07SelfIntersection.gif - Size: 6.65K, Downloads: 52"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-209764-0-50524700-1366375217.gif" id='ipb-attach-img-15123-0-17636400-1369539296' style='width:357;height:331' class='attach' width="357" height="331" alt="Attached Image: 07SelfIntersection.gif" /></a><br />Image 7: Self intersection of shadow fin<br /></p><br />If the light source is too large in relation to the object, or too near, the inner penumbra edge will intersect the hull. First we need to detect this case. We find the vector from the boundary point to the next edge point in the shadow (moving anti-clockwise here since we're on the shadow start boundary point). Then we compare the angle between the outer penumbra edge and our newly found hull edge, and the angle between the outer and inner penumbra edges. If the angle to the boundary edge is smaller, then we've got an intersection case we need to fix.<br /><br />First we snap the current fin to the boundary edge, and calculate the new intensity for the inner edge via the ratio of the angles. Then we create a new shadow fin at the next point on the hull. This has an outer edge set to the same vector and intensity as the existing fin's inner edge, while the new fin's inner edge is calculated as before. By gluing these two fins together we create a single smooth shadow edge. 
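<br /><br />The detection test and the intensity calculation might be sketched like this. The FinIntersection class and its method names are assumptions, as is the choice to interpolate the intensity linearly across the fin's angle. The hull edge is also assumed to lie on the umbra side of the outer penumbra edge, since only unsigned angles are compared.<br /><br /><pre class='prettyprint lang-auto linenums:0'>
// Sketch only: the class and method names are assumptions, and the linear
// interpolation of intensity across the fin is a simplification.
final class FinIntersection
{
  // Unsigned angle in radians between two 2D vectors
  static float angleBetween(float ax, float ay, float bx, float by)
  {
    float dot = ax * bx + ay * by;
    float cross = ax * by - ay * bx;
    return (float) Math.abs(Math.atan2(cross, dot));
  }

  // Returns the light intensity for the fin's inner edge. When the hull edge
  // lies inside the fin the inner edge is snapped to the hull edge, so the
  // intensity is taken from the ratio of the two angles; otherwise it stays
  // at zero as for an untouched fin.
  static float innerIntensity(float penumbraX, float penumbraY,
                              float umbraX, float umbraY,
                              float hullEdgeX, float hullEdgeY)
  {
    float finAngle = angleBetween(penumbraX, penumbraY, umbraX, umbraY);
    float edgeAngle = angleBetween(penumbraX, penumbraY, hullEdgeX, hullEdgeY);
    if (edgeAngle &gt;= finAngle)
    {
      return 0f; // no self-intersection
    }
    // Intensity falls from one on the outer edge to zero on the inner edge
    return 1f - edgeAngle / finAngle;
  }
}
</pre><br />The value returned for the snapped fin is also the outer edge intensity of the new fin created at the next hull point.<br /><br />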
Technically we should repeat the self-intersection test and keep adding more fins as needed, however I've not found that this is needed in practice.<br /><br /><h2>Eliminating 'Popping'</h2><br />You will also notice one other problem with this as it stands: the shadow fins will 'pop' along the edges of the hull as a light rotates. This is because we're still using an infinitely small point light to find the boundary points. To solve this we should take the physical radius into account when finding them. A robust way of doing this is to shift the light source position towards the hull by the radius distance before we find our boundary points. With these two fixes in place the fins will be visually stable as either the light or the hull moves (or both!).<br /><br /><h2>Depth Offset</h2><br />Depending on the style of game and the view used (such as a side scrolling platformer as opposed to a top down shooter) the way light and shadow interact with the various level objects will be different. What seems sensible for one may appear wrong in another. The most obvious case is objects casting shadows onto objects at the same depth.<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[a3aad1602ad94df9c8924c518f692e6b]' id='ipb-attach-url-15124-0-17651400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15124" title="08ShadowOffset.gif - Size: 30.3K, Downloads: 100"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-209764-0-79065200-1366375273_thumb.gif" id='ipb-attach-img-15124-0-17651400-1369539296' style='width:480;height:197' class='attach' width="480" height="197" alt="Attached Image: 08ShadowOffset.gif" /></a><br />Image 8: The effect of shadow offset<br /></p><br />The image above shows the same scene with different shadow offsets. 
Imagine that the scene is a top down viewpoint – the light grey areas are impassable walls surrounding the floor showing a T junction (imagine hard!). Now the image on the left seems slightly out of place – the shadows are being projected on top of the walls, yet these are much higher than the light source. Realistically they shouldn't be lit at all, but solid black walls aren't very visually appealing. The image on the right shows the shadows being offset and only obscuring the floor and any objects on it.<br /><br />Now if you were to imagine the same scene as a 2D platformer, you might prefer the left image. Here it seems to make more sense that the objects should shadow those on the same level. This decision is usually very dependent on the geometry and art direction of the level itself, so no hard and fast rules seem to apply. The best option seems to be to experiment and see which looks best.<br /><br />Adding control over this is a small modification. At the moment the scene on the left is the common case, and by generating shadow volumes that are a close fit to the edge of the shadow caster we've already done all the hard work. All we need to do is store a shadow depth offset in our ConvexHull and apply it to the depth of the shadow geometry. The existing depth testing will reject fragments drawn behind other objects and leave them at the original intensity.<br /><br /><h2>Emissive / Self Illumination pass</h2><br />This is a simple addition that can produce some neat effects – and can be seen as a generalisation of the wireframe 'full-bright' pass. After the lights have been drawn, we clear the alpha buffer again as before, but instead of writing light intensities into it we render our scene geometry with its associated emissive surface. 
This is an alpha texture used to control light intensities as before, and can be used for glowing objects, such as a computer display or a piece of hardware with a bank of LEDs: anything that has its own light source but is too small to require an individual light of its own. Because these are so small, we skip the shadow generation and can do them all in one go. Then we modulate the scene geometry by this alpha intensity as before. Unusual effects are possible with this, such as particles which light up their immediate surroundings, or the bright local illumination provided by a neon sign (with one or two associated lights to provide the lighting at medium and long range).<br /><br /><h2>Scissor Testing</h2><br />We are extending the shadow geometry until it's off the edge of the screen, but often the area a light affects is much smaller than this. The scissor test (glScissor in OpenGL) allows us to restrict rendering to a rectangle within our window and avoid drawing pixels that have no effect. We just have to project the light's bounds to screen space and set the scissor area before drawing the shadow geometry. This can increase the framerate considerably.<br /><br /><h1>Conclusion</h1><br />After a lot of work, much maths and a few sneaky tricks, we've finally got a shadow system that's both physically accurate and robust. The hardware requirements are modest indeed – a framebuffer with an alpha component is about all that's required; we don't even need to stray into extensions to get the correct effect. There is a whole heap of possible optimisations and areas for improvement, notably caching the calculated shadow geometry when neither the light nor the caster has changed, and including some sort of spatial tree so that only the shadow casters that are actually needed are used for each light.]]></description>
		<pubDate>Fri, 19 Apr 2013 12:42:15 +0000</pubDate>
		<guid isPermaLink="false">fed02ce0e96f989ec31e4eb6596bb06e</guid>
	</item>
	<item>
		<title>Thin Film Interference for Computer Graphics</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/thin-film-interference-for-computer-graphics-r2962</link>
		<description><![CDATA[<h1>Note</h1><br />The theoretical parts of this article require some knowledge of optics and electromagnetism, however the conclusion and final result (a practical implementation of single-layer thin film interference in the context of a BRDF) do not. You may therefore wish to skip the theoretical sections.<br /><br /><h1>Introduction</h1><br />Wave interference of light has been neglected for a long time in computer graphics, for multiple reasons. Firstly, it is often insignificant and can be cheaply approximated, or even ignored completely. Secondly, it is harder to understand as it requires interpreting light as waves instead of particles (photons). However, interference crops up almost everywhere in daily life, and has recently gained popularity in rendering applications.<br /><br />Examples of wave interference of light are soap bubbles, gasoline rainbow patterns, lens flares, basically everything that looks cool and/or involves multicolor patterns. For instance, in computer graphics, soap bubbles were in the past approximated with more or less realistic multicolor textures slightly panned with view angle. But it turns out that they are not that computationally difficult to accurately render. We will learn how.<br /><br />This article will focus on one particular form of interference, namely thin film interference. This occurs when one or more very thin transparent coatings ("films") are placed on top of a material. 
The films are so thin that when a light wave comes into contact with these film layers, it reflects and refracts multiple times inside the layer system, and interferes with itself in the process.<br /><p style='text-align:center'><br />&nbsp;&nbsp;&nbsp;&nbsp;<a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15218-0-19888700-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15218" title="general.png - Size: 40.65K, Downloads: 54"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-48284000-1366821363_thumb.png" id='ipb-attach-img-15218-0-19888700-1369539296' style='width:480;height:361' class='attach' width="480" height="361" alt="Attached Image: general.png" /></a><br /></p><br />The goal is to calculate the amount of light reflected off the layer system, and the amount of light transmitted into the internal medium. We will make the assumption that no light is absorbed, which is not required but makes the calculations more approachable as considering absorption of light involves delving deep into Maxwell's equations (the behaviour of electromagnetic waves at interfaces of lossy media is nontrivial). Though in general, each layer is so thin that absorption effects can be neglected most of the time.<br /><br />We will derive a physical solution for the case where <i>only one film</i> is present (single-layer) and conclude on how to solve the general case with arbitrarily many layers. The single-layer case is sufficient to render most real life occurrences of thin-film interference, however using more layers enables many more advanced effects. 
The cost of calculating reflection and transmission coefficients is linear in the number of layers.<br /><br /><h1>Derivation</h1><br />Consider a light wave incident to a thin layer of depth <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_8db396254ab12c726e0033b9f2e24a8e_l3.png" alt="ql_8db396254ab12c726e0033b9f2e24a8e_l3.p"></span> and real refractive index <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_880a79b912e64732d69ff4bec6b1c95b_l3.png" alt="ql_880a79b912e64732d69ff4bec6b1c95b_l3.p"></span>. The external medium has refractive index <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_b71be1e60ab16cc226daa59e7bc1b2e1_l3.png" alt="ql_b71be1e60ab16cc226daa59e7bc1b2e1_l3.p"></span> and the internal medium has refractive index <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_6b8c89dae4ee147969c3105fa156ff01_l3.png" alt="ql_6b8c89dae4ee147969c3105fa156ff01_l3.p"></span>. The incident angle made by the incident light wave and the film's surface normal is <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_98beee5106cca262c7356449feab7907_l3.png" alt="ql_98beee5106cca262c7356449feab7907_l3.p"></span>, the angle inside the layer is <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_83f648fc89a8d8da654491ffd1cf031a_l3.png" alt="ql_83f648fc89a8d8da654491ffd1cf031a_l3.p"></span> and the refracted angle (inside the internal medium) is denoted <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_ceb2fac0fc8c436edb58ce83e171c6b7_l3.png" alt="ql_ceb2fac0fc8c436edb58ce83e171c6b7_l3.p"></span>. We will also give numbers to each of the three media: medium 0 is the external medium, medium 1 is the layer, and medium 2 is the internal medium. Also, naturally, medium 0 has to have a different refractive index than medium 1, and the same goes for medium 1 and medium 2. 
Media 0 and 2 can be the same, of course.<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15219-0-19906100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15219" title="single_layer.png - Size: 31.1K, Downloads: 50"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-33399500-1366822907_thumb.png" id='ipb-attach-img-15219-0-19906100-1369539296' style='width:480;height:281' class='attach' width="480" height="281" alt="Attached Image: single_layer.png" /></a></p><br />First, we know from Snell's Law that the following holds:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15287-0-19939800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15287" title="snells.png - Size: 1017bytes, Downloads: 30"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-58307400-1367061143.png" id='ipb-attach-img-15287-0-19939800-1369539296' style='width:232;height:16' class='attach' width="232" height="16" alt="Attached Image: snells.png" /></a></p><br />Therefore the angles <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_83f648fc89a8d8da654491ffd1cf031a_l3.png" alt="ql_83f648fc89a8d8da654491ffd1cf031a_l3.p"></span> and <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_ceb2fac0fc8c436edb58ce83e171c6b7_l3.png" alt="ql_ceb2fac0fc8c436edb58ce83e171c6b7_l3.p"></span> can be derived from <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_98beee5106cca262c7356449feab7907_l3.png" alt="ql_98beee5106cca262c7356449feab7907_l3.p"></span>.<br /><br />Now, we see from the diagram that the only path the light wave can follow is a zigzag pattern as it bounces back and forth between the 
layer's two interfaces, until it gets transmitted either back into the external medium or into the internal medium. We also note that all the reflected waves (denoted <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_57c5d6414a896a666dffb92164d83b6f_l3.png" alt="ql_57c5d6414a896a666dffb92164d83b6f_l3.p"></span>, <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_90840da0f2e709d09d62afebc9ca6298_l3.png" alt="ql_90840da0f2e709d09d62afebc9ca6298_l3.p"></span>, ...) are parallel to one another, as are all the transmitted waves. This is necessary for interference and is a natural consequence of the reciprocal nature of Snell's Law.<br /><br />We have also assumed that the media involved are non-absorbing, so by conservation of energy the reflection and transmission coefficients must sum to exactly one. It turns out that it is slightly easier to derive the transmission coefficient, so we will do that, but we would get the same thing either way. The reason is that the very first reflected wave <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_57c5d6414a896a666dffb92164d83b6f_l3.png" alt="ql_57c5d6414a896a666dffb92164d83b6f_l3.p"></span> does not actually penetrate the layer, which means it needs to be handled separately. This does not occur for transmitted waves.<br /><br />Now let's take a look at what happens to the amplitude of the light wave as it travels through this layer system. First, we need to introduce the Fresnel equations, which let us calculate how much of a light wave's amplitude is reflected and how much of it is transmitted whenever it comes into contact with an interface. 
These equations should be familiar, although perhaps not in the following form:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15288-0-19955600-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15288" title="fresnel_rs.png - Size: 1.04K, Downloads: 30"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-41688100-1367061738.png" id='ipb-attach-img-15288-0-19955600-1369539296' style='width:151;height:28' class='attach' width="151" height="28" alt="Attached Image: fresnel_rs.png" /></a></p><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15289-0-19971800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15289" title="fresnel_ts.png - Size: 1.01K, Downloads: 27"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-03681400-1367061751.png" id='ipb-attach-img-15289-0-19971800-1369539296' style='width:149;height:26' class='attach' width="149" height="26" alt="Attached Image: fresnel_ts.png" /></a></p><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15290-0-19990100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15290" title="fresnel_rp.png - Size: 1.11K, Downloads: 25"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-44576500-1367061760.png" id='ipb-attach-img-15290-0-19990100-1369539296' style='width:151;height:28' class='attach' width="151" height="28" alt="Attached Image: fresnel_rp.png" /></a></p><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15291-0-20006500-1369539296' 
href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15291" title="fresnel_tp.png - Size: 1.08K, Downloads: 29"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-52933400-1367061768.png" id='ipb-attach-img-15291-0-20006500-1369539296' style='width:150;height:26' class='attach' width="150" height="26" alt="Attached Image: fresnel_tp.png" /></a></p><br />These are amplitude reflection/transmission coefficients, for s-polarized and p-polarized light. Indeed, light polarization is important, and in the derivation we will assume the light wave has a given known polarization.<br /><br />We will now introduce some notation. The following denotes the amplitude reflection coefficient for a light wave going from medium <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_ada19e577da69464d9da2fd1ab711f9e_l3.png" alt="ql_ada19e577da69464d9da2fd1ab711f9e_l3.p"></span> to medium <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_9ee1f75643e42fc1b252f9a1283ba0cd_l3.png" alt="ql_9ee1f75643e42fc1b252f9a1283ba0cd_l3.p"></span>:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15292-0-20023000-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15292" title="rho.png - Size: 765bytes, Downloads: 24"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-58980500-1367061818.png" id='ipb-attach-img-15292-0-20023000-1369539296' style='width:76;height:15' class='attach' width="76" height="15" alt="Attached Image: rho.png" /></a></p><br />Where the correct reflection coefficient is chosen based on the light wave's polarization. 
Similarly, the amplitude transmission coefficient is:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15293-0-20039100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15293" title="tau.png - Size: 761bytes, Downloads: 25"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-11088800-1367061840.png" id='ipb-attach-img-15293-0-20039100-1369539296' style='width:73;height:19' class='attach' width="73" height="19" alt="Attached Image: tau.png" /></a></p><br />Because the refractive indices and incident angles for each medium are known and constant, we do not need to specify them.<br /><br />We are now ready to tackle the problem. Consider the transmitted wave <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_9e974d873afbaeda9ba7f17d433c7974_l3.png" alt="ql_9e974d873afbaeda9ba7f17d433c7974_l3.p"></span>. It's easy to see that since it crosses the layer at two locations, and never reflects anywhere, its amplitude will be:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15294-0-20055300-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15294" title="wave_t0.png - Size: 616bytes, Downloads: 26"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-42360600-1367061879.png" id='ipb-attach-img-15294-0-20055300-1369539296' style='width:52;height:15' class='attach' width="52" height="15" alt="Attached Image: wave_t0.png" /></a></p><br />What about the second transmitted wave <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_cb09b29f6c46fc1a96c4872e9af6301a_l3.png" alt="ql_cb09b29f6c46fc1a96c4872e9af6301a_l3.p"></span>? 
This one is transmitted once from medium 0 to medium 1, reflects off the medium 1 to medium 2 interface, is reflected again from the medium 1 to medium 0 interface, and is finally transmitted across the medium 1 to medium 2 interface. So its amplitude will be:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15295-0-20071400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15295" title="wave_t1.png - Size: 915bytes, Downloads: 24"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-65766500-1367061899.png" id='ipb-attach-img-15295-0-20071400-1369539296' style='width:238;height:15' class='attach' width="238" height="15" alt="Attached Image: wave_t1.png" /></a></p><br />We can see there's a pattern here. Every successive transmitted wave will simply reflect two additional times off the top and bottom interface. So, if we denote the amplitude of the <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_55ed0175c57d95796d611d60ea4d1412_l3.png" alt="ql_55ed0175c57d95796d611d60ea4d1412_l3.p"></span>th transmitted wave <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_09ab9b090983668aa83e9aa5004285c7_l3.png" alt="ql_09ab9b090983668aa83e9aa5004285c7_l3.p"></span>, we have:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15296-0-20089600-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15296" title="wave_tk.png - Size: 1.11K, Downloads: 25"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-72779800-1367061926.png" id='ipb-attach-img-15296-0-20089600-1369539296' style='width:152;height:24' class='attach' width="152" height="24" alt="Attached Image: wave_tk.png" /></a></p><br />We note that 
even though there are (in theory) infinitely many transmitted waves, their amplitude decreases geometrically with each additional pair of reflections, since the magnitude of the Fresnel amplitude reflection coefficient is never quite 1. The exception is total internal reflection, where all light is reflected and none is transmitted; in that case the incident wave fully reflects off the layer at the first chance it gets, so this analysis doesn't apply.<br /><br />We now have the amplitudes of each transmitted wave. Can we calculate the total amount of transmitted light now? Not quite. These are waves, and you can't just add waves using their amplitudes. We need to consider the phase of each transmitted wave, as these waves might cancel each other out depending on their phase (out of phase waves cancel out, in phase waves amplify each other). The waves also have a frequency, but every transmitted wave shares the frequency of the incident wave, which is known and constant, so it can be taken out of the equation.<br /><br />How do we calculate the phase of each transmitted wave? 
This is in fact a simple textbook thin film interference problem, and if we denote the phase of the <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_55ed0175c57d95796d611d60ea4d1412_l3.png" alt="ql_55ed0175c57d95796d611d60ea4d1412_l3.p"></span>th transmitted wave <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_31dd6ec80e46d6558c39a65ffc58883e_l3.png" alt="ql_31dd6ec80e46d6558c39a65ffc58883e_l3.p"></span>, the following holds:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15297-0-20106800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15297" title="phase_1.png - Size: 1.84K, Downloads: 26"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-41602700-1367061963.png" id='ipb-attach-img-15297-0-20106800-1369539296' style='width:229;height:43' class='attach' width="229" height="43" alt="Attached Image: phase_1.png" /></a></p><br />Where <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_4caf6afa07fc6b25024e9c1eb6d95b69_l3.png" alt="ql_4caf6afa07fc6b25024e9c1eb6d95b69_l3.p"></span> is the light wave's wavelength and <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_e7887a1bc028543b0e3c94ac16a43891_l3.png" alt="ql_e7887a1bc028543b0e3c94ac16a43891_l3.p"></span> is a constant meant to account for phase changes upon reflection (we will expand on this soon). The important thing is that the phase of every transmitted wave is a multiple of a constant (with respect to the wave index <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_55ed0175c57d95796d611d60ea4d1412_l3.png" alt="ql_55ed0175c57d95796d611d60ea4d1412_l3.p"></span>)! 
That is:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15298-0-20123500-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15298" title="phase_2.png - Size: 1.95K, Downloads: 26"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-16518500-1367061989.png" id='ipb-attach-img-15298-0-20123500-1369539296' style='width:278;height:43' class='attach' width="278" height="43" alt="Attached Image: phase_2.png" /></a></p><br />The explanation for this lies in the rather trivial observation that the distance travelled by the light wave inside the layer increases by a constant amount for every consecutive transmitted wave. This is fortunate, as it makes the upcoming calculations very simple. Had the phase depended on <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_55ed0175c57d95796d611d60ea4d1412_l3.png" alt="ql_55ed0175c57d95796d611d60ea4d1412_l3.p"></span> in a more complicated way, the problem could very well have been analytically intractable.<br /><br />We will now explain the meaning of the <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_e7887a1bc028543b0e3c94ac16a43891_l3.png" alt="ql_e7887a1bc028543b0e3c94ac16a43891_l3.p"></span> term. When a wave (any wave, not just electromagnetic light waves) reflects off a medium denser than the one it is in, it will undergo a 180-degree phase change. Because the refractive index is a measure of how optically dense a medium is, we can use that to calculate this constant. There are two possible reflections here: one at the top interface and one at the bottom interface. 
We denote:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15299-0-20140800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15299" title="phase_change.png - Size: 1.62K, Downloads: 28"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-94606400-1367062170.png" id='ipb-attach-img-15299-0-20140800-1369539296' style='width:182;height:54' class='attach' width="182" height="54" alt="Attached Image: phase_change.png" /></a></p><br />For the reflection phase change when reflecting off the interface from medium <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_ada19e577da69464d9da2fd1ab711f9e_l3.png" alt="ql_ada19e577da69464d9da2fd1ab711f9e_l3.p"></span> to medium <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_9ee1f75643e42fc1b252f9a1283ba0cd_l3.png" alt="ql_9ee1f75643e42fc1b252f9a1283ba0cd_l3.p"></span>. Therefore, we see that:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15300-0-20157600-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15300" title="phase_change_2.png - Size: 770bytes, Downloads: 30"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-75751100-1367062198.png" id='ipb-attach-img-15300-0-20157600-1369539296' style='width:126;height:19' class='attach' width="126" height="19" alt="Attached Image: phase_change_2.png" /></a></p><br />Which is constant, as it depends only on the refractive indices of each medium.<br /><br />At this point we have the amplitude and phase of each transmitted wave. All we have to do is sum them up (as waves), and take the squared magnitude of the resulting complex amplitude to obtain the transmitted intensity. 
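<br /><br />To make the "sum them up as waves" step concrete, here is a small numerical sketch in Python. It is not the author's implementation: the function names fresnel_s, film_terms and summed_intensity are hypothetical, only s-polarized light is handled, total internal reflection is assumed not to occur, and the magnitudes of the reflection coefficients are used for the amplitudes so that the sign change on reflection is carried only by the explicit phase change term. It simply adds a finite number of transmitted waves as complex numbers and squares the magnitude of the result.<br /><br /><pre>
```python
import cmath
import math


def fresnel_s(n_i, n_j, cos_i, cos_j):
    """s-polarized amplitude coefficients (rho, tau) from medium i to j."""
    denom = n_i * cos_i + n_j * cos_j
    return (n_i * cos_i - n_j * cos_j) / denom, 2.0 * n_i * cos_i / denom


def film_terms(theta0, wavelength, d, n0, n1, n2):
    """Return (alpha, beta, phi): the amplitude factor gained per round
    trip, the amplitude of the first transmitted wave, and the phase
    step between consecutive transmitted waves."""
    sin0, cos0 = math.sin(theta0), math.cos(theta0)
    # Snell's law gives the angles inside the film and the internal medium.
    # Assumption: no total internal reflection, so both roots are real.
    cos1 = math.sqrt(1.0 - (n0 * sin0 / n1) ** 2)
    cos2 = math.sqrt(1.0 - (n0 * sin0 / n2) ** 2)
    _, tau01 = fresnel_s(n0, n1, cos0, cos1)
    rho10, _ = fresnel_s(n1, n0, cos1, cos0)
    rho12, tau12 = fresnel_s(n1, n2, cos1, cos2)
    # Assumption: magnitudes go into the amplitude; the phase flip on
    # reflection off a denser medium is carried by delta instead.
    alpha = abs(rho10) * abs(rho12)
    beta = tau01 * tau12
    delta = (math.pi if n0 > n1 else 0.0) + (math.pi if n2 > n1 else 0.0)
    phi = (2.0 * math.pi / wavelength) * (2.0 * n1 * d * cos1) + delta
    return alpha, beta, phi


def summed_intensity(alpha, beta, phi, waves=200):
    """Add the first `waves` transmitted waves as phasors, then square."""
    total = sum(beta * alpha ** k * cmath.exp(1j * k * phi)
                for k in range(waves))
    return abs(total) ** 2
```
</pre><br /><br />As a sanity check, a film of index 1.5 with zero thickness, surrounded by air and lit at normal incidence, should give a summed value very close to 1, since a layer of no thickness between two identical media transmits everything.<br /><br />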
However, because the transmitted waves are in a different medium than the incident wave, we need to take into account the ratio of beam surface area to make sure energy is conserved. That is, we need to multiply by:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15423-0-20211400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15423" title="beamratio.png - Size: 959bytes, Downloads: 24"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-64931400-1367314883.png" id='ipb-attach-img-15423-0-20211400-1369539296' style='width:64;height:39' class='attach' width="64" height="39" alt="Attached Image: beamratio.png" /></a></p><br />This is actually two factors in one. The first, the ratio of refractive indices, is there because the transmitted wave won't, in general, have the same speed as the incident wave (for instance, light travels slower in water than in air). So the perceived intensity will not be the same. Remember, intensity is energy per unit time per unit area, so if the wave is faster the intensity will be higher, and we need to scale the intensity down by a corresponding amount to make sure energy is conserved. The second factor, the ratio of cosines, exists because of the change in cross-sectional area of a beam of light as it is refracted. 
The following diagram illustrates all of this nicely:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15431-0-20357900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15431" title="area_ratio.jpg - Size: 14.24K, Downloads: 24"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-72242200-1367319391.jpg" id='ipb-attach-img-15431-0-20357900-1369539296' style='width:216;height:258' class='attach' width="216" height="258" alt="Attached Image: area_ratio.jpg" /></a></p><br />It is worth noting that reflected light is treated the same, however because reflected waves remain in the same medium and the reflected angle is the same as the incident angle, both ratios just cancel out.<br /><br />Now, we have the following expression for the transmitted intensity:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15424-0-20228900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15424" title="sum_1.png - Size: 2.31K, Downloads: 25"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-59264900-1367314939.png" id='ipb-attach-img-15424-0-20228900-1369539296' style='width:212;height:58' class='attach' width="212" height="58" alt="Attached Image: sum_1.png" /></a></p><br />This looks complicated, but it actually isn't. 
This is because both the phase and the amplitude are dependent on <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_55ed0175c57d95796d611d60ea4d1412_l3.png" alt="ql_55ed0175c57d95796d611d60ea4d1412_l3.p"></span> in such a way that:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15425-0-20246900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15425" title="sum_2.png - Size: 3.22K, Downloads: 22"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-71446700-1367315267_thumb.png" id='ipb-attach-img-15425-0-20246900-1369539296' style='width:480;height:49' class='attach' width="480" height="49" alt="Attached Image: sum_2.png" /></a></p><br />And we will now use the following two substitutions, just to make the expressions a bit more readable:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15303-0-20173900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15303" title="alpha_beta.png - Size: 1.13K, Downloads: 28"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-84986000-1367062268.png" id='ipb-attach-img-15303-0-20173900-1369539296' style='width:89;height:43' class='attach' width="89" height="43" alt="Attached Image: alpha_beta.png" /></a></p><br />We now have a geometric series sum, which we can evaluate as follows:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15426-0-20264800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15426" title="sum_3.png - Size: 2.58K, Downloads: 21"><img 
src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-19580700-1367315420.png" id='ipb-attach-img-15426-0-20264800-1369539296' style='width:414;height:59' class='attach' width="414" height="59" alt="Attached Image: sum_3.png" /></a></p><br />Simplifying rather elegantly to the following (assuming <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_9bfbb58994661cfbf2a8ce426799d0a1_l3.png" alt="ql_9bfbb58994661cfbf2a8ce426799d0a1_l3.p"></span> is real):<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15427-0-20285600-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15427" title="final.png - Size: 1.73K, Downloads: 18"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-12219400-1367315541.png" id='ipb-attach-img-15427-0-20285600-1369539296' style='width:286;height:49' class='attach' width="286" height="49" alt="Attached Image: final.png" /></a></p><br />And by conservation of energy, we have:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15306-0-20194100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15306" title="energy.png - Size: 707bytes, Downloads: 49"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-05792500-1367062340.png" id='ipb-attach-img-15306-0-20194100-1369539296' style='width:91;height:15' class='attach' width="91" height="15" alt="Attached Image: energy.png" /></a></p><br />Which concludes the derivation. As a final note, we can calculate the average over all possible phases of this result. If we are correct, then we should get the same result as a geometric optics derivation. 
The reader can verify that, indeed, we have:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15429-0-20322200-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15429" title="optics.png - Size: 2.17K, Downloads: 52"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-64572700-1367316890.png" id='ipb-attach-img-15429-0-20322200-1369539296' style='width:338;height:49' class='attach' width="338" height="49" alt="Attached Image: optics.png" /></a></p><br />We conclude that the transmission coefficient (intensity of light transmitted across the layer) is <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_4b4cb771202386605525a15d3eccdf17_l3.png" alt="ql_4b4cb771202386605525a15d3eccdf17_l3.p"></span> and the reflection coefficient (intensity of light reflected off the layer) is <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_a60f13863fb1473bf5385046ce8b1a68_l3.png" alt="ql_a60f13863fb1473bf5385046ce8b1a68_l3.p"></span>.<br /><br /><h2>General Case</h2><br />The derivation shown above is quite naive, and does not generalize well at all to multiple layers, though it is the simplest method to see what is happening at a low level. If you wish to implement n-layer thin film interference, the method of choice is the <a href='http://en.wikipedia.org/wiki/Transfer-matrix_method_(optics)' class='bbc_url' title='External link' rel='nofollow external'>Transfer-matrix method</a>, which simplifies the problem down to a series of matrix multiplications and can be derived using powerful electromagnetism techniques.<br /><br /><h1>Implementation</h1><br />So we now know just how much light is reflected from the layer. How can we implement this in the context of a BRDF? 
It's quite simple: this reflected term simply replaces the ordinary Fresnel term, accounting for thin film interference effects. This means you can trivially include thin-film interference effects in any BRDF as long as it has a Fresnel term. The function below computes the reflection coefficient for a given wavelength and incident angle.<br /><br /><pre class='prettyprint lang-auto linenums:0'>
// cos0 is the cosine of the incident angle, that is, cos0 = dot(view direction, normal)
// lambda is the wavelength of the incident light in nanometers, the same unit as the film thickness (e.g. lambda = 510 for green)
float ThinFilmReflectance(float cos0, float lambda)
{
&nbsp;&nbsp;&nbsp;&nbsp;const float thickness; // the thin film thickness
&nbsp;&nbsp;&nbsp;&nbsp;const float n0, n1, n2; // the refractive indices
&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;// compute the phase change term (constant)
&nbsp;&nbsp;&nbsp;&nbsp;const float d10 = (n1 &gt; n0) ? 0 : PI;
&nbsp;&nbsp;&nbsp;&nbsp;const float d12 = (n1 &gt; n2) ? 0 : PI;
&nbsp;&nbsp;&nbsp;&nbsp;const float delta = d10 + d12;
&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;// now, compute cos1, the cosine of the refracted angle inside the film (sin1 holds the squared sine, from Snell's law)
&nbsp;&nbsp;&nbsp;&nbsp;float sin1 = pow(n0 / n1, 2) * (1 - pow(cos0, 2));
&nbsp;&nbsp;&nbsp;&nbsp;if (sin1 &gt; 1) return 1.0f; // total internal reflection
&nbsp;&nbsp;&nbsp;&nbsp;float cos1 = sqrt(1 - sin1);
&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;// compute cos2, the cosine of the final transmitted angle, i.e. cos(theta_2)
&nbsp;&nbsp;&nbsp;&nbsp;// we need this angle for the Fresnel terms at the bottom interface
&nbsp;&nbsp;&nbsp;&nbsp;float sin2 = pow(n0 / n2, 2) * (1 - pow(cos0, 2));
&nbsp;&nbsp;&nbsp;&nbsp;if (sin2 &gt; 1) return 1.0f; // total internal reflection
&nbsp;&nbsp;&nbsp;&nbsp;float cos2 = sqrt(1 - sin2);
&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;// get the reflection and transmission amplitude Fresnel coefficients
&nbsp;&nbsp;&nbsp;&nbsp;float alpha_s = rs(n1, n0, cos1, cos0) * rs(n1, n2, cos1, cos2); // rho_10 * rho_12 (s-polarized)
&nbsp;&nbsp;&nbsp;&nbsp;float alpha_p = rp(n1, n0, cos1, cos0) * rp(n1, n2, cos1, cos2); // rho_10 * rho_12 (p-polarized)
&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;float beta_s = ts(n0, n1, cos0, cos1) * ts(n1, n2, cos1, cos2); // tau_01 * tau_12 (s-polarized)
&nbsp;&nbsp;&nbsp;&nbsp;float beta_p = tp(n0, n1, cos0, cos1) * tp(n1, n2, cos1, cos2); // tau_01 * tau_12 (p-polarized)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;// compute the phase term (phi)
&nbsp;&nbsp;&nbsp;&nbsp;float phi = (2 * PI / lambda) * (2 * n1 * thickness * cos1) + delta;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;// finally, evaluate the transmitted intensity for the two possible polarizations
&nbsp;&nbsp;&nbsp;&nbsp;float ts = pow(beta_s, 2) / (pow(alpha_s, 2) - 2 * alpha_s * cos(phi) + 1);
&nbsp;&nbsp;&nbsp;&nbsp;float tp = pow(beta_p, 2) / (pow(alpha_p, 2) - 2 * alpha_p * cos(phi) + 1);
&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;// we need to take into account conservation of energy for transmission
&nbsp;&nbsp;&nbsp;&nbsp;float beamRatio = (n2 * cos2) / (n0 * cos0);
&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;// calculate the average transmitted intensity (if you know the polarization distribution of your
&nbsp;&nbsp;&nbsp;&nbsp;// light source, you should specify it here. if you don't, a 50%/50% average is generally used)
&nbsp;&nbsp;&nbsp;&nbsp;float t = beamRatio * (ts + tp) / 2;
&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;// and finally, derive the reflected intensity
&nbsp;&nbsp;&nbsp;&nbsp;return 1 - t;
}
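
// Usage sketch (an illustrative addition, not part of the original listing): evaluate the
// reflectance once per RGB channel at 650, 510 and 475 nm. vec3 is assumed to be a GLSL-style
// 3-component float vector, and ThinFilmReflectanceRGB is a hypothetical helper name.
vec3 ThinFilmReflectanceRGB(float cos0)
{
&nbsp;&nbsp;&nbsp;&nbsp;return vec3(ThinFilmReflectance(cos0, 650),&nbsp;&nbsp;// red
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ThinFilmReflectance(cos0, 510),&nbsp;&nbsp;// green
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ThinFilmReflectance(cos0, 475)); // blue
}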
</pre><br />We can now sample this function at red, green, and blue wavelengths (650, 510, 475 nanometers, respectively) and substitute the RGB reflectance obtained into the Fresnel term of the BRDF. Or, if you are rendering spectrally, just give the wavelength directly. That's it.<br /><br />One word on polarization - in general, in computer graphics, we assume light contains an equal amount of s-polarized and p-polarized light waves. Then the Fresnel reflection coefficient is simply an average between the s-polarized and p-polarized light reflection coefficients, as the comment indicates. If you have more information on how much s-polarized light is emitted by your light source, then the average should reflect that.<br /><br /><h2>BRDF Explorer Sample</h2><br />The following shader script implements the BRDF in the Disney BRDF Explorer tool, using the stock Blinn-Phong shader with the default microfacet distribution. Note how we just implemented the code separately and multiplied the BRDF by the modified "thin film" Fresnel term.<br /><br /><pre class='prettyprint lang-auto linenums:0'>
analytic

# Blinn Phong based on halfway-vector with single-layer thin
# film wave interference effects via a Fresnel film coating.

::begin parameters
float thickness 0 3000 250&nbsp;&nbsp;# Thin film thickness (in nm)
float externalIOR 0.2 3 1&nbsp;&nbsp;&nbsp;&nbsp; # External (air) refractive index
float thinfilmIOR 0.2 3 1.5&nbsp;&nbsp; # Layer (thin film) refractive index
float internalIOR 0.2 3 1.25&nbsp;&nbsp;# Internal (object) refractive index
float n 1 1000 100&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Blinn-Phong microfacet exponent
::end parameters

::begin shader

const float PI = 3.14159265f;

/* Amplitude reflection coefficient (s-polarized) */
float rs(float n1, float n2, float cosI, float cosT)
{
&nbsp;&nbsp;&nbsp;&nbsp;return (n1 * cosI - n2 * cosT) / (n1 * cosI + n2 * cosT);
}

/* Amplitude reflection coefficient (p-polarized) */
float rp(float n1, float n2, float cosI, float cosT)
{
&nbsp;&nbsp;&nbsp;&nbsp;return (n2 * cosI - n1 * cosT) / (n1 * cosT + n2 * cosI);
}

/* Amplitude transmission coefficient (s-polarized) */
float ts(float n1, float n2, float cosI, float cosT)
{
&nbsp;&nbsp;&nbsp;&nbsp;return 2 * n1 * cosI / (n1 * cosI + n2 * cosT);
}

/* Amplitude transmission coefficient (p-polarized) */
float tp(float n1, float n2, float cosI, float cosT)
{
&nbsp;&nbsp;&nbsp;&nbsp;return 2 * n1 * cosI / (n1 * cosT + n2 * cosI);
}

/* Pass the incident cosine. */
vec3 FresnelCoating(float cos0)
{
&nbsp;&nbsp;&nbsp;&nbsp;/* Precompute the reflection phase changes (depends on IOR) */
&nbsp;&nbsp;&nbsp;&nbsp;float delta10 = (thinfilmIOR &lt; externalIOR) ? PI : 0.0f;
&nbsp;&nbsp;&nbsp;&nbsp;float delta12 = (thinfilmIOR &lt; internalIOR) ? PI : 0.0f;
&nbsp;&nbsp;&nbsp;&nbsp;float delta = delta10 + delta12;

&nbsp;&nbsp;&nbsp;&nbsp;/* Calculate the thin film layer (and transmitted) angle cosines. */
&nbsp;&nbsp;&nbsp;&nbsp;float sin1 = pow(externalIOR / thinfilmIOR, 2) * (1 - pow(cos0, 2));
&nbsp;&nbsp;&nbsp;&nbsp;float sin2 = pow(externalIOR / internalIOR, 2) * (1 - pow(cos0, 2));
&nbsp;&nbsp;&nbsp;&nbsp;if ((sin1 &gt; 1) || (sin2 &gt; 1)) return vec3(1); /* Account for TIR. */
&nbsp;&nbsp;&nbsp;&nbsp;float cos1 = sqrt(1 - sin1), cos2 = sqrt(1 - sin2);

&nbsp;&nbsp;&nbsp;&nbsp;/* Calculate the interference phase change. */
&nbsp;&nbsp;&nbsp;&nbsp;vec3 phi = vec3(2 * thinfilmIOR * thickness * cos1);
&nbsp;&nbsp;&nbsp;&nbsp;phi *= 2 * PI / vec3(650, 510, 475);
&nbsp;&nbsp;&nbsp;&nbsp;phi += delta;

&nbsp;&nbsp;&nbsp;&nbsp;/* Obtain the various Fresnel amplitude coefficients. */
&nbsp;&nbsp;&nbsp;&nbsp;float alpha_s = rs(thinfilmIOR, externalIOR, cos1, cos0)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;* rs(thinfilmIOR, internalIOR, cos1, cos2);
&nbsp;&nbsp;&nbsp;&nbsp;float alpha_p = rp(thinfilmIOR, externalIOR, cos1, cos0)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;* rp(thinfilmIOR, internalIOR, cos1, cos2);
&nbsp;&nbsp;&nbsp;&nbsp;float beta_s&nbsp;&nbsp;= ts(externalIOR, thinfilmIOR, cos0, cos1)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;* ts(thinfilmIOR, internalIOR, cos1, cos2);
&nbsp;&nbsp;&nbsp;&nbsp;float beta_p&nbsp;&nbsp;= tp(externalIOR, thinfilmIOR, cos0, cos1)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;* tp(thinfilmIOR, internalIOR, cos1, cos2);

&nbsp;&nbsp;&nbsp;&nbsp;/* Calculate the s- and p-polarized intensity transmission coefficient. */
&nbsp;&nbsp;&nbsp;&nbsp;vec3 ts = pow(beta_s, 2) / (pow(alpha_s, 2) - 2 * alpha_s * cos(phi) + 1);
&nbsp;&nbsp;&nbsp;&nbsp;vec3 tp = pow(beta_p, 2) / (pow(alpha_p, 2) - 2 * alpha_p * cos(phi) + 1);

&nbsp;&nbsp;&nbsp;&nbsp;/* Calculate the transmitted power ratio for medium change. */
&nbsp;&nbsp;&nbsp;&nbsp;float beamRatio = (internalIOR * cos2) / (externalIOR * cos0);

&nbsp;&nbsp;&nbsp;&nbsp;/* Calculate the average reflectance. */
&nbsp;&nbsp;&nbsp;&nbsp;return 1 - beamRatio * (ts + tp) * 0.5f;
}

vec3 BRDF(vec3 L, vec3 V, vec3 N, vec3 X, vec3 Y)
{
&nbsp;&nbsp;&nbsp;&nbsp;vec3 H = normalize(L + V);
&nbsp;&nbsp;&nbsp;&nbsp;float val = pow(max(0, dot(N, H)), n);
&nbsp;&nbsp;&nbsp;&nbsp;return vec3(val) * FresnelCoating(dot(V, H));
}

::end shader
</pre><br />It is worth noting that this is a reference implementation meant to be readable, and can be thoroughly optimized. In particular, the Fresnel calculations are the most expensive, but there are numerous ways of reducing the amount of computations. For instance, we can use the reciprocity properties of s-polarized light, and also recycle many intermediate calculations. If you are not interested in perfect physical accuracy, you can also skip the polarization calculations and directly use intensity Fresnel coefficients, though because amplitudes are signed and intensities are not, you will need to calculate the proper sign to use for the cosine term somehow (or just ignore it altogether and have incorrect but plausible thin film interference).<br /><br />If you are really desperate about runtime performance, you can still retain the nice colorful patterns while trading physical accuracy by approximating the final formula however you see fit, the only fundamental requirement is that the <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_1a58a8cd1856d0264f8830dfbcf2f6cf_l3.png" alt="ql_1a58a8cd1856d0264f8830dfbcf2f6cf_l3.p"></span> term be in there somewhere.<br /><br />This is a screenshot of the above BRDF's polar plot at incidence 45 degrees and illustrates its wavelength-dependent nature:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15428-0-20304200-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15428" title="brdf_polar.png - Size: 13.58K, Downloads: 344"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-55538600-1367316241_thumb.png" id='ipb-attach-img-15428-0-20304200-1369539296' style='width:480;height:252' class='attach' width="480" height="252" alt="Attached Image: brdf_polar.png" /></a></p><br />Note how the BRDF differs for the three channels (in 
fact, every wavelength produces a different response, but we're working in RGB mode here). And here are a few renders (still from BRDF Explorer) with some sensible parameters. Here we assume the internal medium is fully opaque:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15430-0-20340100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15430" title="brdf_explorer_renderings.png - Size: 663.16K, Downloads: 408"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-35384600-1367318194_thumb.png" id='ipb-attach-img-15430-0-20340100-1369539296' style='width:480;height:162' class='attach' width="480" height="162" alt="Attached Image: brdf_explorer_renderings.png" /></a></p><br />What happens when we set the thin film thickness to zero? In this case, the layer physically disappears and the formula degenerates to ordinary Fresnel reflection (more specifically, the Fresnel reflection coefficients for the layer become zero while the transmission coefficients become one).<br /><br />What about making the thin film extremely thick? In that case, we see that the rate of change of the phase <span rel='lightbox'><img class='bbc_img' src="http://quicklatex.com/cache3/ql_ccd0d188db4de4a3f4f89dfd3f3f66be_l3.png" alt="ql_ccd0d188db4de4a3f4f89dfd3f3f66be_l3.p"></span> with respect to view angle becomes arbitrarily large, causing the interference effects to average out to white light, as expected. Also, because we are using a BRDF, we are assuming that light exits the surface at the same point it enters it, which is a very good approximation when the thin film is very small (on the order of light's wavelength). 
However, as the film becomes thicker, the approximation breaks down, so the film should probably be no thicker than a few thousand nanometers, at most.<br /><br />The same holds true for microfacet distributions. Thin films coated over surfaces with very high microfacet roughness coefficients are somewhat unphysical, since a coating naturally tends to be smoother than the surface it is applied on. This should be kept in mind, as the two layer interfaces are assumed to be parallel.<br /><br />You might also wonder what happens if the refractive index depends on wavelength. Not much changes: the correct refractive index and incident/transmitted angles are simply used for each wavelength, and everything else remains the same, since waves of different frequencies do not interfere in any meaningful way. The BRDF above chooses to assume a constant IOR, though, to simplify matters. If the refractive indices are wavelength-dependent, you will also observe dispersion effects in the transmitted light.<br /><br /><h2>Transparency?</h2><br />You may also want to handle transparency if the internal medium is not opaque. You can use whatever method you already have in place to render refractive surfaces, using the final transmitted angle (cos2 in the pseudocode). This is necessary for soap bubbles. Of course, if the internal medium is opaque, this is not necessary as the transmitted light is simply absorbed. It is also possible to use this with subsurface scattering (thus representing a subsurface scattering material with a thin film coating) by using the transmitted light (suitably refracted, as mentioned above).<br /><br />Here is a render of a model with a soap-bubble-like BRDF, rendered with ray tracing. 
In this case, there is no visible refraction because soap bubbles are simply an air/water/air interface, so the final transmitted angle is the same as the incident angle:<br /><br /><p style='text-align:center'><a class='resized_img' rel='lightbox[7feb0834c102d34069f916cc27af5731]' id='ipb-attach-url-15232-0-19922800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15232" title="interference_dragon.png - Size: 364.3K, Downloads: 779"><img src="http://uploads.gamedev.net/monthly_04_2013/ccs-181063-0-10219400-1366908222_thumb.png" id='ipb-attach-img-15232-0-19922800-1369539296' style='width:480;height:480' class='attach' width="480" height="480" alt="Attached Image: interference_dragon.png" /></a></p><br /><h1>Final notes</h1><br />A good selection of parameters is essential to obtain realistic results. For instance, the film thickness should be on the order of light's wavelength (a few hundred nanometers). As you increase the thickness, interference effects disappear and as the thickness tends to zero, you just get ordinary Fresnel reflection, as mentioned previously. Make sure to use correct refractive indices for your materials. The range of values which can produce interference effects is quite narrow, so the parameters have to be accurate.<br /><br />For metals or materials where the refractive index varies considerably over the visible spectrum, such as copper, three refractive indices (one per RGB channel) should be used for physical accuracy if possible. This requires only minor changes to the BRDF, as everything can be vectorized. It suffices to make the IOR parameters 3-component vectors and vectorize the Fresnel coefficient functions. The computational cost is therefore exactly the same.<br /><br />One last point is that for non-solid thin films, such as oil or water coatings, the thickness of the layer is probably not constant at every point of a given object. 
As an example, soap bubbles are thicker at the bottom than at the top, due to gravity. For a convincing render, this should be taken into account. As a result, thin film thickness should probably be a vertex attribute rather than a material attribute, or, alternatively, a more general reflectance model should be considered (such as a spatially varying BRDF). Adding some noise to the film thickness can also go very far in improving the appearance of some materials, and it is convenient to implement.<br /><br />Attached is the zipped BRDF Explorer script so that you may play around with it at your leisure.<br /><br /><h1>Article Update Log</h1><br /><strong class='bbc'>2 May 2013</strong>: Fixed a couple of bugs in the shader, corrected a few typos and improved formatting.<br /><strong class='bbc'>30 Apr 2013</strong>: Added some notes about interesting variations to apply to film thickness and on optimizations.<br /><strong class='bbc'>29 Apr 2013</strong>: Added notes on the motivation of neglecting absorption effects.<br /><strong class='bbc'>28 Apr 2013</strong>: Added notes on IOR and physical accuracy of solution.<br /><strong class='bbc'>27 Apr 2013</strong>: Added extra render, improved formatting.<br /><strong class='bbc'>26 Apr 2013</strong>: Added BRDF and some renders.<br /><strong class='bbc'>25 Apr 2013</strong>: Began writing article.]]></description>
		<pubDate>Fri, 08 Mar 2013 03:59:03 +0000</pubDate>
		<guid isPermaLink="false">0164681a5e7cab55bfaa89e9c524c9f0</guid>
	</item>
	<item>
		<title>2D Animation Basics</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/2d-animation-basics-r2952</link>
		<description><![CDATA[<h2>Introduction</h2><p>Simply put, 2D animation is the movement and transformation of objects on the screen in two dimensions. A good example of 2D animation is classic cartoons – multiple pictures of Mickey Mouse or Donald Duck alternating over time and producing the effect of moving objects. Modern 2D animation software makes creating animation much easier, though, without drawing oodles of frames.</p><br />
<h2>Frames and Keyframes</h2><p>Just like classic animation, computer 2D animation consists of frames. A frame is a single picture of an object illustrating a certain position, size and other properties of that object. Adjacent frames are slightly different from each other, so when the animation is played at full speed – usually 24 frames per second – the effect of motion appears. This is similar to cartoons and movies.</p><br />
<p>But in computer animation you typically don’t have to draw each and every frame manually. Most 2D animation software allows you to set up keyframes – frames that specify intermediate positions of an object. The motion of the object between these keyframes is built by the software automatically. So you end up creating only the keyframes, while the animation tool calculates and builds the rest of the frames. This process is called interpolation.</p><br />
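<p>As a rough illustration of what the software does between two keyframes, here is a minimal C sketch of linear interpolation. The Keyframe struct and the interpolate function are hypothetical names invented for this example, not part of any particular animation tool, and real tools usually offer smoother curves than a straight line.</p><br />
<pre class='prettyprint lang-auto linenums:0'>
/* A keyframe stores the value of one property (e.g. the Y-coordinate) at a given frame. */
typedef struct {
	int frame;
	float value;
} Keyframe;

/* Linearly interpolate a property between keyframes a and b.
   Assumes a.frame != b.frame and that frame lies between the two keyframes. */
float interpolate(Keyframe a, Keyframe b, int frame)
{
	/* t is 0 at keyframe a and 1 at keyframe b. */
	float t = (float)(frame - a.frame) / (float)(b.frame - a.frame);
	return a.value + (b.value - a.value) * t;
}
</pre><br />
<p>With keyframes at frame 0 (value 0) and frame 30 (value 300), frame 15 evaluates to 150, halfway between the two.</p><br />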
<h2>Curve Animation</h2><p>The motion between two keyframes can vary – accelerating, slowing down, steady or even all of these in one move. The simplest way to specify exactly how one keyframe should transition to another is with curves. A curve defines how a specific parameter of an object, such as a coordinate or a scale, should vary over time.</p><br />
<p>Let’s take a look at a simple animation example made in one such 2D animation program – <a href='http://patagames.com/astudio/overview' class='bbc_url' title='External link' rel='nofollow external'>GameDev Animation Studio</a>. We want to make a ball move from top to bottom and then bounce back.</p><br />
<span rel='lightbox'><img class='bbc_img' src="http://patagames.com/articles/2d-animation-basics/volyball_2.png" alt="volyball_2.png"></span><br />
<p>Basically, this means the vertical coordinate of the object, or simply the Y-coordinate, should change from zero (the top of the screen) to the maximum value that corresponds to the bottom of the screen. Then the ball “bounces”, i.e. its shape gets distorted in the vertical direction and restored again. After that, the ball moves up, which means the Y-coordinate should decrease back to zero. Also note that, due to gravity, the ball moves down with acceleration, while the upward movement must be decelerating. This is mirrored in the slope of the curve: a steeper slope means faster movement and vice versa.</p><br />
<h2>Position Curve</h2><p>So the curve describing vertical movement of the ball looks as follows:</p><br />
<span rel='lightbox'><img class='bbc_img' src="http://patagames.com/articles/2d-animation-basics/MoveY.png" alt="MoveY.png"></span><br />
<p>As you see, the curve here gradually develops from zero to its maximum, and then, after the peak, it goes back to zero again. The highest value of the curve corresponds to the moment the ball hits the ground – this happens in the 30th frame on the timeline. After that moment, the ball starts moving up.</p><br />
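<p>To make the shape of this curve concrete, here is a small C sketch that produces the same kind of motion with quadratic easing. The function names are hypothetical, the quadratic shape is an assumption (the actual curve in the tool may differ), and the sketch assumes the ball returns to the top at frame 60, which the example above does not specify.</p><br />
<pre class='prettyprint lang-auto linenums:0'>
/* Ease-in: starts slowly, then accelerates. t runs from 0 to 1. */
float ease_in(float t)
{
	return t * t;
}

/* Ease-out: starts fast, then decelerates. t runs from 0 to 1. */
float ease_out(float t)
{
	return 1.0f - (1.0f - t) * (1.0f - t);
}

/* Y-coordinate while falling (frames 0 to 30): 0 at the top, max_y on the ground. */
float fall_y(int frame, float max_y)
{
	return max_y * ease_in(frame / 30.0f);
}

/* Y-coordinate while rising (frames 30 to 60, an assumed end frame): max_y back to 0. */
float rise_y(int frame, float max_y)
{
	return max_y * (1.0f - ease_out((frame - 30) / 30.0f));
}
</pre><br />
<p>Both halves meet at frame 30, where the Y-coordinate equals max_y, and the slope is steepest right around the bounce, which matches the description of the curve.</p><br />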
<br />
<h2>Zoom Curve</h2><br />
<p>When the ball is just about to hit the ground, it starts deforming. Indeed, the ball is elastic, so it has to deform. This animation can be done with another curve – the zoom curve. We’ll change the vertical scale (Y-scale) of the ball to reflect the way it gets distorted during the hit.</p><br />
<span rel='lightbox'><img class='bbc_img' src="http://patagames.com/articles/2d-animation-basics/ZoomY.png" alt="ZoomY.png"></span><br />
<p>Above is the Y-scale zoom curve. As you see, prior to the 29th frame the curve is at its maximum. This corresponds to the non-deformed ball falling down. At the 29th frame, the curve goes down, and the ball’s image is scaled down along the vertical axis. Then, in the 31st frame, after the curve passes the “hit-the-ground” point, the image is zoomed back up to normal, so the curve rises to its maximum again.</p><br />
<p>Note that the higher the zoom factor is, the more elastic the ball will look. Thus, experimenting with curves in 2D animation tools is an easy way to find the most spectacular and satisfying animation effects. On the other hand, the classic animation approach would require you to discard all the work already done and start from scratch if you decided to change something.</p><br />
<h2>Adding Details to Animation</h2><p>Obviously, a ball just moving up and down doesn’t look too realistic, which is why we need a shadow. We should place a shadow picture beneath the ball and use curves to make it look real. Overall, the process is much the same as the steps above. We’ll need to scale the shadow and modify its transparency level in accordance with the ball’s movement.</p><br />
<p><a href='http://www.youtube.com/watch?v=XnKY_Ssfj_k&feature=youtu.be' class='bbc_url' title='External link' rel='nofollow external'>Here</a> is the final result of the jumping ball animation.</p><br />
<h2>Conclusion</h2><p>Making 2D animation is a piece of cake with proper software. Thanks to automated image transformation between keyframes, curve animation and multiple transition effects, creating an animated picture or a movie can be done in several minutes, even by a beginner.</p>]]></description>
		<pubDate>Thu, 13 Dec 2012 05:59:29 +0000</pubDate>
		<guid isPermaLink="false">1eb93307694834407e339c29b71fa727</guid>
	</item>
	<item>
		<title>GPGPU image processing basics using OpenCL.NET</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/gpgpu-image-processing-basics-using-openclnet-r2948</link>
		<description><![CDATA[OpenCL is a cross-platform framework used mostly for GPGPU (General-purpose computing on graphics processing units). There are plenty of tutorials available on image processing with OpenCL using C/C++, however there's not much information that would cover OpenCL image processing with .NET.<br /><br />I won't go into details about OpenCL kernels/queues/etc. (there's plenty of information available on the internet), however I'll provide you with a bare minimum code required to load an image from disk, process it with OpenCL on the GPU and save it back to a file.<br /><br />Before we get started, make sure that you download the source code of OpenCL.NET from <a href='http://openclnet.codeplex.com/' class='bbc_url' title='External link' rel='nofollow external'>http://openclnet.codeplex.com/</a> and add it to your project.<br /><br />We'll use a simple OpenCL kernel that converts an input image into a grayscale image. The kernel should be saved to a separate file.<br /><br />Kernel source code:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
__kernel void imagingTest(__read_only&nbsp;&nbsp;image2d_t srcImg,
					&nbsp;&nbsp; __write_only image2d_t dstImg)
{
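&nbsp;&nbsp;const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE | //Natural coordinates
	CLK_ADDRESS_CLAMP_TO_EDGE | //Clamp out-of-range coordinates to the edge pixels
	CLK_FILTER_NEAREST; //read_imageui with integer coordinates requires nearest filtering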
&nbsp;&nbsp;int2 coord = (int2)(get_global_id(0), get_global_id(1));

&nbsp;&nbsp;uint4 bgra = read_imageui(srcImg, smp, coord); //The byte order is BGRA

&nbsp;&nbsp;float4 bgrafloat = convert_float4(bgra) / 255.0f; //Convert to normalized [0..1] float

&nbsp;&nbsp;//Convert RGB to luminance (make the image grayscale).
&nbsp;&nbsp;float luminance =&nbsp;&nbsp;sqrt(0.241f * bgrafloat.z * bgrafloat.z + 0.691f * bgrafloat.y * bgrafloat.y + 0.068f * bgrafloat.x * bgrafloat.x);
&nbsp;&nbsp;bgra.x = bgra.y = bgra.z = (uint) (luminance * 255.0f);

&nbsp;&nbsp;bgra.w = 255;

&nbsp;&nbsp;write_imageui(dstImg, coord, bgra);
}
</pre><br /><h2>Namespaces Used</h2>&nbsp;&nbsp;&nbsp;&nbsp;<br /><pre class='prettyprint lang-auto linenums:0'>
using System;
using System.Collections;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Runtime.InteropServices;
using OpenCL.Net;
</pre><br /><h2>Error handling</h2><br />Since OpenCL.NET is a wrapper for the C API, we have to do all the error checking ourselves. I'm using the following two methods:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
private void CheckErr(Cl.ErrorCode err, string name)
{
	if (err != Cl.ErrorCode.Success) {
		Console.WriteLine("ERROR: " + name + " (" + err.ToString() + ")");
	}
}

private void ContextNotify(string errInfo, byte[] data, IntPtr cb, IntPtr userData) {
	Console.WriteLine("OpenCL Notification: " + errInfo);
}
</pre><br /><h2>Setting Up</h2><br />The following two variables should be declared in the class itself and will be shared across all of the methods:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
private Cl.Context _context;
private Cl.Device _device;</pre><br />And this is the method that sets up OpenCL:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
private void Setup ()
{
	Cl.ErrorCode error;
	Cl.Platform[] platforms = Cl.GetPlatformIDs (out error);
	List&lt;Cl.Device&gt; devicesList = new List&lt;Cl.Device&gt; ();
&nbsp;&nbsp;
	CheckErr (error, "Cl.GetPlatformIDs");
&nbsp;&nbsp;
	foreach (Cl.Platform platform in platforms) {
		string platformName = Cl.GetPlatformInfo (platform, Cl.PlatformInfo.Name, out error).ToString ();
		Console.WriteLine ("Platform: " + platformName);
		CheckErr (error, "Cl.GetPlatformInfo");

		//We will be looking only for GPU devices
		foreach (Cl.Device device in Cl.GetDeviceIDs(platform, Cl.DeviceType.Gpu, out error)) {
			CheckErr (error, "Cl.GetDeviceIDs");
			Console.WriteLine ("Device: " + device.ToString ());
			devicesList.Add (device);
		}
	}
&nbsp;&nbsp;
	if (devicesList.Count &lt;= 0) {
		Console.WriteLine ("No devices found.");
		return;
	}
&nbsp;&nbsp;
	_device = devicesList[0];
&nbsp;&nbsp;
	if (Cl.GetDeviceInfo(_device, Cl.DeviceInfo.ImageSupport, out error).CastTo&lt;Cl.Bool&gt;() == Cl.Bool.False)
	{
		Console.WriteLine("No image support.");
		return;
	}

	_context = Cl.CreateContext(null, 1, new[] { _device }, ContextNotify, IntPtr.Zero, out error);	//Second parameter is the number of devices
	CheckErr(error, "Cl.CreateContext");
}
</pre><br /><h2>The Image Processing Part</h2><br />The main problem is that OpenCL.NET is a wrapper around the C API of OpenCL, so it can only work with unmanaged memory. However, all of the data in .NET is managed, so we'll have to marshal the data between managed and unmanaged memory. It is usually much easier to handle the RGBA color components as floats in the [0..1] range. However, the input image should be passed as a byte[] array, because doing the byte=&gt;float conversion on the CPU would noticeably hurt performance (we would have to convert every component of every pixel twice: dividing by 255 before the image processing and multiplying by 255 after it).<br /><br /><pre class='prettyprint lang-auto linenums:0'>
public void ImagingTest (string inputImagePath, string outputImagePath)
{
	Cl.ErrorCode error;

	//Load and compile kernel source code.
	string programPath = Environment.CurrentDirectory + "/../../ImagingTest.cl";	//The path to the source file may vary
&nbsp;&nbsp;
	if (!System.IO.File.Exists (programPath)) {
		Console.WriteLine ("Program doesn't exist at path " + programPath);
		return;
	}
&nbsp;&nbsp;
	string programSource = System.IO.File.ReadAllText (programPath);
&nbsp;&nbsp;
	using (Cl.Program program = Cl.CreateProgramWithSource(_context, 1, new[] { programSource }, null, out error)) {
		CheckErr(error, "Cl.CreateProgramWithSource");

		//Compile kernel source
		error = Cl.BuildProgram (program, 1, new[] { _device }, string.Empty, null, IntPtr.Zero);
		CheckErr(error, "Cl.BuildProgram");

		//Check for any compilation errors
		if (Cl.GetProgramBuildInfo (program, _device, Cl.ProgramBuildInfo.Status, out error).CastTo&lt;Cl.BuildStatus&gt;()
			!= Cl.BuildStatus.Success) {
			CheckErr(error, "Cl.GetProgramBuildInfo");
			Console.WriteLine("Cl.GetProgramBuildInfo != Success");
			Console.WriteLine(Cl.GetProgramBuildInfo(program, _device, Cl.ProgramBuildInfo.Log, out error));
			return;
		}

		//Create the required kernel (entry function)
		Cl.Kernel kernel = Cl.CreateKernel(program, "imagingTest", out error);
		CheckErr(error, "Cl.CreateKernel");
	&nbsp;&nbsp;
		//Size of a native pointer; used as the argument size when passing memory objects to the kernel
		int intPtrSize = Marshal.SizeOf(typeof(IntPtr));

		//Image's raw pixel data copied into a managed byte[] array
		byte[] inputByteArray;
		//OpenCL memory buffer that will keep our image's byte[] data.
		Cl.Mem inputImage2DBuffer;

		Cl.ImageFormat clImageFormat = new Cl.ImageFormat(Cl.ChannelOrder.RGBA, Cl.ChannelType.Unsigned_Int8);

		int inputImgWidth, inputImgHeight;
	&nbsp;&nbsp;
		int inputImgBytesSize;

		int inputImgStride;

		//Try loading the input image
		using (FileStream imageFileStream = new FileStream(inputImagePath, FileMode.Open) ) {
			System.Drawing.Image inputImage = System.Drawing.Image.FromStream( imageFileStream );
		&nbsp;&nbsp;
			if (inputImage == null) {
				Console.WriteLine("Unable to load input image");
				return;
			}
		&nbsp;&nbsp;
			inputImgWidth = inputImage.Width;
			inputImgHeight = inputImage.Height;
		&nbsp;&nbsp;
			System.Drawing.Bitmap bmpImage = new System.Drawing.Bitmap(inputImage);

			//Get raw pixel data of the bitmap
			//The format should match the format of clImageFormat
			//Note: Format32bppArgb stores each pixel's bytes as B, G, R, A in memory,
			//so with an RGBA channel order the kernel sees red and blue swapped
			BitmapData bitmapData = bmpImage.LockBits( new Rectangle(0, 0, bmpImage.Width, bmpImage.Height),
													&nbsp;&nbsp;ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);//inputImage.PixelFormat);

			inputImgStride = bitmapData.Stride;
			inputImgBytesSize = bitmapData.Stride * bitmapData.Height;
		&nbsp;&nbsp;
			//Copy the raw bitmap data to a managed byte[] array
			inputByteArray = new byte[inputImgBytesSize];
			Marshal.Copy(bitmapData.Scan0, inputByteArray, 0, inputImgBytesSize);
			//Release the lock now that the pixel data has been copied
			bmpImage.UnlockBits(bitmapData);

			//Allocate OpenCL image memory buffer
			inputImage2DBuffer = Cl.CreateImage2D(_context, Cl.MemFlags.CopyHostPtr | Cl.MemFlags.ReadOnly, clImageFormat,
												(IntPtr)bitmapData.Width, (IntPtr)bitmapData.Height,
												(IntPtr)0, inputByteArray, out error);
			CheckErr(error, "Cl.CreateImage2D input");
		}

		//Managed byte[] array that will receive the output image's raw pixel data
		byte[] outputByteArray = new byte[inputImgBytesSize];

		//Allocate OpenCL image memory buffer
		Cl.Mem outputImage2DBuffer = Cl.CreateImage2D(_context, Cl.MemFlags.CopyHostPtr | Cl.MemFlags.WriteOnly, clImageFormat,
													&nbsp;&nbsp;(IntPtr)inputImgWidth, (IntPtr)inputImgHeight, (IntPtr)0, outputByteArray, out error);
		CheckErr(error, "Cl.CreateImage2D output");

		//Pass the memory buffers to our kernel function
		error = Cl.SetKernelArg(kernel, 0, (IntPtr)intPtrSize, inputImage2DBuffer);
		error |= Cl.SetKernelArg(kernel, 1, (IntPtr)intPtrSize, outputImage2DBuffer);
		CheckErr(error, "Cl.SetKernelArg");
	&nbsp;&nbsp;
		//Create a command queue, where all of the commands for execution will be added
		Cl.CommandQueue cmdQueue = Cl.CreateCommandQueue(_context, _device, (Cl.CommandQueueProperties)0, out error);
		CheckErr(error, "Cl.CreateCommandQueue");

		Cl.Event clevent;

		//Copy input image from the host to the GPU.
		IntPtr[] originPtr = new IntPtr[] { (IntPtr)0, (IntPtr)0, (IntPtr)0 };	//x, y, z
		IntPtr[] regionPtr = new IntPtr[] { (IntPtr)inputImgWidth, (IntPtr)inputImgHeight, (IntPtr)1 };	//x, y, z
		IntPtr[] workGroupSizePtr = new IntPtr[] { (IntPtr)inputImgWidth, (IntPtr)inputImgHeight, (IntPtr)1 };	//Global work size: one work-item per pixel
		error = Cl.EnqueueWriteImage(cmdQueue, inputImage2DBuffer, Cl.Bool.True, originPtr, regionPtr, (IntPtr)0, (IntPtr)0, inputByteArray, 0, null, out clevent);
		CheckErr(error, "Cl.EnqueueWriteImage");

		//Execute our kernel (OpenCL code)
		error = Cl.EnqueueNDRangeKernel(cmdQueue, kernel, 2, null, workGroupSizePtr, null, 0, null, out clevent);
		CheckErr(error, "Cl.EnqueueNDRangeKernel");

		//Wait for completion of all calculations on the GPU.
		error = Cl.Finish(cmdQueue);
		CheckErr(error, "Cl.Finish");

		//Read the processed image from GPU to raw RGBA data byte[] array
		error = Cl.EnqueueReadImage(cmdQueue, outputImage2DBuffer, Cl.Bool.True, originPtr, regionPtr,
									(IntPtr)0, (IntPtr)0, outputByteArray, 0, null, out clevent);
		CheckErr(error, "Cl.EnqueueReadImage");

		//Clean up memory
		Cl.ReleaseKernel(kernel);
		Cl.ReleaseCommandQueue(cmdQueue);
	&nbsp;&nbsp;
		Cl.ReleaseMemObject(inputImage2DBuffer);
		Cl.ReleaseMemObject(outputImage2DBuffer);

		//Pin the managed output byte[] array so the GC cannot move it, and get a pointer to it
		GCHandle pinnedOutputArray = GCHandle.Alloc(outputByteArray, GCHandleType.Pinned);
		IntPtr outputBmpPointer = pinnedOutputArray.AddrOfPinnedObject();

		//Create a new bitmap with processed data and save it to a file.
		Bitmap outputBitmap = new Bitmap(inputImgWidth, inputImgHeight, inputImgStride, PixelFormat.Format32bppArgb, outputBmpPointer);
	&nbsp;&nbsp;
		outputBitmap.Save(outputImagePath, System.Drawing.Imaging.ImageFormat.Png);

		pinnedOutputArray.Free();
	}
}
</pre><br />Now you should have a good foundation for more complex image processing effects on the GPU.]]></description>
		<pubDate>Thu, 29 Nov 2012 10:09:42 +0000</pubDate>
		<guid isPermaLink="false">24322cab50d699df38e74ee891f86f77</guid>
	</item>
	<item>
		<title>Dynamic Resolution Rendering</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/dynamic-resolution-rendering-r2821</link>
		<description><![CDATA[Brought to you by the <span style='color: blue'><a href='http://software.intel.com/en-us/visual-computing/?cid=sw:graphics292%5D' class='bbc_url' title='External link' rel='nofollow external'>Intel® Visual Computing Developer Community</a></span> | <a href='http://software.intel.com/en-us/articles/dynamic-resolution-rendering/?cid=sw:graphics295%5D' class='bbc_url' title='External link' rel='nofollow external'>Download the source code</a> | <a href='http://software.intel.com/en-us/videos/the-dynamic-resolution-rendering-sample/?cid=sw:graphics299%5D' class='bbc_url' title='External link' rel='nofollow external'>Watch the video</a><br /><br /><br /><span style='font-size: 18px;'><strong class='bbc'>Introduction</strong></span><br /><br /><span style='font-size: 12px;'>&nbsp;&nbsp;The resolution selection screen has been one of the defining aspects of PC gaming since the birth of 3D games. In this whitepaper and the accompanying sample code, we argue that this no longer needs to be the case; developers can dynamically vary the resolution of their rendering instead of having a static resolution selection.<br /><br />Dynamic resolution rendering involves adjusting the resolution to which you render the 3D scene by constraining the rendering to a portion of a render target using a viewport, and then scaling this to the output back buffer. Graphical user interface components can then be rendered at the back buffer resolution, as these are typically less expensive elements to draw. 
The end result is that stable high frame rates can be achieved with high quality GUIs.<br /><br />The performance results and screenshots in this article were taken on a pre-release mobile 2nd generation Intel® Core™ i7 processor (Intel® microarchitecture code name Sandy Bridge, D1 stepping quad core 2.4 GHz CPU with 4GB DDR3 1333MHz RAM) with Intel® HD Graphics 3000.<br /><br />This article and the accompanying sample were originally presented at the Game Developers Conference (GDC) in San Francisco 2011, and a video of the presentation can be found on GDC Vault [GDC Vault 2011], with the slides for that presentation available on the Intel website [Intel GDC 2011]. Since the presentation, the author has discovered that several game companies already use this technique on consoles; a presentation on anti-aliasing by Dmitry Andreev of LucasArts is the only public source, though it gives few details on the dynamic resolution technique used [Andreev 2011].</span><br /><br />&nbsp;&nbsp;<br /><p class='bbc_center'><span style='font-size: 10px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5596-0-23612100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5596" title="Figure1.jpg - Size: 50.51K, Downloads: 2991"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-82919300-1317065803_thumb.jpg" id='ipb-attach-img-5596-0-23612100-1369539296' style='width:250;height:142' class='attach' width="250" height="142" alt="Attached Image: Figure1.jpg" /></a></span>&nbsp;&nbsp;</p><p class='bbc_center'><strong class='bbc'><span style='font-size: 10px;'>Figure 1:</span></strong><span style='font-size: 10px;'> <em class='bbc'>The sample scene viewed from one of the static camera viewpoints.</em></span></p>&nbsp;&nbsp;<br />&nbsp;&nbsp;<span style='font-size: 18px;'><strong class='bbc'>Motivation</strong></span><br /><br /><span style='font-size: 
12px;'>&nbsp;&nbsp;Games have almost always had a strong performance variation with resolution, and the increase in shader complexity along with post-processing techniques has continued the trend of per-pixel costs dominating modern games. Increasing resolution also increases texture sampling and render target bandwidth. Setting the resolution appropriately for the performance of the system is therefore critical. Being able to vary the resolution dynamically gives the developer an additional performance control option which can enable the game to maintain a stable and appropriate frame rate, thus improving the overall quality of the experience.<br /><br />Rendering the graphical user interface at the native screen resolution can be particularly important for role playing, real time strategy, and massively multiplayer games. Suddenly, even on low-end systems, the player can indulge in complex chat messaging whilst keeping an eye on their teammates' stats.<br /><br />Finally, with the increasing dominance of laptops in PC gaming, power consumption is beginning to become relevant to game development. Performance settings can cause a reduction in CPU and GPU frequency when a machine goes from mains to battery power, and with dynamic resolution rendering, the game can automatically adjust the resolution to compensate. Some games may want to give the user the option of a low power profile to further reduce power consumption and enable longer gaming on the go. 
Experiments with the sample have found that cutting the resolution to 0.5x reduces the power consumption of the processor package to 0.7x normal when vertical sync is enabled so that the frame rate is maintained.</span><br /><br /><br /><span style='font-size: 18px;'><strong class='bbc'>Basic Principles</strong></span><br /><br /><span style='font-size: 12px;'>&nbsp;&nbsp;The basic principle of dynamic resolution rendering is to use a viewport to constrain the rendering to a portion of an off-screen render target, and then to scale the view. For example, the render target might be of size (1920, 1080), but the viewport could have an origin of (0, 0) and size (1280, 720).<br />&nbsp;&nbsp;</span><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5597-0-23626400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5597" title="Figure2.jpg - Size: 11.93K, Downloads: 2847"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-10568400-1317065804_thumb.jpg" id='ipb-attach-img-5597-0-23626400-1369539296' style='width:250;height:65' class='attach' width="250" height="65" alt="Attached Image: Figure2.jpg" /></a></span></p><p class='bbc_center'><span style='font-size: 12px;'><strong class='bbc'>Figure 2:</strong> <em class='bbc'>using a viewport to constrain rendering</em></span></p><br /><span style='font-size: 12px;'>By creating render targets larger than the back buffer, the dynamic resolution can be varied from subsampled to supersampled. 
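<br /><br />The viewport arithmetic above can be sketched in C++ as follows. This is a minimal illustration rather than code from the sample: the names Extent and ViewportForScale are hypothetical, and the rounding to the nearest pixel and the clamping of the scale to the range 0 to 1 are assumptions.<br /><br />

```cpp
// Hypothetical helper type; not taken from the sample code.
struct Extent {
    unsigned width;
    unsigned height;
};

// Returns the size of the viewport (with its origin at (0, 0)) that covers
// the given fraction of a render target.
// Assumptions: the scale is clamped to the range 0 to 1, each dimension is
// rounded to the nearest pixel, and the viewport is never smaller than 1x1.
Extent ViewportForScale(Extent renderTarget, float scale) {
    if (scale < 0.0f) scale = 0.0f;
    if (scale > 1.0f) scale = 1.0f;

    Extent viewport;
    viewport.width  = (unsigned)(renderTarget.width  * scale + 0.5f);
    viewport.height = (unsigned)(renderTarget.height * scale + 0.5f);
    if (viewport.width  == 0) viewport.width  = 1;
    if (viewport.height == 0) viewport.height = 1;
    return viewport;
}
```

<br />For a (1920, 1080) render target, a scale of 0.5 yields a (960, 540) viewport, and a scale of 1.0 uses the whole target.<br /><br />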
Care needs to be taken to ensure the full set of required render targets and textures fits within graphics memory, but systems based on Intel® microarchitecture code name Sandy Bridge processor graphics usually have considerable memory, as they use system memory.<br /><br />&nbsp;&nbsp;</span><br /><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5598-0-23640400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5598" title="Figure3.jpg - Size: 45.18K, Downloads: 1355"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-42484300-1317065804_thumb.jpg" id='ipb-attach-img-5598-0-23640400-1369539296' style='width:250;height:186' class='attach' width="250" height="186" alt="Attached Image: Figure3.jpg" /></a></span></p><p class='bbc_center'><span style='font-size: 12px;'><strong class='bbc'>Figure 3:</strong> <em class='bbc'>dynamic resolution can be varied from subsampling to supersampling</em></span></p><span style='font-size: 12px;'>&nbsp;&nbsp; When undertaking normal rendering to the dynamic viewport, no changes need to be made, because the rasterization rules ensure this is handled. However, when reading from the render target, care needs to be taken to scale the coordinates appropriately and handle clamping at the right and bottom edges.<br /><br />The following example pixel shader code shows how to clamp UVs. 
This is mainly used when doing dependent reads (i.e., when there are per-pixel operations on a UV, which is subsequently used to sample from a dynamic render target).</span><br /><br /><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5592-0-23552300-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5592" title="CodeShot1.jpg - Size: 14.05K, Downloads: 2430"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-68230300-1317065802_thumb.jpg" id='ipb-attach-img-5592-0-23552300-1369539296' style='width:250;height:34' class='attach' width="250" height="34" alt="Attached Image: CodeShot1.jpg" /></a></span></p><span style='font-size: 12px;'>&nbsp;&nbsp;In the case of motion blur, a common post-process operation that uses dependent reads from a render target, the extra math required has little effect on performance, as the shader is texture-fetch bound.<br />&nbsp;&nbsp;</span><br /><br /><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5599-0-23654600-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5599" title="Figure4.jpg - Size: 34.68K, Downloads: 2633"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-73131000-1317065804_thumb.jpg" id='ipb-attach-img-5599-0-23654600-1369539296' style='width:250;height:138' class='attach' width="250" height="138" alt="Attached Image: Figure4.jpg" /></a></span></p><p class='bbc_center'><span style='font-size: 12px;'><strong class='bbc'>Figure 4:</strong> <em class='bbc'>Color leak on edges of screen due to motion blur, which can be solved by using clamping</em></span></p><br /><span style='font-size: 12px;'>In addition to clamping, it's also important to ensure that the 
resolution ratios used in shaders are representative of the actual viewport ratio, rather than just your application's desired ratio. Because the viewport dimensions are whole pixels, the actual ratio generally differs slightly from the requested one; it is easily obtained by recalculating the ratio from the dynamic viewport dimensions. For example, in the sample code function DynamicResolution::SetScale, the following is performed after ensuring the scale meets boundary criteria:</span><br /><br /><br /><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5593-0-23567400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5593" title="CodeShot2.jpg - Size: 26.64K, Downloads: 2305"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-03264100-1317065803_thumb.jpg" id='ipb-attach-img-5593-0-23567400-1369539296' style='width:250;height:58' class='attach' width="250" height="58" alt="Attached Image: CodeShot2.jpg" /></a></span></p><span style='font-size: 12px;'>&nbsp;&nbsp;<strong class='bbc'>Scaling Filters</strong><br />After rendering the 3D scene, the viewport area needs to be scaled to the back buffer resolution. A variety of filters can be used to perform this, and the sample implements several examples as described here.<br /><br /><strong class='bbc'>Point Filtering</strong><br />Point filtering is a fast basic filter option. Scaling from a 0.71x ratio dynamic viewport to 1280x720 takes ~0.4ms.<br /><br /><strong class='bbc'>Bilinear Filtering</strong><br />Bilinear filtering is almost as fast as point filtering due to hardware support, and it reduces the aliasing artifacts from edges by smoothing, but also blurs the scene. 
Scaling from a 0.71x ratio dynamic viewport to 1280x720 takes ~0.4ms.<br /><br /><strong class='bbc'>Bicubic Filtering</strong><br />Bicubic filtering is only noticeably better than bilinear for resolutions of 0.5x the back buffer, and its performance is 7x slower even using a fast bicubic filter [Sigg 2005]. Scaling from a 0.71x ratio dynamic viewport to 1280x720 takes ~3.5ms.<br /><br /><strong class='bbc'>Noise Filtering</strong><br />Adding some noise to point filtering helps to add high frequencies, which break the aliasing slightly at a low cost. The implementation in the sample is fairly basic, and improved film grain filtering might artistically fit your rendering. Scaling from a 0.71x ratio dynamic viewport to 1280x720 takes ~0.5ms.<br /><br />&nbsp;&nbsp;<strong class='bbc'>Noise Offset Filtering</strong><br />Adding a small random offset to the sampling location during scaling reduces the regularity of aliased edges. This approach is common in fast filtering of shadow maps. Scaling from a 0.71x ratio dynamic viewport to 1280x720 takes ~0.7ms.</span><br /><br /><strong class='bbc'>Temporal Anti-aliasing Filtering</strong><br />This scaling filter requires extra support during the initial rendering path to render odd and even frames offset by half a pixel in X and Y. When filtered intelligently to remove ghosting artifacts, the resulting image quality is substantially improved by sampling from twice as many pixels. This filtering method is described in greater depth in its own section below. Scaling from a 0.71x ratio dynamic viewport to 1280x720 takes ~1.1ms, and has almost the same quality as rendering to full resolution.<br /><br /><strong class='bbc'>Temporal Anti-aliasing Details</strong><br />Temporal anti-aliasing has been around for some time; however, ghosting problems due to differences in the positions of objects in consecutive frames have limited its use. 
Modern rendering techniques are finally making it an attractive option due to its low performance overhead.<br /><br />The basic approach is to render odd and even frames jittered (offset) by half a pixel in both X and Y. The sample code does this by translating the projection matrix. The final scaling then combines both the current and previous frames, offsetting them by the inverse of the amount they were jittered. The final image is thus made from twice the number of pixels arranged in a pattern similar to the dots of the five side on a die, frequently termed a quincunx pattern.<br /><br /><br /><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5600-0-23668900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5600" title="Figure5.jpg - Size: 24.5K, Downloads: 1028"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-04233100-1317065805_thumb.jpg" id='ipb-attach-img-5600-0-23668900-1369539296' style='width:227;height:200' class='attach' width="227" height="200" alt="Attached Image: Figure5.jpg" /></a></span></p><p class='bbc_center'><span style='font-size: 12px;'><strong class='bbc'>Figure 5:</strong> <em class='bbc'>Temporal Anti-Aliasing basic principle</em></span></p><span style='font-size: 12px;'>&nbsp;&nbsp;Used along with dynamic resolution, this approach gives an increased observed number of pixels in the scene when the dynamic resolution is lower than the back buffer, improving the detail in the scene. 
When the dynamic resolution is equal to or higher than that of the back buffer, the result is a form of anti-aliasing.<br />&nbsp;&nbsp;</span><br /><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5601-0-23684700-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5601" title="Figure6.jpg - Size: 8.61K, Downloads: 539"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-27923500-1317065805_thumb.jpg" id='ipb-attach-img-5601-0-23684700-1369539296' style='width:250;height:192' class='attach' width="250" height="192" alt="Attached Image: Figure6.jpg" /></a></span></p><br /><p class='bbc_center'><span style='font-size: 12px;'><strong class='bbc'>Figure 6:</strong> <em class='bbc'>Result of Temporal AA when dynamic resolution is lower than that of the back buffer</em></span></p><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5602-0-23699700-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5602" title="Figure7.jpg - Size: 10.11K, Downloads: 561"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-64586700-1317065812_thumb.jpg" id='ipb-attach-img-5602-0-23699700-1369539296' style='width:250;height:169' class='attach' width="250" height="169" alt="Attached Image: Figure7.jpg" /></a></span></p><br /><p class='bbc_center'><span style='font-size: 12px;'><strong class='bbc'>Figure 7:</strong> <em class='bbc'>Result of Temporal AA when dynamic resolution is equal to or higher than that of the back buffer</em></span></p><br /><span style='font-size: 12px;'>&nbsp;&nbsp;<br />In order to get increased texture resolution, a MIP LOD bias needs to be applied to textures. 
In Microsoft Direct3D* 11, use a D3D11_SAMPLER_DESC MipLODBias of -0.5f during the 3D scene pass. Additionally, the sampler used during scaling needs to use bilinear minification filtering, for example: D3D11_FILTER_MIN_LINEAR_MAG_MIP_POINT.<br /><br />In order to reduce ghosting, we use the velocity buffer written out for motion blur. Importantly, this buffer contains the velocity for each pixel in screen space, thus accounting for camera movement. A scale factor is calculated from both the current and previous frame's velocity and applied to the previous frame's colour to determine its contribution to the final image. This scales the contribution based on how similar the sample location is in real space in both frames.<br />&nbsp;&nbsp;</span><br /><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5594-0-23583500-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5594" title="EquationImage1.jpg - Size: 21.75K, Downloads: 1486"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-30782500-1317065803_thumb.jpg" id='ipb-attach-img-5594-0-23583500-1369539296' style='width:250;height:159' class='attach' width="250" height="159" alt="Attached Image: EquationImage1.jpg" /></a></span></p><br /><span style='font-size: 12px;'>The sample has K tuned to give what the author considers to be the best results for a real time application, with no ghosting observed at realistically playable frame rates. Screenshots do expose a small amount of ghosting in high contrast areas as in the screenshot below, which can be tuned out if desired.<br /><br />For games, transparencies present a particular problem in not always rendering out velocity information. 
In this case, the alpha channel could be used during the forward rendering of the transparencies to store a value used to scale the contributions in much the same way as the velocity is currently used.<br /><br />An alternative to this approach for ghosting removal is to use the screen space velocity to sample from the previous frame at the location where the current pixel was. This is the technique used in CryENGINE* 3, first demonstrated in the game Crysis* 2 [Crytek 2010]. Intriguingly, LucasArts' Dmitry Andreev considered using temporal anti-aliasing, but did not adopt it because of the use of dynamic resolution in their engine [Andreev 2011]. The author believes these are compatible, as demonstrated in the sample code.<br />&nbsp;&nbsp;</span><br /><br /><br /><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5603-0-23714000-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5603" title="Figure8.jpg - Size: 63.56K, Downloads: 1986"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-91395100-1317065812_thumb.jpg" id='ipb-attach-img-5603-0-23714000-1369539296' style='width:250;height:143' class='attach' width="250" height="143" alt="Attached Image: Figure8.jpg" /></a></span></p><p class='bbc_center'><span style='font-size: 12px;'><strong class='bbc'>Figure 8:</strong> <em class='bbc'>Temporal Anti-Aliasing with velocity scaling and moving objects</em></span></p>&nbsp;&nbsp;<br /><br /><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>The Effect of Motion Blur</strong></span><br /><br /><span style='font-size: 12px;'>Motion blur smears pixels and reduces observed aliasing effectively, hence a lower resolution can be used when the camera is moving. However, the sample does not exploit this in its resolution control algorithm. 
The following screenshots show how reducing the resolution to 0.71x the back buffer results in higher performance, but roughly the same image. Combined with varying motion blur sample rates, this could be a way to reduce artifacts from undersampling with large camera motions whilst maintaining a consistent performance.<br />&nbsp;&nbsp;</span><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5604-0-23728600-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5604" title="Figure9.jpg - Size: 49.45K, Downloads: 2015"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-21123800-1317065813_thumb.jpg" id='ipb-attach-img-5604-0-23728600-1369539296' style='width:250;height:143' class='attach' width="250" height="143" alt="Attached Image: Figure9.jpg" /></a></span></p><br /><p class='bbc_center'><span style='font-size: 12px;'><strong class='bbc'>Figure 9:</strong> <em class='bbc'>Motion blur with dynamic resolution off</em></span></p><br /><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5605-0-23743100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5605" title="Figure10.jpg - Size: 49.3K, Downloads: 1864"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-52235600-1317065813_thumb.jpg" id='ipb-attach-img-5605-0-23743100-1369539296' style='width:250;height:143' class='attach' width="250" height="143" alt="Attached Image: Figure10.jpg" /></a></span></p><p class='bbc_center'><span style='font-size: 12px;'><strong class='bbc'>Figure 10:</strong> <em class='bbc'>Motion blur with dynamic resolution on at 0.71x resolution. 
Note the decreased frame time yet similar quality end result</em></span></p><br /><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>Supersampling</strong></span><br /><br /><span style='font-size: 12px;'>&nbsp;&nbsp;Supersampling is a simple technique where the render target used to render the scene is larger than the back buffer. This technique is largely ignored by the current real-time rendering community; multisample anti-aliasing and other anti-aliasing techniques have replaced it due to their lower memory consumption and better performance.<br /><br />Using dynamic resolution significantly reduces the performance impact of adding supersampling, as the actual resolution used can be dynamically adjusted. There is a small performance impact to enabling supersampling, mainly due to the extra cost of clearing the larger buffers. The sample code implements a 2x resolution render target when supersampling is enabled, but good quality results are observed for relatively small increases in resolution over the back buffer resolution, so a smaller render target could be used if memory were at a premium. Memory is less of an issue on processor graphics platforms, as the GPU has access to a relatively large proportion of the system memory, all of which is accessible at full performance.<br /><br />Once dynamic resolution rendering methods are integrated, using supersampling is trivial. We encourage developers to consider this, since it can be beneficial for smaller screen sizes and future hardware which could have sufficient performance to run the game at more than its maximum quality.</span><br /><br /><br /><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>Render Target Clearing</strong></span><br /><br /><span style='font-size: 12px;'>&nbsp;&nbsp;Since dynamic resolution rendering does not always use the entire render target's surface, it can be beneficial to clear only the required portion. 
The sample implements a pixel shader clear, and on the Intel® HD Graphics 3000-based system tested, the performance of a pixel shader clear was greater than that of a standard clear when the dynamic ratio was less than 0.71x for a 1280x720 back buffer. In many cases, it may not be necessary to clear the render targets, as these get overwritten fully every frame.<br /><br />Depth buffers should still be cleared completely with the standard clear methods, since these may implement hierarchical depth. Some multi-sampled render targets may also use compression, so should be cleared normally.</span><br /><br /><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>Performance Scaling</strong></span><br /><br /><span style='font-size: 12px;'>&nbsp;&nbsp;The sample code scales well with resolution, despite the heavy vertex processing load due to the large highly detailed scene with no level of detail and only very simple culling performed. This gives the chosen control method significant leverage to maintain frame rate at the desired level.<br /><br />Most games use level-of-detail mechanisms to control the vertex load. 
If these are linked to the approximate size of the object in pixels, the resulting performance scaling will be greater.<br /><br />&nbsp;&nbsp;</span><br /><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5606-0-23758300-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5606" title="Figure11.jpg - Size: 50.55K, Downloads: 1746"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-95079200-1317065813_thumb.jpg" id='ipb-attach-img-5606-0-23758300-1369539296' style='width:250;height:148' class='attach' width="250" height="148" alt="Attached Image: Figure11.jpg" /></a>&nbsp;&nbsp;</span></p><p class='bbc_center'><span style='font-size: 12px;'><strong class='bbc'>Figure 11:</strong> <em class='bbc'>Dynamic Resolution Performance at 1280x720</em></span></p><br /><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>Resolution Control</strong></span><br /><br /><span style='font-size: 12px;'>&nbsp;&nbsp;The sample implements a resolution control method in addition to allowing manual control. The code is in the file DynamicResolutionRendering.cpp, in the function ControlResolution. 
The desired performance can be selected between the refresh rate (usually 60Hz or 60FPS) and half the refresh rate (usually 30FPS).<br /><br />The control scheme is basic: a resolution scale delta is calculated proportionally to the dimensionless difference in the desired frame time and the current frame time.<br /><br />&nbsp;&nbsp;</span><br /><br /><p class='bbc_center'><span style='font-size: 12px;'><a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5595-0-23598000-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5595" title="EquationImage2.jpg - Size: 3.03K, Downloads: 1118"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-51704100-1317065803_thumb.jpg" id='ipb-attach-img-5595-0-23598000-1369539296' style='width:157;height:94' class='attach' width="157" height="94" alt="Attached Image: EquationImage2.jpg" /></a></span></p><span style='font-size: 12px;'>&nbsp;&nbsp;Where <em class='bbc'>S'</em> is the new resolution scale ratio, <em class='bbc'>S</em> is the current resolution scale ratio, <a class='resized_img' rel='lightbox[25409218936792bd91a992a6a8655971]' id='ipb-attach-url-5607-0-23773100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=5607" title="delta.jpg - Size: 1K, Downloads: 414"><img src="http://uploads.gamedev.net/monthly_09_2011/ccs-8549-0-49400100-1317066708_thumb.jpg" id='ipb-attach-img-5607-0-23773100-1369539296' style='width:24;height:19' class='attach' width="24" height="19" alt="Attached Image: delta.jpg" /></a> is the scale delta, <em class='bbc'>k</em> a rate of change constant, <em class='bbc'>T</em> the desired frame time, and <em class='bbc'>t</em> the current frame time.<br /><br />The current frame time uses an average of the GPU inner frame time excluding the present calculated using Microsoft DirectX* queries, and the frame time 
calculated from the interval between frames in the normal way. The GPU inner frame time is required when vertical sync is enabled, as in this situation the frame time is capped to the sync rate, yet we need to know if the actual rendering time is shorter than that. Averaging with the actual frame time helps to take into account the present along with some CPU frame workloads. If the actual frame time is significantly larger than the GPU inner frame time, the actual frame time is ignored, as such differences are usually due to CPU-side spikes such as going from windowed to fullscreen.</span><br /><br /><span style='font-size: 18px;'><strong class='bbc'>Potential Improvements</strong></span><br /><br /><span style='font-size: 12px;'>&nbsp;&nbsp;The following list is by no means complete; it merely gives some of the features which the author believes would naturally extend the current work:</span><ul class='bbc'><li><span style='font-size: 12px;'>Combine the dynamic resolution scene rendering with a similar method for shadow maps.</span></li><li><span style='font-size: 12px;'>Use this technique with a separate control mechanism for particle systems, allowing enhanced quality when only a few small particles are being rendered and improved performance when the fill rate increases.</span></li><li><span style='font-size: 12px;'>The technique is compatible with other anti-aliasing techniques that can also be applied along with temporal anti-aliasing.</span></li><li><span style='font-size: 12px;'>Temporal anti-aliasing can use an improved weighted sum dependent on the distance to the pixel center of the current and previous frames, rather than just a summed blend. 
A velocity-dependent offset read, such as that used in the CryENGINE* 3 [Crytek 2010], could also be used.</span></li><li><span style='font-size: 12px;'>Some games may benefit from running higher-quality anti-aliasing techniques over a smaller area of the image, such as for the main character or on RTS units highlighted by the mouse.</span></li></ul><span style='font-size: 18px;'><strong class='bbc'>Conclusion</strong></span><br /><br /><span style='font-size: 12px;'>&nbsp;&nbsp;Dynamic resolution rendering gives developers the tools needed to improve overall quality with minimal user intervention, especially when combined with temporal anti-aliasing. Given the large range of performance in the PC GPU market, we encourage developers to use this technique as one of their methods of achieving the desired frame rate for their game.</span><br /><br /><br /><br /><span style='font-size: 18px;'><strong class='bbc'>References</strong></span><br /><br /><span style='font-size: 12px;'>&nbsp;&nbsp;[Sigg 2005] Christian Sigg, Martin Hadwiger, "Fast Third Order Filtering", GPU Gems 2. Addison-Wesley, 2005.<br /><br />[Crytek 2010] HPG 2010 "Future graphics in games", Cevat Yerli & Anton Kaplanyan. 
<a href='http://www.crytek.com/cryengine/presentations' class='bbc_url' title='External link' rel='nofollow external'>http://www.crytek.co...e/presentations</a><br /><br />[GDC Vault 2011] <a href='http://www.gdcvault.com/play/1014646/-SPONSORED-Dynamic-Resolution-Rendering' class='bbc_url' title='External link' rel='nofollow external'>http://www.gdcvault....ution-Rendering</a><br /><br />[Intel GDC 2011] <a href='http://software.intel.com/en-us/articles/intelgdc2011/' class='bbc_url' title='External link' rel='nofollow external'>http://software.inte...s/intelgdc2011/</a><br /><br />[Andreev 2011] <a href='http://www.gdcvault.com/play/1014550/Anti-aliasing-from-a-Different' class='bbc_url' title='External link' rel='nofollow external'>http://www.gdcvault....rom-a-Different</a> [PPT 4.6MB]</span>]]></description>
		<pubDate>Mon, 26 Sep 2011 19:35:42 +0000</pubDate>
		<guid isPermaLink="false">c05c903e3d997added79518f0e850026</guid>
	</item>
	<item>
		<title>Accelerating iray with the NVIDIA Quadro 5000</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/accelerating-iray-with-the-nvidia-quadro-5000-r2814</link>
		<description><![CDATA[<span style='font-size: 18px;'><strong class='bbc'>Introduction </strong>&nbsp;&nbsp;</span><br /><br />First of all, I'm a software guy. I use software, I write about software and I sometimes even dream about software, but I'm also not stupid. So, when NVIDIA contacts me and says we've got a graphics board that makes your software a lot faster and we'd like you to try it out, my answer is a quick ‘yes.’ <br /><br />&nbsp;&nbsp;The board they are referring to is the NVIDIA Quadro 5000. This is a professional level board with 2.5 GB of memory, and enabled with CUDA. It is this technology that is responsible for speeding up software including 3ds Max, Maya and many of the products in Adobe's Creative Suite, especially the Mercury Playback Engine contained in Premiere Pro. <br /><br /><br /><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>The Installation </strong></span><br /><br />&nbsp;&nbsp;As a digital designer, it is my job to make images and animations look great using the amazing software at my disposal. This is made possible using the large black box that sits faithfully at my side. This black box doesn't get my attention at all unless it occasionally starts whirring when a fan turns on or as a place to support my important papers. Needless to say, I'd be more interested in alphabetizing my CD collection than cracking open my computer to install a new graphics card. But the promise of not only faster renders, but much faster renders, makes it worth the hassle.&nbsp;&nbsp;<br /><br />&nbsp;&nbsp;The NVIDIA Quadro 5000 graphics card, shown in Figure 1, is not for the faint of heart. It is a sizable card taking up two PCI slots and the entire card is encased like a prototype car coming out of Detroit that screams, "serious stuff is found under here." I was able to get the card to fit, albeit tightly, in my standard Dell Studio XPS box, but I had to remove the hard drive in order to get the monster placed. 
<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[2ce37bca8660eafd6911f0ea0682913a]' id='ipb-attach-url-4941-0-24927600-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4941" title="Figure 1 - Quadro unit.jpg - Size: 586.7K, Downloads: 855"><img src="http://uploads.gamedev.net/monthly_08_2011/ccs-8549-0-91359500-1313436986_thumb.jpg" id='ipb-attach-img-4941-0-24927600-1369539296' style='width:250;height:199' class='attach' width="250" height="199" alt="Attached Image: Figure 1 - Quadro unit.jpg" /></a><br />Figure 1: The NVIDIA Quadro 5000 graphics board is a professional level piece of hardware and overkill for gamers. <br /></p>&nbsp;&nbsp;Once the graphics board was installed, my machine booted right up without any trouble and after a quick download of the software drivers from NVIDIA's web site, I found my machine was working just as before the installation ordeal. With business as usual set in, I started about my work keeping in the back of my mind the fact that I now had a monster GPU ready to show its stuff. <br />&nbsp;&nbsp; <br /><br /><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>First Impressions </strong></span><br /><br />&nbsp;&nbsp;Having the killer graphics board installed and working without messing up my system relieved my initial hardware fears, so my next thought was, "let's see what this thing can do." I immediately opened the latest version of 3ds Max and loaded a recent NVIDIA iray scene that I created. <br /><br />&nbsp;&nbsp;Before installing the Quadro 5000 card, I rendered this sample scene by simply setting iray to do an Unlimited render before retiring for the evening. I remember checking on the scene after an hour or so and it looked quite good, but I could still see a few artifacts. 
<br /><br />&nbsp;&nbsp;I rendered the same sample scene, shown in Figure 2, after installing the Quadro 5000 and I could tell it was cranking on the scene because I noticed a delightfully pleasant humming coming from my computer internals. I also noticed that, with the Quadro card, the same final render quality was reached while I watched, in a matter of minutes instead of hours. I realize this isn't the typical quantitative benchmark that gives precise figures, such as 10X faster, but for me it means that I can render a scene without having to leave the computer. <br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[2ce37bca8660eafd6911f0ea0682913a]' id='ipb-attach-url-4942-0-24940400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4942" title="Figure 2 - Test iray scene.jpg - Size: 53.52K, Downloads: 1250"><img src="http://uploads.gamedev.net/monthly_08_2011/ccs-8549-0-83402900-1313436987_thumb.jpg" id='ipb-attach-img-4942-0-24940400-1369539296' style='width:250;height:188' class='attach' width="250" height="188" alt="Attached Image: Figure 2 - Test iray scene.jpg" /></a><br />Figure 2: This sample NVIDIA iray scene rendered to final quality in minutes rather than hours using the NVIDIA Quadro 5000.<br /></p>&nbsp;&nbsp; <br /><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>Tackling a Monster Project </strong></span><br /><br />&nbsp;&nbsp;After testing out a relatively simple project, I decided to throw something with a little more weight at the new board. For my second render, I loaded a kitchen scene with lots of shiny, reflective surfaces made of stainless steel, porcelain and glass with lots of lights bouncing all over the place. The scene is actually an animation rendered in high-def. For this test, I decided to set iray to do 100 passes per frame to see what kind of quality I'd get in a reasonable render time. 
<br /><br />&nbsp;&nbsp;The most complex frames of the animation took just over 7 minutes to complete the full 100 passes and the quality was a fair approximation of the ambient occlusion and all reflective surfaces, but the overall image was still grainy. After re-rendering the same frame set to 500 passes, the image was fairly close to photo-realistic with only a little graininess in the deep corners, and it took 44 minutes to render. That is still a huge improvement over the earlier project render time, which was over 24 hours for a 15 second animation. <br /><br />&nbsp;&nbsp;After several more test renders of my personal work, I decided to try out some of the scenes that NVIDIA provided me with, including a detailed Bugatti Veyron sports car, shown in Figure 3. The iray renderer for this car was set to do 500 passes at a resolution of 1300 by 900. The resulting image took just over 5 minutes. It also provided an excuse for me to include such a gorgeous image in this article. 
<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[2ce37bca8660eafd6911f0ea0682913a]' id='ipb-attach-url-4943-0-24952200-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4943" title="Figure 3 - Bugatti_Veyron_iRay02.jpg - Size: 107.4K, Downloads: 1418"><img src="http://uploads.gamedev.net/monthly_08_2011/ccs-8549-0-30439400-1313436988_thumb.jpg" id='ipb-attach-img-4943-0-24952200-1369539296' style='width:250;height:174' class='attach' width="250" height="174" alt="Attached Image: Figure 3 - Bugatti_Veyron_iRay02.jpg" /></a><br />Figure 3: The photo-realistic image quality of NVIDIA iray with the Quadro 5000 graphics card is amazing for taking only 5 minutes of render time.<br /></p><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>Understanding NVIDIA iray </strong></span><br /><br />&nbsp;&nbsp;The speed and power behind using the Quadro board to render iray scenes is based on the fact that the Quadro board has been optimized specifically to work with iray by offloading much of the rendering process to the board's GPU. But, first we need to understand what iray is. <br /><br />&nbsp;&nbsp;When 3ds Max was created, it was purposefully made from an open architecture that allowed specific modules to be replaced by other 3rd party plug-ins. This allowed a young company, known as mental images, to do their magic by replacing the default rendering engine used by 3ds Max with their own rendering engine called mental ray (spelled without capital letters, just like the company, which was acquired by NVIDIA in 2008). Mental ray used many advanced rendering techniques to improve the render quality dramatically, especially for realistic scenes with lots of reflections. Over time, mental ray has pushed the limits of rendering quality and speed and is available for many different rendering packages and platforms. 
<br /><br />&nbsp;&nbsp;The one problem with mental ray is that with all its advanced settings, it can take a long time to figure out which settings give the best quality in the shortest amount of time, a problem that was exacerbated as the product became more and more complex. The solution from mental images was to develop iray. <br /><br />&nbsp;&nbsp;NVIDIA iray is frightfully simple. It doesn't include all the advanced settings found in mental ray, but lets you set the render time or the number of passes and it automatically produces the best possible image given those constraints. It also has an unlimited option that will continually refine the render until it is stopped by the user. The one downside to iray is that it will only work with a unique set of materials that are available in 3ds Max. If a default material is used, it is simply rendered as flat white. <br /><br />&nbsp;&nbsp;3ds Max is a product developed by a team at Autodesk, while mental ray and iray are developed independently of Autodesk by NVIDIA. As mentioned, mental images was acquired in 2008 by a much bigger fish named NVIDIA, which has since put the team in contact with the brains behind the Quadro graphics card. The result is that iray has been developed to take advantage of the latest features built into the Quadro graphics cards, enabling the software to render scenes with amazing quality faster than ever before and faster than on any other graphics card. 
<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[2ce37bca8660eafd6911f0ea0682913a]' id='ipb-attach-url-4944-0-24963800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4944" title="Figure 4 - Robot arm.jpg - Size: 178.61K, Downloads: 1282"><img src="http://uploads.gamedev.net/monthly_08_2011/ccs-8549-0-97279200-1313436988_thumb.jpg" id='ipb-attach-img-4944-0-24963800-1369539296' style='width:250;height:141' class='attach' width="250" height="141" alt="Attached Image: Figure 4 - Robot arm.jpg" /></a><br />Figure 4: This robot arm includes over 210,000 polys and was rendered in NVIDIA iray in just over 7 minutes. <br /></p><br /><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>The Magic Behind the Technology </strong></span><br /><br />&nbsp;&nbsp;So how is it done? The Quadro 5000 has a unit called the GigaThread Engine that lets several tasks be worked on at the same time including loading data into and out of the GPU while simultaneously doing some rendering processes. There is also a Scalable Geometry Engine that lets each cluster create triangles. By having each cluster work independently, the total throughput gets multiplied resulting in faster renders. <br /><br />&nbsp;&nbsp;The Quadro series of professional graphic cards also includes a feature called Error Correcting Code (ECC) that detects many multi-bit errors and re-runs the data to eliminate the errors. This is especially important in some critical areas like engineering and medical imaging. This feature can be disabled if the error correction isn't needed to make the process even faster. <br /><br />&nbsp;&nbsp;The Quadro 5000 has 352 Compute Unified Device Architecture (CUDA) parallel processor cores. It is the NVIDIA CUDA architecture that iray takes advantage of to speed its rendering cycles. The Quadro 5000 board also supports OpenGL 4.0, Shader Model 5.0 and DirectX 11. 
It is capable of rendering 950 million triangles per second.<br /><br /><br /><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>Other Benefits and Features</strong></span><br /><br />&nbsp;&nbsp;The former mental images team, whose members are now part of NVIDIA, isn't the only software product development team that has been working closely with NVIDIA engineers. The development team behind the Mercury Playback Engine found in the latest release of Adobe's Premiere Pro has also taken advantage of the optimized CUDA features. Quadro graphics boards allow video professionals to create and edit a large number of effects and transitions including color correction, blur, deinterlacing, compositing and blending, and play them back in real-time, which is a huge time-saver for video productions. <br /><br />&nbsp;&nbsp;The NVIDIA GPU found in the Quadro board can also accelerate physics simulations via the NVIDIA PhysX engine. 3ds Max's MassFX system includes an option to enable Hardware Acceleration that offloads some of the physics processing to the GPU. <br /><br />&nbsp;&nbsp;The Quadro 5000 graphics board also includes support for stereo 3D using the OpenGL stereo API. The NVIDIA Control Panel lets you set up and test out several different types of 3D glasses including the standard red/blue (anaglyph) glasses, and top-of-the-line active shutter 3D glasses like NVIDIA 3D Vision and 3D Vision Pro, the latter two of which NVIDIA recommends for viewing 3D games and applications. There is also a list of compatible games that you can play in 3D and a slide show of 3D game images. <br />&nbsp;&nbsp; <br /><br /><span style='font-size: 18px;'>&nbsp;&nbsp;<strong class='bbc'>Summary</strong></span><br /><br />&nbsp;&nbsp;In summary, I'm not at all surprised that the Quadro 5000 graphics board was faster than the default board that came with my system, but I had no idea just how much faster it would be, especially with iray scenes. 
When I mentioned to Sean Kilbride, the technical marketing manager at NVIDIA for workstation product reviews, that this board would probably be great for games, he replied that it is really a professional level board and would be overkill for games. <br /><br />&nbsp;&nbsp;I found that the Quadro 5000 graphics board significantly sped up my workflow and made me much more productive, especially when I got to the test render phase of a project. I found I could quickly find where I needed to spend more time without having to do a complete render. The end result is that I can get more projects done in a shorter amount of time, which is great news when I've got yet another deadline looming. <br /><br />&nbsp;&nbsp;You can find more information on 3ds Max and NVIDIA iray at the www.autodesk.com web site and more information on the Quadro line of professional graphics cards is available at <a href='http://www.nvidia.com/quadro' class='bbc_url' title='External link' rel='nofollow external'>www.nvidia.com/quadro</a>.]]></description>
		<pubDate>Mon, 15 Aug 2011 19:37:57 +0000</pubDate>
		<guid isPermaLink="false">d46e6e59cb2e09d436f89c57efa23735</guid>
	</item>
	<item>
		<title>MLAA: Efficiently Moving Antialiasing from the GPU to the CPU</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/mlaa-efficiently-moving-antialiasing-from-the-gpu-to-the-cpu-r2809</link>
		<description><![CDATA[<strong class='bbc'> Brought to you by the </strong><a href='http://software.intel.com/en-us/visual-computing/?cid=sw:graphics240' class='bbc_url' title='External link' rel='nofollow external'><strong class='bbc'>Intel® Visual Computing Developer Community</strong></a><br /><br /><a href='http://software.intel.com/en-us/articles/mlaa/?cid=sw:graphics236' class='bbc_url' title='External link' rel='nofollow external'>Download the Source Code »</a><br /><a href='http://software.intel.com/en-us/videos/cpu-based-mlaa-implementation/?cid=sw:graphics238' class='bbc_url' title='External link' rel='nofollow external'>Watch the Video »</a><br /><br /><br /><strong class='bbc'><span style='font-size: 18px;'>Introduction</span></strong><br /><br />Efficient antialiasing techniques are an important tool of high-quality, real-time rendering. MSAA (Multisample Antialiasing) is the standard technique in use today, but comes with some serious disadvantages:<br /><ul class='bbc'><li>Incompatibility with deferred lighting, which is used more and more in real-time rendering;</li><li>High memory and processing cost, which makes its use prohibitive on some widely available platforms (such as the Sony Playstation* PS3* [Perthuis 2010]). This cost is also directly linked to the complexity of the scene rendered;</li><li>Inability to smooth non-geometric edges unless used in conjunction with alpha to coverage.</li></ul>A new technique developed by Intel Labs called Morphological Antialiasing (MLAA) [Reshetov 2009] addresses these limitations. MLAA is an image-based, post-process filtering technique which identifies discontinuity patterns and blends colors in the neighborhood of these patterns to perform effective antialiasing. 
It is the precursor of a new generation of real-time antialiasing techniques that rival MSAA [Jimenez et al., 2011] [SIGGRAPH 2011].<br /><br />This sample is based on the original, CPU-based MLAA implementation provided by Reshetov, with improvements to greatly increase performance. The improvements are:<br /><ul class='bbc'><li>Integration of a new, efficient, easy-to-use tasking system implemented on top of Intel® Threading Building Blocks (Intel® TBB).</li><li>Integration of a new, efficient, easy-to-use pipelining system for CPU onloading of graphics tasks.</li><li>Improvement of data access patterns through a new transposing pass.</li><li>Increased use of Intel® SSE instructions to optimize discontinuity detection and color blending.</li></ul><strong class='bbc'><span style='font-size: 18px;'>The MLAA algorithm</span></strong><br /><br />In this section we present an overview of how the MLAA algorithm works; cf. [Reshetov 2009] for the fully detailed explanation. Conceptually, MLAA processes the image buffer in three steps:<br /><ul class='bbcol decimal'><li>Find discontinuities between pixels in a given image.</li><li>Identify U-shaped, Z-shaped and L-shaped patterns.</li><li>Blend colors in the neighborhood of these patterns.</li></ul>The first step (find discontinuities) is implemented by comparing each pixel to its neighbors. Horizontal discontinuities are identified by comparing a pixel to its bottom neighbor, and vertical discontinuities by comparing with the neighbor on the right. In our implementation we compare color values, but any other approach that suits the application’s specific needs is perfectly valid.<br /><br />At the end of the first step, each pixel is marked with the horizontal discontinuity flag and/or the vertical discontinuity flag, if such discontinuities are detected. 
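<br /><br />A minimal sketch of this first step, in Python purely for illustration (the sample itself is C++ using Intel® SSE). The names find_discontinuities, differs, H_FLAG and V_FLAG, the per-channel threshold of 16 and the tiny test image are all assumptions made for the example, not taken from the sample:<br /><br />

```python
# Hypothetical flag values; the real sample may encode flags differently.
H_FLAG = 1  # horizontal discontinuity: pixel differs from its bottom neighbor
V_FLAG = 2  # vertical discontinuity: pixel differs from its right neighbor


def differs(a, b, threshold=16):
    """Return True if any color channel differs by more than threshold."""
    return any(abs(x - y) > threshold for x, y in zip(a, b))


def find_discontinuities(image, threshold=16):
    """Mark each pixel of a row-major RGB image with H_FLAG and/or V_FLAG."""
    height, width = len(image), len(image[0])
    flags = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            pixel = image[y][x]
            # Compare with the bottom neighbor (skipped on the last row).
            if y + 1 != height and differs(pixel, image[y + 1][x], threshold):
                flags[y][x] |= H_FLAG
            # Compare with the right neighbor (skipped on the last column).
            if x + 1 != width and differs(pixel, image[y][x + 1], threshold):
                flags[y][x] |= V_FLAG
    return flags


BLACK, WHITE = (0, 0, 0), (255, 255, 255)
image = [
    [BLACK, BLACK, WHITE],
    [BLACK, WHITE, WHITE],
]
flags = find_discontinuities(image)
```

<br />With these assumed values, the black pixel in the middle of the top row differs from both its bottom and right neighbors, so it carries both flags.<br /><br />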
In the next step, we “walk” the marked pixels to identify discontinuity lines (sequences of consecutive pixels marked with the same discontinuity flag), and determine how they combine to form L-shaped patterns, as illustrated below:<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4471-0-26764800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4471" title="Figure1.jpg - Size: 60.34K, Downloads: 693"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-42684000-1311682860_thumb.jpg" id='ipb-attach-img-4471-0-26764800-1369539296' style='width:250;height:160' class='attach' width="250" height="160" alt="Attached Image: Figure1.jpg" /></a><br /><strong class='bbc'>Figure 1:</strong> <em class='bbc'>MLAA processing of an image, with Z, U, and L-shapes shown on the original image on the left</em><br /><br /><p class='bbc_left'>The third and final step is to perform blending for each of the identified L-shaped patterns.<br /><br />The general idea is to connect the middle point of the primary segment of the L-shape (horizontal green line in the figure below) to the middle point of the secondary segment (vertical green line—the connection line is in red). The connection line splits each pixel into two trapezoids; for each pixel the area of the corresponding trapezoid determines the blending weights. 
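<br /><br />One way to picture these weights, as a hedged Python sketch: it assumes a one-pixel-high step, with the connection line falling from height 1/2 at the corner of the L-shape to 0 at the middle of the primary segment. blend_weights and blend are hypothetical helper names, and the three-pixel segment is an arbitrary example:<br /><br />

```python
from fractions import Fraction


def blend_weights(length):
    """Area under the connection line inside each pixel of the primary segment.

    Assumes a one-pixel-high step: the line falls from height 1/2 at x = 0
    (middle of the secondary segment) to 0 at x = length / 2 (middle of the
    primary segment).
    """
    half = Fraction(length, 2)

    def height(x):
        # Height of the connection line at x, clamped to zero past the midpoint.
        return max(Fraction(0), Fraction(1, 2) * (1 - x / half))

    weights = []
    x = Fraction(0)
    while half > x:
        right = min(x + 1, half)
        # Trapezoid area: average of the two heights times the covered width.
        weights.append((height(x) + height(right)) / 2 * (right - x))
        x += 1
    return weights


def blend(color, neighbor, weight):
    """Mix a pixel with its neighbor across the edge using the trapezoid area."""
    return tuple((1 - weight) * c + weight * n for c, n in zip(color, neighbor))


weights = blend_weights(3)
new_color = blend((90, 90, 90), (0, 0, 0), weights[0])
```

<br />Under these assumptions, weights evaluates to 1/3 and 1/24 for the two pixels the line crosses, and the first pixel keeps 2/3 of its own color.<br /><br />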
For example, in the figure below, we see that the trapezoid’s area for pixel c5 is 1/3; so the new color of c5 will be computed as 2/3 * (color of c5) + 1/3 * (color of c5’s bottom neighbor).<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4472-0-26782900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4472" title="Figure2.jpg - Size: 29.25K, Downloads: 1876"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-66168900-1311682917_thumb.jpg" id='ipb-attach-img-4472-0-26782900-1369539296' style='width:250;height:94' class='attach' width="250" height="94" alt="Attached Image: Figure2.jpg" /></a><br /><strong class='bbc'>Figure 2: </strong><em class='bbc'>Computing blending weights</em><br /><br /><p class='bbc_left'>In practice, to ensure a smooth silhouette look, we need to minimize the color differences at the stitching positions of consecutive L-shapes. To achieve this, we slightly adjust the connection points on the L-shape segments around the middle position based on the colors of the pixels around the stitching point.<br /><br /> <br /><span style='font-size: 18px;'><strong class='bbc'>Sample usage</strong></span><br /><br />The camera can be moved around the scene using drag-and-click, and the mouse wheel can be used to zoom in or out. 
In addition, there are three blocks of controls on the right side of the sample’s window:<br /></p><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4473-0-26798700-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4473" title="Figure3.jpg - Size: 57.07K, Downloads: 2162"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-18193600-1311682999_thumb.jpg" id='ipb-attach-img-4473-0-26798700-1369539296' style='width:250;height:163' class='attach' width="250" height="163" alt="Attached Image: Figure3.jpg" /></a><br /><strong class='bbc'>Figure 3: </strong><em class='bbc'>A screenshot of the sample in action</em><br /><br /><p class='bbc_left'>The first block controls the rendering of the scene:<br /><ul class='bbc'><li><em class='bbc'>Pause Scene </em>toggles the scene’s animations on/off;</li><li><em class='bbc'>Show zoom box </em>toggles the <em class='bbc'>zoom box </em>feature on/off, which allows the user to take a closer look at a pixel area to better compare the antialiasing techniques. The area of interest can be changed by right-clicking to the new area to examine;</li><li><em class='bbc'>Scene complexity </em>simulates the effect of increasing scene complexity by using overdraw. The value (between 1 and 100, adjusted by the slider) indicates how many times the scene is rendered per frame (with overdraw).</li></ul>The second block selects which antialiasing technique to apply: MLAA, MSAA (4x) or no antialiasing. 
This allows a comparison of the techniques’ performance and quality (the default choice is “MLAA”).<br /><br />The last block of controls is only available if the antialiasing technique used is MLAA, and controls how the algorithm should be run:<br /><ul class='bbc'><li><em class='bbc'>Copy/Map/Unmap only </em>copies the color buffer from GPU memory to CPU memory and back, but does not perform any MLAA processing between the two copy operations. This allows measurement of the impact of the copy operations on the overall performance of the entire algorithm;</li><li><em class='bbc'>CPU tasks pipelining </em>turns the pipelining system on/off for CPU onloading of graphics tasks (the default is on) so that it is easy to see the benefit of pipelining;</li><li><em class='bbc'>Show found edges </em>runs the first part of the algorithm (find discontinuities between pixels), but the blending passes are replaced by a debug pass, where a pixel is:</li><li>changed to solid green if a horizontal discontinuity has been found with its neighbor;</li><li>changed to solid blue if a vertical discontinuity has been found with its neighbor;</li><li>changed to solid red if both horizontal and vertical discontinuities have been found with its neighbors;</li><li>unchanged if no discontinuities have been found.</li></ul><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4474-0-26813800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4474" title="Figure4.jpg - Size: 66.33K, Downloads: 2158"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-58005300-1311683218_thumb.jpg" id='ipb-attach-img-4474-0-26813800-1369539296' style='width:250;height:163' class='attach' width="250" height="163" alt="Attached Image: Figure4.jpg" /></a><br /><strong class='bbc'>Figure 4. 
</strong><em class='bbc'>Gear tower scene with MLAA and the “Show found edges” option enabled</em><br /><br /><br /><p class='bbc_left'><strong class='bbc'><span style='font-size: 18px;'>Sample Architecture</span></strong><br /><br />Without the pipelining optimizations, the sequence of events for each frame is:<br /><br /><strong class='bbc'>ANIMATE AND RENDER TEST SCENE</strong><br /><br /><strong class='bbc'>MLAA STAGE (if MLAA is enabled) </strong><br /><blockquote>Copy the color buffer where the scene was rendered to a staging buffer<br />Map the staging buffer for CPU-side access<br />MLAA post-processing (the staging buffer is both input and output)<br />Unmap the staging buffer<br />Copy staging buffer back to the GPU-side color buffer<br />Render the zoombox (if applicable)</blockquote><strong class='bbc'>RENDER THE SAMPLE’S UI, PRESENT FRAME</strong><br /><br />Except for the “MLAA post-processing” step (and again not considering the pipeline for now), all of these steps are implemented using standard Microsoft DirectX* methods.<br /><br /><br /><strong class='bbc'><span style='font-size: 18px;'>The Tasking API</span></strong><br /><br />By nature the MLAA algorithm is easy to parallelize. For both the discontinuity detection and blending passes, we can process the color buffer in independent chunks (blocks of contiguous rows or columns). MLAA can be implemented using a task-based solution that automatically makes full use of all available CPU cores, while keeping the code core-count agnostic.<br /><br />This sample uses a simple C-based tasking API that is implemented on top of the Intel® Threading Building Blocks (Intel® TBB) scheduler. This wrapper API was created to simplify the integration of the technique into existing codebases that already expose a similar tasking API (e.g. cross-platform game engines). 
An added benefit is the increased readability of the main source file <em class='bbc'>MLAA.cpp</em>.<br /><br />The two important functions of the wrapper APIs are:<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4475-0-26829000-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4475" title="TheTaskingAPI_1.jpg - Size: 36.15K, Downloads: 1610"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-76698600-1311683397_thumb.jpg" id='ipb-attach-img-4475-0-26829000-1369539296' style='width:250;height:88' class='attach' width="250" height="88" alt="Attached Image: TheTaskingAPI_1.jpg" /></a><br /><br /></p>And:<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4476-0-26844100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4476" title="TheTaskingAPI_2.jpg - Size: 15.98K, Downloads: 1019"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-19045900-1311683476_thumb.jpg" id='ipb-attach-img-4476-0-26844100-1369539296' style='width:250;height:43' class='attach' width="250" height="43" alt="Attached Image: TheTaskingAPI_2.jpg" /></a><br /><br /><p class='bbc_left'>The callback function has the following signature:<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4477-0-26859300-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4477" title="TheTaskingAPI_3.jpg - Size: 10.39K, Downloads: 793"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-51063700-1311683476_thumb.jpg" id='ipb-attach-img-4477-0-26859300-1369539296' style='width:250;height:24' class='attach' width="250" height="24" alt="Attached Image: 
TheTaskingAPI_3.jpg" /></a><br /><br /><p class='bbc_left'>MLAA requires a dependency graph of three consecutive tasksets:<br /><ul class='bbc'><li>The first taskset finds the discontinuities between pixels in the color buffer;</li><li>The second taskset performs the horizontal blending pass, and depends on the completion of the first taskset for the discontinuity information;</li><li>The third taskset performs the vertical blending pass, which depends on the completion of the first taskset for the discontinuity information, but also on the completion of the second taskset because of the transpose optimization (see “The Transposing Optimization” section for details).</li></ul>This dependency graph is expressed in <em class='bbc'>MLAA.cpp </em>as a sequence of three calls to <span style='font-family: Courier New'>CreateTaskSet</span>; the taskset callback functions (implemented in <em class='bbc'>MLAAPostProcess.cpp</em>) are, respectively, <span style='font-family: Courier New'>MLAAFindDiscontinuitiesTask</span>, <span style='font-family: Courier New'>MLAABlendHTask</span> and <span style='font-family: Courier New'>MLAABlendVTask</span>.<br /><br />If pipelining is not enabled, we wait for the last taskset to complete by calling <span style='font-family: Courier New'>WaitForSet</span> with the handle of the last taskset. When the call returns, the MLAA work for the frame is complete. Things are slightly more complicated when using the pipeline.<br /><br /><br /><strong class='bbc'><span style='font-size: 18px;'>The CPU/GPU pipeline</span></strong><br /><br />To get the maximum performance from our implementation, we have to keep both CPU and GPU sides fully utilized. 
Because the data dependencies between the tasksets serialize the work within a single frame, full utilization is achieved by interleaving the processing of multiple frames, as shown in the diagram below:<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4478-0-26874500-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4478" title="Figure5.jpg - Size: 26.1K, Downloads: 759"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-92483800-1311683646_thumb.jpg" id='ipb-attach-img-4478-0-26874500-1369539296' style='width:250;height:91' class='attach' width="250" height="91" alt="Attached Image: Figure5.jpg" /></a><br /><strong class='bbc'>Figure 5: </strong><em class='bbc'>Frames moving through the pipeline. The red blocks on the CPU main thread timeline illustrate that the main thread will take MLAA work if the application gets CPU-bound.</em><br /><br /><p class='bbc_left'>In other words, we have to build a pipelining system in which a pipeline is a sequence of CPU stages (workloads) and GPU stages, and which is able to run multiple instances of that pipeline at the same time.<br /><br />In our case, each instance of the pipeline corresponds to the processing of one frame. We have three steps:<br /><br />Step 1 (GPU stage):<br /><ul class='bbc'><li>Animate and render test scene;</li><li>Copy color buffer to staging buffer (using asynchronous GPU-side CopyResource).</li></ul>Step 2 (CPU stage):<br /><ul class='bbc'><li>Map the staging buffer;</li><li>Perform the MLAA post-processing work (staging buffer is both input and output).</li></ul>Step 3 (GPU stage):<br /><ul class='bbc'><li>Unmap staging buffer and copy it back to the GPU-side color buffer;</li><li>Finish frame rendering (zoombox, UI) and present.</li></ul>To implement this concept, we designed a simple Pipeline class. 
Each stage is represented by a <span style='font-family: Courier New'>PipelineFunction</span> structure that specifies the function to be called and the stage type. The callback function must have the following signature:<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4479-0-26892200-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4479" title="CPU-GPU-Pipeline_1.jpg - Size: 8.76K, Downloads: 709"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-54937500-1311683771_thumb.jpg" id='ipb-attach-img-4479-0-26892200-1369539296' style='width:250;height:24' class='attach' width="250" height="24" alt="Attached Image: CPU-GPU-Pipeline_1.jpg" /></a><br /><br /><p class='bbc_left'>Here, <span style='font-family: Courier New'>uInstance</span> indicates which instance of the pipeline the function is being called from. A GPU stage uses a DirectX* query (of type <span style='font-family: Courier New'>D3D11_QUERY_EVENT</span>) to signal its completion, but a CPU stage has to explicitly call the Pipeline class’s <span style='font-family: Courier New'>CompleteCPUWait</span> method to signal completion:<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4480-0-26907700-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4480" title="CPU-GPU-Pipeline_2.jpg - Size: 18.7K, Downloads: 482"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-33366800-1311683820_thumb.jpg" id='ipb-attach-img-4480-0-26907700-1369539296' style='width:250;height:48' class='attach' width="250" height="48" alt="Attached Image: CPU-GPU-Pipeline_2.jpg" /></a><br /><br /><p class='bbc_left'>Creating the pipeline instances requires a single call to the <span style='font-family: Courier New'>Init</span> method. 
In our case, the code is:<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4481-0-26923100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4481" title="CPU-GPU-Pipeline_3.jpg - Size: 20.41K, Downloads: 489"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-56617800-1311683820_thumb.jpg" id='ipb-attach-img-4481-0-26923100-1369539296' style='width:250;height:52' class='attach' width="250" height="52" alt="Attached Image: CPU-GPU-Pipeline_3.jpg" /></a><br /><br /><p class='bbc_left'>Where g_NumPipelineInstances is the number of pipeline instances to create (3 in this case).<br /><br />To run the pipelines, we call the <span style='font-family: Courier New'>Start</span> method each frame:<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4482-0-26938500-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4482" title="CPU-GPU-Pipeline_4.jpg - Size: 31.48K, Downloads: 613"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-83305800-1311683820_thumb.jpg" id='ipb-attach-img-4482-0-26938500-1369539296' style='width:250;height:80' class='attach' width="250" height="80" alt="Attached Image: CPU-GPU-Pipeline_4.jpg" /></a><br /></p></p><p class='bbc_left'>Because the <span style='font-family: Courier New'>Start</span> method returns the index of a completed instance of the pipeline, the third and last step doesn’t have to be called through the <span style='font-family: Courier New'>Pipeline</span> class. 
The code of the last step executes right after the call to the <span style='font-family: Courier New'>Start</span> method, indexing the data structures with the index returned by <span style='font-family: Courier New'>Start</span>.<br /><br />Pipelining does not rely on the tasking API <span style='font-family: Courier New'>WaitForSet</span> call, since it is blocking and so does not allow pipelining to occur. The solution is to use a <em class='bbc'>completion taskset</em>—that is, a task that depends on the completion of all the MLAA tasks, and whose only work will be to call the <span style='font-family: Courier New'>CompleteCPUWait</span> method.<br /><br /><br /><span style='font-size: 18px;'><strong class='bbc'>Intel® SSE optimizations</strong></span><br /><br />The first pass of the MLAA algorithm identifies discontinuities between pixels. Conceptually, each pixel checks its color and compares it with its bottom neighbor (when looking for horizontal discontinuities) or right neighbor (when looking for vertical discontinuities) [Reshetov09, section 2.2.1].<br /><br />In this sample, we have kept the simple discontinuity detection kernel of the reference implementation. A discontinuity exists between two pixels if at least one of their RGB color components differs by at least 16 (on the 0-255 scale of the RGBA8 format).<br /><br />This definition works well in the sample and allows a very compact and efficient SIMD implementation. However, more complex approaches are possible, and sometimes necessary, to get the best possible results. For example, a pixel’s luminance could be used instead of directly comparing color values, and a variable threshold recomputed each frame to take into account the scene’s overall luminance and contrast [Luminance]. 
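<br /><br />As a rough illustration of the two detection criteria discussed above, the per-channel test and its luminance-based alternative could look like the scalar sketch below. This is not the sample's SSE kernel: the function names are hypothetical, pixels are assumed to be packed with red in the low byte and alpha in the high byte, the luminance weights are the Rec. 709 coefficients applied directly to the 8-bit values, and the luminance threshold is left to the caller.<br /><br />

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Per-channel threshold quoted in the text (0-255 scale of RGBA8).
constexpr int kChannelThreshold = 16;

// Hypothetical helper: true if at least one of the R, G or B components
// of the two pixels differs by kChannelThreshold or more. Alpha is ignored.
inline bool IsDiscontinuity(std::uint32_t a, std::uint32_t b)
{
    for (int shift = 0; shift < 24; shift += 8) {
        const int ca = static_cast<int>((a >> shift) & 0xFFu);
        const int cb = static_cast<int>((b >> shift) & 0xFFu);
        if (std::abs(ca - cb) >= kChannelThreshold)
            return true;
    }
    return false;
}

// Hypothetical helper: weighted sum of the 8-bit channels using the
// Rec. 709 coefficients (an approximation, no gamma handling).
inline float Luma709(std::uint32_t p)
{
    const float r = static_cast<float>(p & 0xFFu);
    const float g = static_cast<float>((p >> 8) & 0xFFu);
    const float b = static_cast<float>((p >> 16) & 0xFFu);
    return 0.2126f * r + 0.7152f * g + 0.0722f * b;
}

// Luminance-based variant; the threshold could be recomputed each frame.
inline bool IsLumaDiscontinuity(std::uint32_t a, std::uint32_t b, float threshold)
{
    assert(threshold >= 0.0f);
    return std::fabs(Luma709(a) - Luma709(b)) >= threshold;
}
```

<br /><br />With these definitions and a luminance threshold of 16, a dark blue pixel next to a black one is flagged by the per-channel test but not by the luminance test, which is one reason the two criteria can produce different edge maps.<br /><br />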
Depth values could also be used to assist with edge detection, and any kind of custom data could be used to exclude specific zones of the color buffer from being processed.<br /><br />Because this step is independent from the rest of the algorithm, and we are running on the CPU, any data from the program can be used directly as input data. The only limits to the detection algorithm are:<br /><ul class='bbc'><li>The fact that the algorithm must output a bit for each pixel indicating whether a discontinuity is detected;</li><li>The tradeoff between performance and quality/complexity;</li><li>The programmer’s imagination.</li></ul>As with the original reference implementation [Reshetov09, section 2.4], we work with an RGBA8 render target format, and the “vertical discontinuity” and “horizontal discontinuity” bit flags computed by MLAA are stored in-place in the two high bits of the pixel data (i.e., the two high bits of the alpha component, which is unaffected by the MLAA blending operations). This keeps the implementation simple while reducing the memory footprint and allowing optimizations in the next steps of the algorithm (see “The Transposing Optimization” section below).<br /><br />Because each pixel is 32 bits of data, and each pixel can be processed independently of the others in this step, we can process 4 pixels at a time using Intel® SSE intrinsics. 
The detection code is very short.<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4483-0-26953900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4483" title="IntelSSE-Optimizations_1.jpg - Size: 55.39K, Downloads: 574"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-44440100-1311684055_thumb.jpg" id='ipb-attach-img-4483-0-26953900-1369539296' style='width:250;height:106' class='attach' width="250" height="106" alt="Attached Image: IntelSSE-Optimizations_1.jpg" /></a><br /><br /><p class='bbc_left'>The reference implementation then proceeds to update the alpha component of each pixel, but uses inefficient serial code to do so. We can optimize this sequence by using the following Intel® SSE code:<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4484-0-26969400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4484" title="IntelSSE-Optimizations_2.jpg - Size: 21.2K, Downloads: 530"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-73817200-1311684055_thumb.jpg" id='ipb-attach-img-4484-0-26969400-1369539296' style='width:250;height:47' class='attach' width="250" height="47" alt="Attached Image: IntelSSE-Optimizations_2.jpg" /></a><br /><br /><p class='bbc_left'>We also rewrote the <span style='font-family: Courier New'>MixColors</span> function (which computes a linear interpolation of two colors) as a full Intel® SSE implementation.<br /><br /><br /><strong class='bbc'><span style='font-size: 18px;'>The Transposing Optimization</span></strong><br /><br />The next task of the algorithm is to find “discontinuity lines”, i.e., sequences of consecutive pixels which are marked with the same discontinuity flag (horizontal flag for horizontal 
blending pass, vertical flag for vertical blending pass) while walking rows and columns of our color buffer [Reshetov09, section 2.1].<br /><br />Because discontinuity lines tend to be short, intuition suggests (and profiling data confirms) that the most expensive part of this operation is scanning <em class='bbc'>between </em>discontinuity lines, i.e., scanning the large areas of consecutive pixels where no discontinuity flag is set.<br /><br />The good news is that we can check 4 pixels at a time using the <span style='font-family: Courier New'>_mm_movemask_ps</span> Intel® SSE intrinsic when the following conditions are met:<br /><ul class='bbcol decimal'><li>We are scanning 4 pixels stored at consecutive addresses;</li><li>The address of the first pixel is “Intel® SSE-aligned” (16 bytes alignment);</li><li>The discontinuity flag is stored as the high bit of the 32 bits of pixel data.</li></ul>During the horizontal blending pass, (1) is true (we scan horizontal rows of pixels in the color buffer represented as a 2D linear array of pixel data); (2) is true almost all the time (remember that 16 bytes alignment is equivalent to “the index of the starting pixel in the buffer is a multiple of 4” as the buffer is properly aligned); and (3) is true as we chose bit 31 to represent the horizontal discontinuity flag.<br /><br />If all conditions are true, we compute the flags:<br /></p><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4485-0-26986900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4485" title="Transposing-Optimization_1.jpg - Size: 15K, Downloads: 425"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-22712600-1311684214_thumb.jpg" id='ipb-attach-img-4485-0-26986900-1369539296' style='width:250;height:32' class='attach' width="250" height="32" alt="Attached Image: Transposing-Optimization_1.jpg" /></a><br /><br /><p 
class='bbc_left'>And five outcomes are possible depending on the value of HFlags:<br /><ul class='bbc'><li>0 (the most common case by far): no discontinuity flag set in this group of 4 pixels, move to the next group of 4 pixels.</li><li>Bit 0 is set: discontinuity line starts at first pixel of the group.</li><li>Bit 1 is set: discontinuity line starts at second pixel of the group.</li><li>Bit 2 is set: discontinuity line starts at third pixel of the group.</li><li>Bit 3 is set: discontinuity line starts at fourth pixel of the group.</li></ul>This optimization is part of the reference implementation and works well for the horizontal blending pass, but was impossible to apply for the vertical blending pass. As the code scans the buffer vertically, (1) is false (adjacent pixels in a column are not stored at consecutive addresses), and (3) is false as well (the vertical discontinuity flag is stored at bit 30 of the pixel data).<br /><br />The problem with (3) is easy to work around by introducing a simple “shift left by one bit” operation if we are processing the vertical flags, transforming the code above to:<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4486-0-27003100-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4486" title="Transposing-Optimization_2.jpg - Size: 19.08K, Downloads: 547"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-50114500-1311684214_thumb.jpg" id='ipb-attach-img-4486-0-27003100-1369539296' style='width:250;height:39' class='attach' width="250" height="39" alt="Attached Image: Transposing-Optimization_2.jpg" /></a><br /><br /><p class='bbc_left'>But (1) is still a problem. In addition, vertical scanning is extremely cache-unfriendly. Because of both of these issues, the vertical blending pass is about 3 times (300%!) 
as expensive as the horizontal blending pass in the reference implementation (as shown by profiling data).<br /><br />Our solution to this issue is to make the vertical blending pass use the cache- and Intel® SSE-friendly data access patterns of the horizontal blending pass by considering the color buffer as a matrix of pixels and transposing it between passes:<br /><ul class='bbc'><li>Perform horizontal blending pass</li><li>Transpose the (horizontally blended) color buffer</li><li>Perform vertical blending pass</li><li>Transpose the color buffer back</li></ul>This way the code for both blending passes is exactly the same (which adds the advantage of simplicity/readability), the only difference being which flag to scan for, and we benefit from all the optimizations and cache-friendliness of the horizontal pass.<br /><br />In practice, the transpose operations are not implemented as separate passes/tasksets, but as the last part of their respective blending pass. This allows us to work on data that is still warm in the cache.<br /><br />The two transpose operations are not free. The cost is the extra code execution time, one extra work buffer and a synchronization point between the horizontal and vertical passes (we must wait for <em class='bbc'>all </em>horizontal tasks to be done before we can start <em class='bbc'>any </em>of the vertical tasks, because we must wait for the color buffer to be fully transposed before starting the vertical pass work).<br /><br />Even with these extra costs, the overall performance is significantly better than with the previous approach. 
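<br /><br />The four-step sequence listed above can be sketched as follows. This is only an illustration under stated assumptions: <span style='font-family: Courier New'>TransposePixels</span>, <span style='font-family: Courier New'>BlendBothDirections</span> and the <span style='font-family: Courier New'>RowPass</span> callback are hypothetical names, the blending pass itself is left as a caller-supplied function, and the transposes are written as separate loops for clarity, whereas the sample folds them into the end of each blending pass.<br /><br />

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Pixels = std::vector<std::uint32_t>;

// Flag positions as described in the text: bit 31 = horizontal, bit 30 = vertical.
constexpr std::uint32_t kHorizontalFlag = 0x80000000u;
constexpr std::uint32_t kVerticalFlag   = 0x40000000u;

// Hypothetical signature for a blending pass that walks the buffer row by row
// and looks for discontinuity lines marked with flagMask.
using RowPass = void (*)(Pixels& buffer, std::size_t width, std::size_t height,
                         std::uint32_t flagMask);

// Writes the transpose of a row-major width x height buffer into dst,
// which becomes a row-major height x width buffer.
void TransposePixels(const Pixels& src, Pixels& dst,
                     std::size_t width, std::size_t height)
{
    assert(&src != &dst);                  // in-place transposition is not supported
    assert(src.size() == width * height);
    dst.resize(width * height);
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            dst[x * height + y] = src[y * width + x];
}

// Runs the same row-wise pass for both directions; work is the extra buffer.
void BlendBothDirections(Pixels& color, Pixels& work,
                         std::size_t width, std::size_t height, RowPass blendRows)
{
    blendRows(color, width, height, kHorizontalFlag);  // 1. horizontal pass
    TransposePixels(color, work, width, height);       // 2. columns become rows
    blendRows(work, height, width, kVerticalFlag);     // 3. vertical pass, done row-wise
    TransposePixels(work, color, height, width);       // 4. restore the original layout
}
```

<br /><br />Note that the second call to the pass swaps the width and height arguments, because the transposed buffer has as many rows as the original has columns.<br /><br />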
As expected, profiling data shows that both blending passes have equivalent performance.<br /><br /><br /><strong class='bbc'><span style='font-size: 18px;'>Performance Results</span></strong><br /><br />MLAA performance was measured on the following two configurations:<br /><ul class='bbc'><li>Code name “Huron River”: Intel® Core™ i7-2820QM Processor (Intel® microarchitecture code name Sandy Bridge, 4 cores 8 threads @ 2.30 GHz) with GT2 processor graphics, 4 GB of RAM, Windows 7 Ultimate 64-bit Service Pack 1</li><li>Code name “Kings Creek”: Intel® Core™ i5-660 Processor (code name “Clarkdale”, 2 cores 4 threads @ 3.33 GHz), with GMA HD Graphics (code name “Ironlake”), 2 GB of RAM, Windows 7 Ultimate 64-bit Service Pack 1</li></ul>We measured the average frame rendering time of our sample as a function of scene complexity for the different antialiasing settings. We also measured the rendering time for the <em class='bbc'>Copy/Map/Unmap only </em>mode to highlight the impact of the color buffer copy operations on the overall performance of the algorithm.<br /><br /><br /><strong class='bbc'><span style='font-size: 18px;'>Results</span></strong><br /><br />The data shows that for the Huron River machine, the extra cost of MSAA 4x goes up linearly with the scene complexity (in fact, for all resolutions, the frame time when using MSAA 4x to render our test scene appears to be approximately double the frame time when no antialiasing is used). In contrast, the cost of MLAA appears more or less constant (around 4 ms/frame at 1280x800). This is consistent with our expectations, as unlike MSAA 4x, MLAA is executed only once per frame, regardless of scene complexity / number of draw calls.<br /><br />We also observe that except for very low complexity values (i.e. 
less than ~5), MLAA always outperforms MSAA 4x, regardless of resolution, and that the difference in performance grows with complexity (because, as noted above, the cost of MSAA 4x grows linearly with complexity whereas the cost of MLAA does not).<br /><br />In the case of the Kings Creek machine, we can’t compare the cost of MLAA vs. MSAA 4x as the latter is not provided by the Ironlake hardware. The goal is then to determine if MLAA allows us to provide software antialiasing as an alternative with acceptable performance. Our measurements at the 1280x800 resolution show that again the cost of MLAA is largely independent of scene complexity, and the average value is ~7.5 ms (discarding the outlier data point at complexity = 100).<br /><br />Interestingly, if we compare this result to a hypothetical MSAA 4x implementation with approximately the same performance profile as the Huron River one (frame time with MSAA 4x ~ 2x frame time with no antialiasing), we notice that again MLAA would outperform MSAA 4x for almost all complexity values (=4 in this case).<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4487-0-27019000-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4487" title="Table1.jpg - Size: 43.21K, Downloads: 639"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-00938200-1311684504_thumb.jpg" id='ipb-attach-img-4487-0-27019000-1369539296' style='width:250;height:50' class='attach' width="250" height="50" alt="Attached Image: Table1.jpg" /></a><br /><strong class='bbc'>Table 1. 
</strong><em class='bbc'>Rendering times of our test scene on the Kings Creek machine.</em><br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4488-0-27034900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4488" title="Table2.jpg - Size: 199.85K, Downloads: 813"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-66606500-1311684504_thumb.jpg" id='ipb-attach-img-4488-0-27034900-1369539296' style='width:224;height:200' class='attach' width="224" height="200" alt="Attached Image: Table2.jpg" /></a><br /><strong class='bbc'>Table 2. </strong><em class='bbc'>Rendering times of our test scene on the Huron River machine.</em><br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4489-0-27050800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4489" title="Figure6.jpg - Size: 46.49K, Downloads: 796"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-73341600-1311684572_thumb.jpg" id='ipb-attach-img-4489-0-27050800-1369539296' style='width:232;height:200' class='attach' width="232" height="200" alt="Attached Image: Figure6.jpg" /></a><br /><strong class='bbc'>Figure 6: </strong><em class='bbc'>Rendering times of our test scene for each of the antialiasing techniques, as a function of scene complexity. The <br />bottom, flat curve is the difference between the “MLAA with pipeline on” curve, and the “No antialiasing” curve, measuring the<br />cost of using MLAA [Huron River, res. 
1280x800]</em><br /><br /><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4490-0-27066700-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4490" title="Figure7.jpg - Size: 43.51K, Downloads: 743"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-06447700-1311684573_thumb.jpg" id='ipb-attach-img-4490-0-27066700-1369539296' style='width:231;height:200' class='attach' width="231" height="200" alt="Attached Image: Figure7.jpg" /></a><br /><strong class='bbc'>Figure 7: </strong><em class='bbc'>Rendering times of our test scene for each of the antialiasing techniques, as a function of scene complexity. The <br />bottom, flat curve is the difference between the “MLAA with pipeline on” curve, and the “No anti aliasing” curve, measuring the <br />cost of using MLAA [Huron River, res. 1600x1200]</em><br /><br /><a class='resized_img' rel='lightbox[22897ba916c8a82ffd8b2eb2fe1e57bb]' id='ipb-attach-url-4491-0-27084500-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=4491" title="Figure8.jpg - Size: 40.22K, Downloads: 557"><img src="http://uploads.gamedev.net/monthly_07_2011/ccs-8549-0-36954500-1311684573_thumb.jpg" id='ipb-attach-img-4491-0-27084500-1369539296' style='width:230;height:200' class='attach' width="230" height="200" alt="Attached Image: Figure8.jpg" /></a><br /><strong class='bbc'>Figure 8: </strong><em class='bbc'>Rendering times of our test scene with MLAA on and off, as a function of scene complexity. The bottom, flat curve is <br />the difference between the “MLAA with pipeline on” curve, and the “No antialiasing” curve, measuring the cost of using MLAA <br />[Kings Creek, res. 1280x800]</em><br /><br /><br /><p class='bbc_left'><strong class='bbc'><span style='font-size: 18px;'>References</span></strong><br /><br />[Reshetov 2009] RESHETOV, A. 2009. 
“<a href='http://visual-computing.intel-research.net/publications/papers/2009/mlaa/mlaa.pdf' class='bbc_url' title='External link' rel='nofollow external'>Morphological Antialiasing</a>”<br /><br />[Perthuis 2010] PERTHUIS, C. 2010. MLAA in God of War 3. Sony Computer Entertainment America, PS3 Devcon, Santa Clara, July 2010.<br /><br />[Jimenez et al., 2011] JIMENEZ, J., MASIA, B., ECHEVARRIA, J., NAVARRO, F. and GUTIERREZ, D. 2011. Practical Morphological Anti-Aliasing. In Wolfgang Engel, ed., GPU Pro 2. AK Peters Ltd.<br /><br />[SIGGRAPH 2011] JIMENEZ, J., GUTIERREZ D., YANG, J., RESHETOV, A., DEMOREUILLE, P., BERGHOFF, T., PERTHUIS, C., YU, H., MCGUIRE, M., LOTTES, T., MALAN, H., PERSSON, E., ANDREEV, D. and SOUSA T. 2011. Filtering Approaches for Real-Time Anti-Aliasing. In <em class='bbc'>ACM SIGGRAPH 2011 Courses</em>.<br /><br />[Luminance] Definition of luminance for CRT-like devices:<br /><br />INTERNATIONAL COMMISSION ON ILLUMINATION. 1971. Recommendations on Uniform Color Spaces, Color Difference Equations, Psychometric Color Terms. Supplement No.2 to CIE publication No. 15 (E.-1.3.1), TC-1.3, 1971.<br /><br />And for LCDs:	<br /><br />ITU-R Rec. BT.709-5. 2008. Page 18, items 1.3 and 1.4<br /></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p></p>]]></description>
		<pubDate>Tue, 26 Jul 2011 12:54:47 +0000</pubDate>
		<guid isPermaLink="false">65ddf8fc40f919eba2239c13c19bead4</guid>
	</item>
	<item>
		<title>Comparing Shadow Mapping Techniques with Shadow Explorer</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/comparing-shadow-mapping-techniques-with-shadow-r2793</link>
		<description><![CDATA[<em class='bbc'>This article brought to you by <a href='http://software.intel.com/en-us/visual-computing/?cid=sw:graphics192' class='bbc_url' title='External link' rel='nofollow external'>Intel&reg; Visual Computing Developer Community</a></em><br /> <br /><p class='bbc_center'><a class='resized_img' rel='lightbox[e05fb95d21395238b037e387d9afb8a6]' id='ipb-attach-url-6755-0-28207700-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=6755" title="34427.jpg - Size: 83.22K, Downloads: 340"><img src="http://uploads.gamedev.net/monthly_01_2012/ccs-8549-0-37236100-1326744283_thumb.jpg" id='ipb-attach-img-6755-0-28207700-1369539296' style='width:480;height:382' class='attach' width="480" height="382" alt="Attached Image: 34427.jpg" /></a></p><span style='font-size: 18px;'><strong class='bbc'>Code Sample Download Page</strong></span><br /><br /><a href='http://software.intel.com/en-us/articles/shadowexplorer?cid=sw:graphics189' class='bbc_url' title='External link' rel='nofollow external'>http://software.intel.com/en-us/articles/shadowexplorer/</a><br /><br /><strong class='bbc'><span style='font-size: 18px;'>Download Article</span></strong><br /><br />Download <a href='http://software.intel.com/file/34452?cid=sw:graphics189' class='bbc_url' title='External link' rel='nofollow external'>Comparing Shadow Mapping Techniques with Shadow Explorer</a> [PDF 990KB]<br /><br /><strong class='bbc'><span style='font-size: 18px;'>Introduction</span></strong><br /><br />Most modern games use shadows to some degree. 
The challenge is knowing which algorithm to use and what the tradeoffs of the different techniques are: what combination of quality and performance is best suited to the game. This sample, Shadow Explorer, lets the user compare and contrast four different algorithms, adjust various parameters for each one, and observe the effects in real time. The shadow mapping algorithms presented are: simple, percentage closer filtered (PCF), variance (VSM), and exponential variance (EVSM).<br /><br /><strong class='bbc'><span style='font-size: 18px;'>Sample Usage</span></strong><br /><br />At the highest level, this sample allows the user to compare and contrast the quality and performance characteristics of the different shadowing algorithms. There are two scenes that each algorithm can be applied to: a city and a teapot. The city scene represents a "typical" setting that might be encountered in a game, consisting of a variety of object sizes, dimensions, and other characteristics. The teapot is a "worst case" scenario consisting primarily of thin primitives, casting shadows not only on themselves but also on curved surfaces and a flat plane.<br /><br /><strong class='bbc'><span style='font-size: 18px;'>Sample Architecture</span></strong><br /><br />Shadow Explorer is implemented as a Microsoft DirectX* application based on the DXUT framework. All four shadow mapping algorithms follow essentially the same code path, except that different shaders are used and that VSM and EVSM run a filter over the shadow map before the scene is rendered. Listing 1 shows the basic flow of the source code for creating and using the shadow map. Every frame, the shadow map is created in <span style="font-family: courier;">RenderShadowMap()</span>, which builds a standard cascaded shadow map by running the <em class='bbc'>SceneZ</em> shader. As mentioned, for VSM and EVSM the shadow map is then filtered using the <em class='bbc'>FilterV</em> and <em class='bbc'>FilterH</em> shaders. 
After the shadow map is created, the scene is rendered with the shader <em class='bbc'>SceneMain</em> which, depending on the algorithm selected, will call a different version of <span style="font-family: courier;">IsNotInShadow()</span>. This is done by simply including different .fxh files and recompiling the shaders when a new algorithm is selected. For example, if PCF is selected then the version of <span style="font-family: courier;">IsNotInShadow()</span> in filter_PCF.fxh is used.<br /><br /><pre class='prettyprint lang-auto linenums:0'>static void RenderShadowMap(...)
{
&nbsp;&nbsp; // For each cascade
&nbsp;&nbsp; for( int i = 0; i &lt; iLayers; ++i )
&nbsp;&nbsp; {
	&nbsp;&nbsp;// Set the Z only pass
	&nbsp;&nbsp;SetRenderTargets( 0, NULL, g_d3d.pShadowMapDSV[i] );
	&nbsp;&nbsp;g_d3d.pTechSceneZ-&gt;GetPassByIndex( 0 )-&gt;Apply( 0,
		 pd3dImmediateContext );
	&nbsp;&nbsp;// Render the appropriate scene
	&nbsp;&nbsp;g_pSelectedMesh-&gt;Render( pd3dImmediateContext );
&nbsp;&nbsp; }
	
&nbsp;&nbsp; // VSM and EVSM techniques require additional filters
&nbsp;&nbsp; if(Filter_Type_VSM || Filter_Type_EVSM)
&nbsp;&nbsp; {
	&nbsp;&nbsp;// Use the shadow map just generated in the filters
	&nbsp;&nbsp;g_d3d.pVarShadowTex-&gt;SetResource( g_d3d.pShadowMapSRV );

	&nbsp;&nbsp;// Run the vertical filter
	&nbsp;&nbsp;SetRenderTargets( 1, &amp;pShadowMapRTV_vsm[0].p, NULL );
	&nbsp;&nbsp;g_d3d.pTechFilter_V-&gt;GetPassByIndex( 0 )-&gt;Apply(...);
	&nbsp;&nbsp;pd3dImmediateContext-&gt;Draw(...);

	&nbsp;&nbsp;// Use the result of the vertical filter
	&nbsp;&nbsp;g_d3d.pVarShadowTex-&gt;SetResource( g_d3d.pShadowMapSRV_vsm[0] );

	&nbsp;&nbsp;// Run the horizontal filter
	&nbsp;&nbsp;SetRenderTargets( 1, &amp;pShadowMapRTV_vsm[1].p, NULL );
	&nbsp;&nbsp;g_d3d.pTechFilter_H-&gt;GetPassByIndex( 0 )-&gt;Apply(...);
	&nbsp;&nbsp;pd3dImmediateContext-&gt;Draw(...);
&nbsp;&nbsp; }
}

static void RenderScene(...)
{
&nbsp;&nbsp; RenderShadowMap(...);

&nbsp;&nbsp; if(Filter_Type_VSM || Filter_Type_EVSM)
	&nbsp;&nbsp;// Use the filtered shadow map
	&nbsp;&nbsp;g_d3d.pVarShadowTex-&gt;SetResource( g_d3d.pShadowMapSRV_vsm[1] );
&nbsp;&nbsp; else
	&nbsp;&nbsp;g_d3d.pVarShadowTex-&gt;SetResource( g_d3d.pShadowMapSRV );
&nbsp;&nbsp; g_pSelectedMesh-&gt;Render(...);
}</pre><strong class='bbc'>Listing 1 - </strong><em class='bbc'>Basic flow of application code in Shadow Explorer</em><br /><br />The simplest technique presented is a basic cascaded shadow map. It is the fastest method because it does the least work, but it leaves a lot to be desired in terms of quality. The next technique is PCF, which works off the same shadow map but samples the texture multiple times to alleviate aliasing problems and soften the shadow edges. Better results can be achieved by taking more samples, but with the expected tradeoff in performance. By using a non-uniform sample distribution, fewer samples need to be taken to achieve good results. Shadow Explorer uses a pre-calculated Halton sequence to calculate "random" offsets instead of a grid-based offset. Even using the non-uniform method, PCF still requires a lot of sample lookups, making the algorithm fairly time consuming on the GPU.<br /><br />The next technique, VSM, takes a different approach by storing both the depth and depth squared values in the shadow map. In Shadow Explorer, the shadow map is rendered normally and the depth squared value is added during a separate pass immediately after the z-pass is done. Additionally, this second pass runs a box filter over the data to soften the edges of the shadows. The main drawback to the VSM algorithm is the light bleeding effect when multiple occluders overlap each other and the ratio of their distances from the shadow receiver is high. 
This can be seen in <strong class='bbc'>Figure 1</strong>, where the shadow of the tall building in the background is outlined in light where it overlaps the shadows of the two smaller buildings in the foreground.<br /><br /><p class='bbc_center'><a class='resized_img' rel='lightbox[e05fb95d21395238b037e387d9afb8a6]' id='ipb-attach-url-6756-0-28220900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=6756" title="34428.jpg - Size: 68.17K, Downloads: 347"><img src="http://uploads.gamedev.net/monthly_01_2012/ccs-8549-0-21249600-1326744401_thumb.jpg" id='ipb-attach-img-6756-0-28220900-1369539296' style='width:480;height:382' class='attach' width="480" height="382" alt="Attached Image: 34428.jpg" /></a></p><p class='bbc_center'><strong class='bbc'>Figure 1 - </strong><em class='bbc'>VSM results in light bleeding artifacts</em></p>Depending on the scene, light bleeding may not be noticeable enough to be a problem. If it is, EVSM can be used to eliminate the problem, as can be seen in Figure 2. The main difference from VSM is that EVSM "warps" the depth value when storing to and reading from the shadow map. This has the effect of bringing the relative distances of the occluders to the receiver closer together in order to minimize or eliminate light bleeding. 
Shadow Explorer does this in the shader function <span style="font-family: courier;">WarpDepth()</span>.<br /> <br /><p class='bbc_center'><a class='resized_img' rel='lightbox[e05fb95d21395238b037e387d9afb8a6]' id='ipb-attach-url-6757-0-28233000-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=6757" title="34429.jpg - Size: 68.29K, Downloads: 306"><img src="http://uploads.gamedev.net/monthly_01_2012/ccs-8549-0-86578900-1326744433_thumb.jpg" id='ipb-attach-img-6757-0-28233000-1369539296' style='width:480;height:382' class='attach' width="480" height="382" alt="Attached Image: 34429.jpg" /></a><br /><strong class='bbc'>Figure 2 - </strong><em class='bbc'>EVSM fixes light bleeding</em></p><span style='font-size: 18px;'><strong class='bbc'>Conclusion</strong></span><br /><br />Shadow Explorer implements four shadow mapping techniques with runtime modification of various parameters so the user can compare both the performance and the quality of the methods. For each technique, care was taken to ensure optimal performance of the algorithms. Some of the optimizations done include using half floats instead of floats, unrolling loops in PCF, running the vertical filter before the horizontal filter, selecting the cascade Z interval instead of fitting to best cascade, and calculating shadows only for triangles oriented towards the light. Additional resources, including both binary and source code versions, are available for download at <a href='http://software.intel.com/en-us/articles/shadowexplorer?cid=sw:graphics189' class='bbc_url' title='External link' rel='nofollow external'>http://software.intel.com/en-us/articles/shadowexplorer</a>.<br /><br />
Some areas for further development would be the addition of spot and point lights to demonstrate the flexibility of the shadow techniques. Animated objects could also be added to create an environment more representative of a game. Also, different algorithms could be used to more accurately place the split planes.<br /><br /><strong class='bbc'><span style='font-size: 18px;'>Controls</span></strong><br /><br />Shadow Explorer allows the user to change a variety of parameters to modify the shadow algorithms and change the shadow technique being used. Here is a list of the controls along with a brief description:<ul class='bbcol decimal'><li><strong class='bbc'>Toggle full screen</strong> - Turns full screen on/off</li><li><strong class='bbc'>Change device</strong> - Change the D3D device</li><li><strong class='bbc'>Scene drop down list</strong> - Change the scene being viewed</li><li><strong class='bbc'>Align light to camera</strong> - Rotates the light along the light direction for optimal usage of shadow map space.</li><li><strong class='bbc'>Algorithm drop down list</strong> - Change the filtering method being used</li><li><strong class='bbc'>Shadow Map BPP</strong> - How many bits are used for each pixel in the shadow map</li><li><strong class='bbc'>SM resolution</strong> - Resolution of the shadow map</li><li><strong class='bbc'>Filter size</strong> - The size of the filter used in PCF, VSM, and EVSM</li><li><strong class='bbc'>Cascade layers</strong> - How many cascades are used</li><li><strong class='bbc'>Cascade Factor</strong> - Adjusts how the split space is partitioned by split planes</li><li><strong class='bbc'>Aperture</strong> - Spatial screen space filter aperture for PCF</li><li><strong class='bbc'>Visualize Cascades</strong> - 
Visualize the different cascade levels</li><li><strong class='bbc'>Visualize Light Space</strong> - Displays the scene from the light source's point of view.</li><li><strong class='bbc'>Use Texture Array</strong> - Use texture array for cascades. Each cascade layer receives one texture, otherwise a texture atlas is used.</li><li><strong class='bbc'>Z interval selection</strong> - How to determine which cascade a particular pixel is in. If enabled, then only the view space z distance is used. Otherwise, the world position of the pixel is tested against each cascade.</li><li><strong class='bbc'>Deduce Z range</strong> - Calculates z-range from the current view. Otherwise uses the scene bounding box as the upper bound.</li><li><strong class='bbc'>Deduce Res</strong> - Resolution of the render target used to calculate z range.</li><li><strong class='bbc'>Downscale Factor</strong> - Every subsequent GPU deduce pass uses a render target texture of smaller size. This option determines the relation between texture sizes of subsequent passes.</li><li><strong class='bbc'>Downscale Limit</strong> - Maximum texture size when Z-range deduction is performed on GPU. At some point it should be more efficient to pull the whole resource to the CPU and finish deduction there, saving several draw calls.</li></ul><span style='font-size: 18px;'><strong class='bbc'>References</strong></span><ul class='bbcol decimal'><li>Williams, L. 1978. Casting curved shadows on curved surfaces. In Proc. SIGGRAPH, vol. 12, 270-274. <a href='http://portal.acm.org/citation.cfm?id=807402' class='bbc_url' title='External link' rel='nofollow external'>http://portal.acm.or...n.cfm?id=807402</a><br />
			&nbsp;</li><li>Donnelly, W. and Lauritzen, A. Variance shadow maps. In SI3D '06: Proceedings of the 2006 symposium on Interactive 3D graphics and games. 2006. pp. 161-165. New York, NY, USA: ACM Press. <a href='http://www.punkuser.net/vsm/vsm_paper.pdf' class='bbc_url' title='External link' rel='nofollow external'>http://www.punkuser....m/vsm_paper.pdf</a><br />
			&nbsp;</li><li>Lauritzen, Andrew and McCool, Michael. Layered variance shadow maps. Proceedings of graphics interface 2008, May 28-30, 2008, Windsor, Ontario, Canada. <a href='http://portal.acm.org/citation.cfm?id=1375714.1375739&coll=GUIDE&dl=GUIDE' class='bbc_url' title='External link' rel='nofollow external'>http://portal.acm.or...=GUIDE&dl=GUIDE</a><br />
			&nbsp;</li><li>Isidoro, J. R. Shadow Mapping: GPU-based Tips and Techniques. Conference Session. GDC 2006. March 2006, San Jose, CA. <a href='http://developer.amd.com/media/gpu_assets/Isidoro-ShadowMapping.pdf' class='bbc_url' title='External link' rel='nofollow external'>http://developer.amd...adowMapping.pdf</a><br />
			&nbsp;</li><li>The Halton Sequence. <a href='http://orion.math.iastate.edu/reu/2001/voronoi/halton_sequence.html' class='bbc_url' title='External link' rel='nofollow external'>http://orion.math.ia...n_sequence.html</a><br />
			&nbsp;</li><li>Engel, Wolfgang F. Section 4. Cascaded Shadow Maps. ShaderX 5, Advanced Rendering Techniques, Wolfgang F. Engel, Ed. Charles River Media, Boston, Massachusetts. 2006. pp. 197-206.</li></ul><strong class='bbc'><span style='font-size: 18px;'>Read the Blog!</span></strong><br /><br /><a href='http://origin-software.intel.com/en-us/blogs/2011/02/23/another-gaming-and-graphics-sample-coming-to-a-download-near-you/?cid=sw:graphics191' class='bbc_url' title='External link' rel='nofollow external'>http://origin-software.intel.com/en-us/blogs/2011/02/23/another-gaming-and-graphics-sample-coming-to-a-download-near-you/</a>]]></description>
		<pubDate>Mon, 21 Mar 2011 17:26:13 +0000</pubDate>
		<guid isPermaLink="false">31bb2feb402ac789507479daf9713b00</guid>
	</item>
	<item>
		<title>Real-Time Dynamic Fur on the GPU</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/real-time-dynamic-fur-on-the-gpu-r2774</link>
		<description><![CDATA[<br /><strong class='bbc'>Particle Systems</strong><br /> <a href='http://www.flickr.com/photos/gamedevnet/5081412465/' class='bbc_url' title='External link' rel='nofollow external'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4027/5081412465_2eb4267def.jpg' alt='Posted Image' class='bbc_img' /></span></a>In computer graphics, particle systems represent an unstable side of nature: substances that don't have a surface and change rapidly over time, like <em class='bbc'>rain, fire, fog, sparks</em>. Every game engine has some kind of sub-system for processing particles, ideally exposing the full set of functionality in a native particle editor. Until <a href='http://en.wikipedia.org/wiki/OpenGL#OpenGL_3.0' class='bbc_url' title='External link' rel='nofollow external'>OpenGL 3</a> and <a href='http://gpgpu.org/' class='bbc_url' title='External link' rel='nofollow external'>GPGPU</a> tools gained wide support, particle engines were limited to 2<sup class='bbc'>10</sup>-2<sup class='bbc'>13</sup> particles being processed simultaneously.&nbsp;&nbsp;This article describes an advanced GPU-only particle engine structure. The technologies described are designed in terms of the OpenGL core profile, but can be ported to Direct3D 10 as well. The reader should have a general understanding of particle systems, the OpenGL 3 pipeline and modern high-level shading languages. <br /><br /><br /><strong class='bbc'>Traditional Approach</strong><br /><pre class='prettyprint lang-auto linenums:0'>struct Particle {
    //attributes
    Vector3 pos, speed;
    float size;
};

class ParticleSystem {
    std::vector&lt;Particle&gt; array;
    void init(unsigned num);
    void step(float delta);
};</pre>The step() method, which updates the particle state including death and birth events, is called repeatedly. 
The particle data is sent to GPU memory (represented by the buffer object) either by a direct data transfer call (glBuffer[Sub]Data(), glTexImage*) or implicitly by releasing the locked memory (glUnmapBuffer) on every frame. <br /><strong class='bbc'>Point Sprites</strong><br /> The GPU is designed to process triangles, so each particle is generally drawn as a triangle (3 vertices) or a quad (2 triangles, 4 vertices). The Point Sprite extension provides the functionality to generate the quad automatically on the GPU from a given center coordinate and a size (glPointSize). When drawing points (GL_POINTS) in a GL3+ context, it is enabled by default. An obvious advantage here is a great reduction in bandwidth load: sending 1 point of data instead of 3-4 per particle. A downside is that the quad is generated in screen space and so cannot be rotated and projected by the GL pipeline. Some particle systems work with symmetric forms (smoke, snow, fluids) and fit the point sprite model pretty well. In order to draw more complex primitives (like a rotated triangle), one can simulate a virtual viewport inside the point sprite quad area, implementing the rotation/projection transformations in the fragment shader and discarding unused area pixels. <br /><br /><br /><strong class='bbc'>Modern Approach</strong><br /> <a href='http://http.developer.nvidia.com/GPUGems3/elementLinks/36fig01.jpg' class='bbc_url' title='External link' rel='nofollow external'><span rel='lightbox'><img class='bbc_img' src='http://http.developer.nvidia.com/GPUGems3/elementLinks/36fig01.jpg' alt='Posted Image' class='bbc_img' /></span></a>Most particles don't require processing of self-interactions (rain, fire, fog). This makes the main particle processing loop iterations independent and thus easy to run in parallel. The GPU can handle such processing in a more efficient way compared to the CPU, as it was originally designed for parallel computations of vertices and fragments. 
One of the features that made its way into the GL 3.1 core profile is <strong class='bbc'>Transform Feedback</strong>. It allows one to turn off the graphics rasterizer (optionally) and write transformed vertex data back into buffer objects. By using this technology and matching the input/output format of the vertex shader, one can process particle data entirely on the GPU. It will be stored in buffer objects, available for drawing right away. The difference from the CPU approach here is the need for a separate output buffer, as it's not possible to transform the data in-place on the GPU. The sequence of operations goes as follows: <br /><br /><em class='bbc'>init A; A-&gt;B; B-&gt;A; ... draw A ... ; A-&gt;B; ...</em> <br /><br />It is even possible to initialize the particle data state right on the GPU. There is a special shader value gl_VertexID that allows us to distinguish different particles in a shader (only the initial values are set, so no input data is given at this stage). <br /><br />It is reasonable to do particle processing and drawing separately with different frequencies. Thus the rasterizer is disabled when updating the data (GL_RASTERIZER_DISCARD). However, there is an option to combine these stages by enabling the rasterizer and letting the transformed attributes be drawn on the spot. This way is a bit faster, but much less scalable for general-purpose use. It can be used for demos. <br /><br /><br /><strong class='bbc'>Implementation Details</strong><br /> <br /><strong class='bbc'>Birth and Death</strong><br /> It is impossible to allocate memory in the shader, so the chunk of memory which is initially allocated for the particle data storage should already contain all possible particles that can co-exist. The allocated number of particles = <em class='bbc'>capacity of the system</em> = maximum theoretical number of particles living at the same time. 
The boolean flag <em class='bbc'>isAlive</em> can be encoded into one of the particle attributes (like particle moment of birth, where value &lt;0 means that the particle is dead). The decisions of birth and death are made right inside the vertex shader that is responsible for particle update processing: <br /><br /><pre class='prettyprint lang-auto linenums:0'>if(isAlive)
    //update() returns 0 for dead particles
    isAlive = update();
else if( born_ready() )
    //reset() gives initial state to a new-born
    isAlive = reset();</pre><br /><strong class='bbc'>Managing Multiple Emitters</strong><br /> <a href='http://www.flickr.com/photos/gamedevnet/5081412487/' class='bbc_url' title='External link' rel='nofollow external'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4070/5081412487_a687dd3c63_m.jpg' alt='Posted Image' class='bbc_img' /></span></a>If there are several emitters of the same particle format and capacity, they can share a single temporary buffer for data processing. It can be implemented by a <em class='bbc'>ParticleManager</em> class that contains the same data as each of the emitters E0,E1,E2,... The data flow is the following: <em class='bbc'>Ei -&gt; M (processing); Ei &lt;&gt; M (swap data)</em> <br /><br />Each emitter has the current copy of the particle data at any moment. <br /><br /><br /><strong class='bbc'>Conclusion</strong><br /> The proposed technology of particle systems processing provides a more efficient utilization of the GPU power, reduces the system bandwidth load and frees CPU memory for other tasks like physics and AI. 
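<br /><br />The birth/death rule and the A-&gt;B, B-&gt;A buffer sequence described above can be illustrated with a small CPU-side sketch. This is not the article's shader code: the <em class='bbc'>Particle</em> layout, the <em class='bbc'>stepParticles</em> name, and the <em class='bbc'>maxAge</em> and <em class='bbc'>spawn</em> parameters are assumptions made only for illustration. A negative birth time encodes a dead particle, and the output is always written to a second buffer, mirroring the transform feedback constraint.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle layout: birthTime < 0 encodes "dead".
struct Particle {
    float pos[3];
    float speed[3];
    float birthTime;
};

// One update pass. Reads 'in' and writes 'out', never in place,
// which mirrors the separate output buffer needed on the GPU.
// maxAge and spawn are illustrative stand-ins for update()/born_ready().
void stepParticles(const std::vector<Particle>& in, std::vector<Particle>& out,
                   float now, float delta, float maxAge, bool spawn)
{
    assert(in.size() == out.size());   // capacity is fixed up front
    for (std::size_t i = 0; i < in.size(); ++i) {
        Particle p = in[i];
        if (p.birthTime >= 0.0f) {
            // Alive: integrate the position, then test for death.
            for (int k = 0; k < 3; ++k)
                p.pos[k] += p.speed[k] * delta;
            if (now - p.birthTime > maxAge)
                p.birthTime = -1.0f;
        } else if (spawn) {
            // Dead slot that is allowed to be born: reset to an initial state.
            p = Particle{{0.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}, now};
        }
        out[i] = p;
    }
}
```

Calling the function alternately as stepParticles(a, b, ...) and stepParticles(b, a, ...) reproduces the ping-pong pattern; on the GPU the loop body would live in the vertex shader instead.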
<br /><strong class='bbc'>Simulating Fur Strands</strong><br /> <a href='http://www.flickr.com/photos/gamedevnet/5082005918/' class='bbc_url' title='External link' rel='nofollow external'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4110/5082005918_3fa4ed6a1c.jpg' alt='Posted Image' class='bbc_img' /></span></a>On the previous page the technique of modeling particle systems on the GPU was covered. Now we'll talk about an extension to that technique that allows modeling of fur strands. In addition to the previous prerequisites, we expect the reader to understand Texture Buffer Objects, the Render Target concept and spatial transformations, including quaternion operations. The <a href='http://www.blender.org/' class='bbc_url' title='External link' rel='nofollow external'>Blender</a> representation of the strand that needs to be simulated consists of a fixed number of joints, connected by straight lines. A brief description can be found in <a href='http://www.blender.org/development/release-logs/blender-246/particles/particle-system-and-types/' class='bbc_url' title='External link' rel='nofollow external'>Particle System</a> and <a href='http://www.blender.org/development/release-logs/blender-246/hair-and-fur/' class='bbc_url' title='External link' rel='nofollow external'>Fur Rendering</a>, but the best source of knowledge is Blender itself. This model was chosen so that hair can be edited in Blender and then easily moved into an engine. The export script into the KRI engine format has support for particles and hair, including physics and rendering parameters. <br /><br />Definition: <em class='bbc'>fur layer</em> - a set of points at the same <em class='bbc'>joint</em> (see picture) across all of the strands of a given surface. <br /><br /><br /><strong class='bbc'>Emit Sources</strong><br /> The general particle data source is a mesh. 
Its geometry can be accessed directly from the shader that initializes the particle data on the birth event. This allows efficient emission of particles from the vertices or even the whole surface of a mesh, which can also be skinned and morphed at the same time. <br /><strong class='bbc'>Emit From Vertices</strong><br /> The vertices are reached through the <strong class='bbc'>Texture Buffer Object</strong> (TBO) concept. The mesh data must be bound as a TBO and sampled in the particle processing shader, extracting the position and orientation (see Appendix A)&nbsp;&nbsp;of a randomly chosen vertex. Once the vertex is extracted, the initial direction of a particle can be applied in the constructed basis to get the speed, and the position can be copied.<pre class='prettyprint lang-auto linenums:0'>uniform samplerBuffer unit_vertex, unit_quat;
int cord = int( random() * textureSize(unit_vertex) );
vec3 pos  = texelFetch(unit_vertex, cord).xyz;
vec4 quat = texelFetch(unit_quat, cord);</pre><br /><strong class='bbc'>Emit From Faces</strong><br /> In order to access the mesh surface data in a random pattern, it is necessary to bake it first into a UV texture. For each texel in the UV space the position and orientation (as a quaternion) of the point in world space are stored, writing into 2 Rgba8 textures. In the vertex shader, the position is constructed from the texture coordinate:<pre class='prettyprint lang-auto linenums:0'>in vec2 at_tex0;  //texture coordinate
gl_Position = vec4( 2.0*at_tex0 - vec2(1.0), 0.0,1.0 );</pre>When the textures are ready, a random sample is taken from them in the same way as in the vertex emitter case. <em class='bbc'>Note:</em> the UV texture defines the amount and density of the grown fur, hence giving full control to the artist. <br /><br /><br /><strong class='bbc'>Fur Extension</strong><br /> Each fur layer is simulated as a separate particle emitter, and all layers belong to the same particle manager. 
The capacity of that system equals the number of fur strands that need to be modelled. The strand joint particle is required to store at least <em class='bbc'>position</em> and <em class='bbc'>speed</em> attributes.&nbsp;&nbsp;There are 3 issues arising from this point of view: <br /><br /><br /><strong class='bbc'>Making a Layer Dependent on Previous Layers</strong><br /> Definition: <em class='bbc'>fake attribute</em> - a particle attribute whose value is not stored per particle instance but rather taken from some external source (e.g. another particle emitter). The previous layer position is required in order to determine the current strand direction. <br /><br />The position of the layer before the previous one is required for estimating the 'straight' direction and thus the current deviation. Both <em class='bbc'>Pos(L-1)</em> and <em class='bbc'>Pos(L-2)</em> are transmitted as <em class='bbc'>fake</em> attributes for processing of the current layer, where: <br /><br /><br /><br /><em class='bbc'>Pos(x)</em> = position of the strand joint of layer '<em class='bbc'>x</em>', <br />'<em class='bbc'>L</em>' = current layer id (&lt;i&gt;0]]></description>
		<pubDate>Thu, 14 Oct 2010 14:30:22 +0000</pubDate>
		<guid isPermaLink="false">d0aae9539e4dd0bd618e5d2598f18707</guid>
	</item>
	<item>
		<title>A Super Simple Method for Creating Infinite Scenery</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/a-super-simple-method-for-creating-infinite-sce-r2769</link>
		<description><![CDATA[My favorite kinds of games to play have always been realistic 3D vehicle simulations such as flight and ship simulations. So when I began game programming several years ago, I decided to come up with a way to create infinite scenery that was quick and easy to implement and did not require a lot of static geometry such as terrain data. Having such a method makes it easy to test camera and physics code because it does not require a terrain or water engine of any kind and you do not need to be worried about the size of the virtual world you will be testing in. Of course, the most common method is to simply create a very large rectangle to represent the ground and continually wrap the position of the vehicle to the opposite side every time it passes over one of the sides of the rectangle. This is infinite on the horizontal plane but when going vertically, eventually, the sides of the rectangle become visible because the rays of the camera frustum become long enough to go past the edges of the rectangle. So I came up with another method that was infinite in both the horizontal and vertical planes and requires no static geometry at all. In addition, it is very easy to implement and requires very little code and only simple math. The idea is to use the footprint of the camera, or in other words, the intersection of the camera’s frustum with the ground plane.<br /><br /> The simplest way to do this is to take our static ground rectangle and make it dynamic, resizing it as the camera moves or gets higher in altitude. But this means the rectangle will be much larger than it really needs to be most of the time. 
In this article, we will look at a more precise method that calculates the exact shape of the camera’s intersection with a flat ground plane preventing the need for view frustum culling or screen clipping.<br /><br /> <br /><strong class='bbc'>Finding the camera’s footprint</strong><br /> In order to use this method, we need to have a camera position and orientation, as well as the dimensions of the viewport and the field-of-view angle.<br /><br /> The first step in the process is to calculate a rectangle that sits in front of the camera, is oriented according to the current orientation of the camera, and has the same width and height as the viewport.<br /><br /> Fig. 1 demonstrates this rectangle.<br /><br /> <a href='http://www.flickr.com/photos/gamedevnet/4993629366/' class='bbc_url' title='External link' rel='nofollow external'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4084/4993629366_328e040434_z.jpg' alt='Posted Image' class='bbc_img' /></span></a> In order to calculate the corner points of the rectangle, we use the “side” and “up” unit vectors of the camera. In the <a href='http://downloads.gamedev.net/features/programming/SimpleInfScenery/vi_source_code.zip' class='bbc_url' title='External link' rel='nofollow external'>demo</a>, a camera class is used that stores these vectors but they can also be pulled out of the camera matrix if that is how the camera is represented instead. To get the y coordinate of each corner point we multiply the “up” vector by half the height of the viewport and to get the x coordinate we multiply the “side” vector by half the width of the viewport. In order to calculate the z coordinate we must know the “view distance”, or the distance along the “lookat” vector at which the frustum has the same dimensions as the viewport. Looking again at Fig. 1, we see that the rectangle that we are trying to create represents a single slice of the camera frustum. 
This slice can be made at any distance from the camera eye point, but at only one distance will the slice have the same width and height as the viewport. This distance is the “view distance”.<br /><br /> In order to calculate the “view distance” we use a simple formula based on the tangent of the field-of-view angle. In Fig. 2 we have defined a triangle, shown in green, where the base, b, has the same length as the “view distance”. The height, h, is equal to half the viewport width and is parallel to the “side” vector. Since b bisects the field-of-view angle, theta is equal to half the field-of-view angle.<br /><br /> Since we know that:<br /><br /> tan( theta ) = h / length of b<br /><br /> then if we solve for length of b we have<br /><br /> length of b = h / tan( theta )<br /><br /> So the “view distance” is equal to the length of b, or half viewport width / tan( half fov angle ).<br /><br /> <a href='http://www.flickr.com/photos/gamedevnet/4993023173/' class='bbc_url' title='External link' rel='nofollow external'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4129/4993023173_a129d826c1_z.jpg' alt='Posted Image' class='bbc_img' /></span></a> In the <a href='http://downloads.gamedev.net/features/programming/SimpleInfScenery/vi_source_code.zip' class='bbc_url' title='External link' rel='nofollow external'>demo source code</a>, the “drawHorizon” method of the RendererBase class contains the code for implementing these calculations:<br /><br /><pre class='prettyprint lang-auto linenums:0'>void RendererBase::drawHorizon( QCamera &qcamera, const ViewPort &viewport )
{
    D3DXVECTOR3 lookAt = qcamera.lookAt();
    D3DXVECTOR3 side = qcamera.side();
    D3DXVECTOR3 up = qcamera.up();
    D3DXVECTOR3 eye = qcamera.eye();
    std::vector&lt;D3DXVECTOR3&gt; clippedPts, pts;
    pts.reserve(4);

    // Center of the viewport-sized slice, relative to the camera eye
    D3DXVECTOR3 viewC = lookAt * viewport.viewDistance();
    float scrHalfW = viewport.screenWidth() * 0.5f;
    float scrHalfH = viewport.screenHeight() * 0.5f;

    // UL
    pts.push_back((up * scrHalfH) + side * (-scrHalfW) + viewC);
    // UR
    pts.push_back((up * scrHalfH) + (side * scrHalfW) + viewC);
    // LR
    pts.push_back((up * -(scrHalfH)) + (side * scrHalfW) + viewC);
    // LL
    pts.push_back((up * -(scrHalfH)) + (side * (-scrHalfW)) + viewC);</pre><br />So now we have four points contained in a vector, which represent the corners of our rectangle. The next step is to clip this rectangle against a horizontal plane that is parallel to the ground plane (i.e. all points have the same y-coord) and is slightly below the camera eye point, keeping only the part below the plane. Why does it need to be below the camera eye and not exactly at the same height? This will be explained later but for now let us look at the clipping process. We will use the Cohen-Sutherland algorithm [LaMothe2003], but pretty much any clipping algorithm will do. The basic idea is to look at each edge in the rectangle one-by-one and check to see if one of three cases is true: both end points are above the plane, both are below the plane, or one is above and the other is below. If both are above, we remove the edge from the list. If both are below we keep it in the list. If one is above, then we replace this point with the point where the edge crosses the clipping plane and keep the other point the same. The rest of the “drawHorizon” method shows how this is done in code:<br /><br /><pre class='prettyprint lang-auto linenums:0'>int v1,v2, cp = 0;
float x1,y1,z1,x2,y2,z2,newx,newz,m;
float ymax = -1.0f;   // height of the clipping plane relative to the eye

v1=3;
for (v2=0; v2 &lt; 4; v2++)
{
    x1 = pts[v1].x;
    y1 = pts[v1].y;
    z1 = pts[v1].z;
    x2 = pts[v2].x;
    y2 = pts[v2].y;
    z2 = pts[v2].z;

    if ((y1 &lt;= ymax) && (y2 &lt;= ymax))
    {
        // completely below: keep the end point
        clippedPts.push_back( pts[v2] );
    }
    else if ((y1 &gt; ymax) && (y2 &gt; ymax))
    {
        // completely above
    }

    if ((y1 &lt;= ymax) && (y2 &gt; ymax))
    {
        // leaving the kept region: add the crossing point only
        if (x1 != x2)
        {
            m = (y2-y1) / (x2-x1);
            newx = x1 + ((ymax - y1) / m);
        }
        else
            newx = x1;

        if (z1 != z2)
        {
            m = (y2-y1) / (z2-z1);
            newz = z1 + ((ymax - y1) / m);
        }
        else
            newz = z1;

        clippedPts.push_back( D3DXVECTOR3(newx,ymax,newz) );
    }

    if ((y1 &gt; ymax) && (y2 &lt;= ymax))
    {
        // entering the kept region: add the crossing point, then the end point
        if (x1 != x2)
        {
            m = (y2-y1) / (x2-x1);
            newx = x1 + ((ymax - y1) / m);
        }
        else
            newx = x1;

        if (z1 != z2)
        {
            m = (y2-y1) / (z2-z1);
            newz = z1 + ((ymax - y1) / m);
        }
        else
            newz = z1;

        clippedPts.push_back( D3DXVECTOR3(newx,ymax,newz) );
        clippedPts.push_back( pts[v2] );
    }

    v1=v2;
}

cp = clippedPts.size();
if (cp == 0)
{
    return;
}</pre><br />We now have a vector containing a set of points which represent our clipped rectangle. This vector can contain anywhere from zero to five points depending on how the camera is oriented. Fig. 3 shows a diagram of this rectangle. 
<a href='http://www.flickr.com/photos/gamedevnet/4993023187/' class='bbc_url' title='External link' rel='nofollow external'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4130/4993023187_2be52723fb_z.jpg' alt='Posted Image' class='bbc_img' /></span></a> The green area is our clipped rectangle, which represents the horizon image. The blue horizontal line represents our clipping plane. You can now see that if we draw lines from the camera eye to each of the rectangle corners, and then extend these lines until they intersect the ground plane, we will end up with a set of points which represent the camera’s footprint on the ground. This is why we needed to make the clipping plane slightly below our camera, because if it were at the same height, the top two lines would be parallel to the ground and would extend forever, never intersecting the ground. In the clipping code above, “ymax” represents the height of the clipping plane relative to the camera eye point. In the demo, this is set to -1.0 but it can be any number less than zero. It is just a matter of visual preference: the further below the camera the clipping plane is, the further down in the viewport the horizon image will appear. Another option would be to define an epsilon value in our clipping code that would represent the minimum y-distance any point can be from the clipping plane. Then we would no longer need to move the clipping plane below the camera.<br /><br /> The code for extending the edges and converting into world space is shown below:<br /><br /><pre class='prettyprint lang-auto linenums:0'>for (unsigned int i=0; i &lt; cp; i++)
{
    D3DXVec3Normalize( &footPrintVertex, &clippedPts[i] );
    scale = fabs(footPrintVertex.y == 0 ? 0 : eye.y / footPrintVertex.y);
    footPrintVertex *= scale;
    footPrintVertex.x += eye.x;
    footPrintVertex.y += eye.y;
    footPrintVertex.z += eye.z;
    ….
    ….
}</pre><br />We simply go through each point in our vector of clipped points and normalize it. Then we scale each one so that it reaches the ground plane; the scale factor is the height of the camera divided by the y component of the normalized vector. Finally we add the position of the camera eye and we now have our camera footprint in world space coordinates! Now all we have to do is render the points as a triangle fan. <br /><strong class='bbc'>Rendering the camera’s footprint</strong><br /> In order to render our camera footprint, we need to consider what kind of projection we are using in our transformation pipeline.<br /><br /> If we are using a perspective projection, it is quite simple to render the footprint since it is already in world space coordinates. We simply pass these points into the API as a triangle fan and voila, we now have a horizon on screen. Of course, we should probably add textures to the horizon so we can have some ground detail which will give us a sense of altitude. This is also quite easy since all we have to do is scale the x and z coordinates of our points and we will get our tiled texture coordinates. This is shown in the “renderCameraFootprint” method of the PerspectiveRenderer class, included in <a href='http://downloads.gamedev.net/features/programming/SimpleInfScenery/vi_source_code.zip' class='bbc_url' title='External link' rel='nofollow external'>the demo</a>:<br /><br /><pre class='prettyprint lang-auto linenums:0'>geometryVertices[i].position = footPrintVertex;
geometryVertices[i].tu = footPrintVertex.x / 200.0f;
geometryVertices[i].tv = footPrintVertex.z / 200.0f;
geometryVertices[i].ts = footPrintVertex.x / 15000.0f;
geometryVertices[i].tt = footPrintVertex.z / 15000.0f;</pre><br />The perspective render version can be seen in the viPerspective.exe application. 
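<br /><br />The projection and texture-coordinate steps above can also be sketched outside of Direct3D. What follows is only a minimal illustration and not the demo's code: “Vec3”, “GroundVertex” and “projectToGround” are hypothetical names, the ground plane is assumed to be y = 0, the camera is assumed to be above it, and the tile sizes 200 and 15000 are simply the ones that appear in the snippet above.<br /><br /><pre class='prettyprint lang-auto linenums:0'>#include &lt;cmath&gt;

struct Vec3 { float x, y, z; };
struct GroundVertex { Vec3 position; float tu, tv, ts, tt; };

// Extends the ray from the eye through "dir" (a clipped rectangle corner,
// given relative to the eye) until it reaches the ground plane y = 0, then
// derives two sets of tiled texture coordinates from the world x and z.
// Assumes eye.y &gt; 0 and dir.y &lt; 0 (the clipping step keeps corners below the eye).
GroundVertex projectToGround(const Vec3& eye, const Vec3& dir)
{
    const float len = std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    const Vec3 unit = { dir.x / len, dir.y / len, dir.z / len };

    // Distance along the unit ray at which it has dropped by the camera height.
    const float scale = std::fabs(eye.y / unit.y);

    GroundVertex v;
    v.position = { eye.x + unit.x * scale,
                   eye.y + unit.y * scale,
                   eye.z + unit.z * scale };
    v.tu = v.position.x / 200.0f;     // small tiles for low altitudes
    v.tv = v.position.z / 200.0f;
    v.ts = v.position.x / 15000.0f;   // large tiles for high altitudes
    v.tt = v.position.z / 15000.0f;
    return v;
}</pre><br />For example, an eye at (10, 100, 20) and a corner direction of (0, -1, -1) give a footprint vertex at roughly (10, 0, -80): the ray drops 100 units and travels the same distance along negative z.<br /><br />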
We will use two textures to render the ground plane. This is because the higher you go, the smaller the texture tiles become until they are almost not visible. The second texture will be blended with the first and will have a larger tile size so it will become more visible as we go higher, providing new detail at higher altitudes.<br /><br /> Now, if we are working in an environment where we will not be using any kind of a projection matrix, we will have to do a little more work since our footprint is in three-dimensional world coordinates and we need it to be in two-dimensional screen coordinates. Fortunately, the calculations are pretty easy: we just add perspective to our coordinates by dividing x and y by z, and then we scale by the view distance to convert them to viewport space. Since we are using a right-handed coordinate system in our demo, we use the negative view distance so that when the camera is pointing in the negative z direction the x coordinate will go from negative to positive when going from the left side of the viewport to the right side. Here is the code from the “renderCameraFootPrint” method of the OrthographicRenderer class from <a href='http://downloads.gamedev.net/features/programming/SimpleInfScenery/vi_source_code.zip' class='bbc_url' title='External link' rel='nofollow external'>the demo</a>:<br /><br /><pre class='prettyprint lang-auto linenums:0'>geometryVertices[i].x = ( (geometryVertices[i].x / geometryVertices[i].z) * -viewDist );
geometryVertices[i].y = ( (geometryVertices[i].y / geometryVertices[i].z) * -viewDist );
geometryVertices[i].z = 0.0f;</pre><br />The main drawback is that we must also process our texture coordinates further. This will vary according to the platform and API that you are programming on. 
On the PlayStation 2 it is relatively easy: we just calculate homogeneous texture coordinates s,t,q, like so: <p class='bbc_indent' style='margin-left: 40px;'>s = geometryVertices[i].tu / footPrintVertex.z;<br />t = geometryVertices[i].tv / footPrintVertex.z;<br />q = 1.0f / footPrintVertex.z;</p> We then specify that perspective-correct texturing is enabled when rendering. I used this technique to create a skyplane cloud layer for a PlayStation 2 flight simulator demo. Here is a screenshot:<br /><br /> <a href='http://www.flickr.com/photos/gamedevnet/4993023211/' class='bbc_url' title='External link' rel='nofollow external'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4147/4993023211_4c9be51307_o.png' alt='Posted Image' class='bbc_img' /></span></a><br /><a href='http://www.youtube.com/watch?v=15uDzgg8aQc' class='bbc_url' title='External link' rel='nofollow external'>YouTube Video Link</a> Since the purpose of this article is to demonstrate the theory behind the method instead of any particular implementation details for specific graphics APIs, we will not go into how to do this on other platforms besides the PS2. In Direct3D and OpenGL it is probably best to stick to perspective projection since they will calculate the projection matrix for you and handle the texture coordinates as is.<br /><br /> But even without texturing, this technique can still be useful for creating an artificial horizon, such as on the instrument panel on airplanes, or for rendering the HUD of jet fighters. It is demonstrated in the viOrthographic.exe demo which uses an orthographic projection instead of a perspective projection since we do not need our perspective to be calculated by the API.<br /><br /> <br /><strong class='bbc'>Adding Terrain</strong><br /> So great, we can now create a pretty nice looking horizon no matter where the camera is located or how it is oriented. However, so far we have only been able to render completely flat ground. 
What if we want to have some hills and mountains? Well, since we know what our footprint is on the two-dimensional ground plane, if we divide up the plane into a regular grid of equally sized rectangular “cells”, then all we have to do is figure out which cells overlap our footprint polygon and draw those cells using a heightmap to create three dimensional triangles and thus our terrain.<br /><br /> So how do we figure out which cells are overlapping? Well, if we think of our cells as pixels on a two-dimensional drawing surface, and our footprint polygon as a graphics primitive, then we can see how the scan conversion algorithm can be used, since it is the most common way to render graphics primitives onto a drawing surface.<br /><br /> The idea is that we create scan lines that have the same size as our grid cells and we scan along the line from one edge of the polygon that intersects the scan line to the next edge that intersects the scan line. All grid cells on the same scan line that lie between the two edges will be found by the scan.<br /><br /> <a href='http://www.flickr.com/photos/gamedevnet/4993629446/' class='bbc_url' title='External link' rel='nofollow external'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4086/4993629446_510ff4118f_z.jpg' alt='Posted Image' class='bbc_img' /></span></a> If we look at Fig. 4, we can see how this technique works. On the first scan we start at cells that have a z-coordinate of -4. We then scan from left to right and find that only one grid cell intersects, cell (1, -4). We then increment our scan line to z-coordinate –3, and scan again starting at edge A and going to edge B. This time we find grid cells (0,-3) to (2,-3). We continue the process until we have scanned all of the edges. 
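<br /><br />As a rough illustration of this scan, here is a minimal C++ sketch. It is not the demo's scan converter: “Point2”, “Cell” and “cellsUnderFootprint” are hypothetical names, the footprint is assumed to be a convex polygon given as (x, z) pairs, and cells are assumed to be squares of side “cellSize” indexed by the floor of their coordinates. For each row of cells it finds the x-extent of the polygon inside that row and emits every cell between the two bounding edges, so cells that merely touch the polygon are included as well.<br /><br /><pre class='prettyprint lang-auto linenums:0'>#include &lt;algorithm&gt;
#include &lt;cmath&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

struct Point2 { float x, z; };    // footprint vertex on the ground plane
struct Cell   { int col, row; };  // grid cell indices along x and z

// Returns the grid cells touched by a convex footprint polygon.
std::vector&lt;Cell&gt; cellsUnderFootprint(const std::vector&lt;Point2&gt;& poly, float cellSize)
{
    std::vector&lt;Cell&gt; cells;
    if (poly.size() &lt; 3 || cellSize &lt;= 0.0f) return cells;

    float zMin = poly[0].z, zMax = poly[0].z;
    for (const Point2& p : poly) {
        zMin = std::min(zMin, p.z);
        zMax = std::max(zMax, p.z);
    }
    const int rowFirst = static_cast&lt;int&gt;(std::floor(zMin / cellSize));
    const int rowLast  = static_cast&lt;int&gt;(std::floor(zMax / cellSize));

    for (int row = rowFirst; row &lt;= rowLast; ++row) {
        const float z0 = row * cellSize;   // lower boundary of this scan line
        const float z1 = z0 + cellSize;    // upper boundary of this scan line
        float xMin = 0.0f, xMax = 0.0f;
        bool hit = false;
        auto extend = [&](float x) {
            xMin = hit ? std::min(xMin, x) : x;
            xMax = hit ? std::max(xMax, x) : x;
            hit = true;
        };
        for (std::size_t i = 0; i &lt; poly.size(); ++i) {
            const Point2& a = poly[i];
            const Point2& b = poly[(i + 1) % poly.size()];
            if (a.z &gt;= z0 && a.z &lt;= z1) extend(a.x);   // vertex inside the row
            const float bounds[2] = { z0, z1 };
            for (float zc : bounds) {
                if ((a.z &lt; zc) != (b.z &lt; zc))          // edge crosses this boundary
                    extend(a.x + (zc - a.z) * (b.x - a.x) / (b.z - a.z));
            }
        }
        if (!hit) continue;
        const int colFirst = static_cast&lt;int&gt;(std::floor(xMin / cellSize));
        const int colLast  = static_cast&lt;int&gt;(std::floor(xMax / cellSize));
        for (int col = colFirst; col &lt;= colLast; ++col)
            cells.push_back({col, row});
    }
    return cells;
}</pre><br />With a cell size of 1, a rectangular footprint spanning x from 0.5 to 2.5 and z from 0.5 to 1.5 touches two rows of three cells each, so the function returns six cells.<br /><br />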
You can see how this is done in the <a href='http://downloads.gamedev.net/features/programming/SimpleInfScenery/vi_source_code.zip' class='bbc_url' title='External link' rel='nofollow external'>demo source</a> in the files ScanConvert.h and ScanConverter.cpp. There are many books and resources that explain scan conversion, especially ones that deal with software rasterization where scan conversion is quite often used.<br /><br /> The one drawback to this technique is that as the camera gets higher and higher, the footprint gets very large and thus requires a lot of grid cells to cover it. Eventually, the frame rate will drop off by quite a bit. In the demo, this problem is alleviated by increasing the size of the grid cells as the camera goes up. So when it reaches 20 units in altitude, the cell size is doubled to 2000 units, and when it reaches 40, it is quadrupled to 4000, and so on. The heightmap tile size, however, remains the same so that the terrain does not get wider; it just gets coarser, with less detail.<br /><br /> The viTerrain.exe demo demonstrates this technique in action.<br /><br /> As you can see, there is quite a lot of popping, so this method of optimization is only useful for demonstration purposes, but it does help to improve the performance of the demo so that we can really get a good idea of how the scan conversion works. If you really want to get the best performance when using this technique with terrain, you should certainly use one of the common terrain LOD techniques to render each grid cell, such as ROAM, geomorphing, geomipmapping, etc. That is left as an exercise for the reader.<br /><br /> Having the camera’s world-space footprint can also be helpful when you do not need infinite terrain, but instead have a fixed set of geometry to represent your scenery. An example would be when you have several quadtrees to render terrain at different locations in world space. 
The camera footprint can be used in view-culling to determine if any of the geometry is visible from the camera. You simply need to perform 2D intersection tests between the footprint polygon and the geometry bounding boxes.<br /><br /> <br /><strong class='bbc'>Conclusion</strong><br /> So there you have it, infinite scenery in both the horizontal and vertical dimensions with an extremely precise camera footprint that can be easily rendered in a very efficient manner.<br /><br /> Happy Coding,<br />Gabriel T. Delarosa<br />July 3, 2010<br /><br /> <br /><strong class='bbc'>Bibliography:</strong><br /> [LaMothe2003] LaMothe, Andre, “Tricks of the 3D Game Programming Gurus” (Sams Publishing, 2003, ISBN: 0-672-31835-0)<br /><br />]]></description>
		<pubDate>Wed, 15 Sep 2010 11:53:45 +0000</pubDate>
		<guid isPermaLink="false">2559854e14663053f02bdfb2c3066c1d</guid>
	</item>
	<item>
		<title>A Simple and Practical Approach to SSAO</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/a-simple-and-practical-approach-to-ssao-r2753</link>
		<description><![CDATA[Global illumination (GI) is a term used in computer graphics to refer to all lighting phenomena caused by interaction between surfaces (light rebounding off them, refracting, or getting blocked), for example: color bleeding, caustics, and shadows. Many times the term GI is used to refer only to color bleeding and realistic ambient lighting. Direct illumination – light that comes directly from a light source – is easily computed in real-time with today’s hardware, but we can’t say the same about GI because we need to gather information about nearby surfaces for every surface in the scene and the complexity of this quickly gets out of control. However, there are some approximations to GI that are easier to manage. When light travels through a scene, rebounding off surfaces, there are some places that have a smaller chance of getting hit with light: corners, tight gaps between objects, creases, etc. This results in those areas being darker than their surroundings.&nbsp;&nbsp;<br /><br />This effect is called ambient occlusion (AO), and the usual method to simulate this darkening of certain areas of the scene involves testing, for each surface, how much it is “occluded” or “blocked from light” by other surfaces. Calculating this is faster than trying to account for all global lighting effects, but most existing AO algorithms still can’t run in real-time.<br /><br />Real-time AO was out of reach until Screen Space Ambient Occlusion (SSAO) appeared. SSAO is a method to approximate ambient occlusion in screen space. It was first used in games by Crytek, in their “Crysis” franchise, and has been used in many other games since. 
In this article I will explain a simple and concise SSAO method that achieves better quality than the traditional implementation.<br /> <br /><p class='bbc_center'><a href='http://www.flickr.com/photos/gamedevnet/4639143267/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4006/4639143267_9a3ba682db.jpg' alt='Posted Image' class='bbc_img' /></span></span></a><br /><em class='bbc'>The SSAO in Crysis</em>&nbsp;&nbsp;</p><br /><h2>Prerequisites</h2><br />The original implementation by Crytek had a depth buffer as input and worked roughly like this: for each pixel in the depth buffer, sample a few points in 3D around it, project them back to screen space and compare the depth of the sample and the depth at that position in the depth buffer to determine if the sample is in front (no occlusion) or behind a surface (it hits an occluding object). An occlusion buffer is generated by averaging the distances of occluded samples to the depth buffer. However, this approach has some problems (such as self-occlusion and haloing) that I will illustrate later.<br /><br />The algorithm I describe here does all calculations in 2D; no projection is needed. It uses per-pixel position and normal buffers, so if you’re using a deferred renderer you have half of the work done already. If you’re not, you can try to reconstruct position from depth or you can store per-pixel position directly in a floating point buffer. I recommend the latter if this is your first time implementing SSAO as I will not discuss position reconstruction from depth here. Either way, for the rest of the article I’ll assume you have both buffers available. Positions and normals need to be in view space.<br /><br />What we are going to do in this article is exactly this: <strong class='bbc'>take the position and normal buffer, and generate a one-component-per-pixel occlusion buffer</strong>. 
How to use this occlusion information is up to you; the usual way is to subtract it from the ambient lighting in your scene, but you can also use it in more convoluted or strange ways for NPR (non-photorealistic) rendering if you wish.<br /><br /><h2>Algorithm</h2><br />Given any pixel in the scene, it is possible to calculate its ambient occlusion by treating all neighboring pixels as small spheres, and adding together their contributions. To simplify things, we will work with points instead of spheres: <strong class='bbc'>occluders will be just points with no orientation and the occludee (the pixel which receives occlusion) will be a &lt;position, normal&gt; pair</strong>. Then, the occlusion contribution of each occluder depends on two factors:&nbsp;&nbsp;<ul class='bbc'><li>Distance “d” to the occludee.</li><li>Angle between the occludee’s normal “N” and the vector between occluder and occludee “V”.</li></ul>With these two factors in mind, a simple formula to calculate occlusion is: <strong class='bbc'>Occlusion = max( 0.0, dot( N, V) ) * ( 1.0 / ( 1.0 + d ) )</strong><br /><br />The first term, max( 0.0, dot( N,V ) ), works based on the intuitive idea that points directly above the occludee contribute more than points near it but not quite right on top. The purpose of the second term ( 1.0 / ( 1.0 + d ) ) is to attenuate the effect linearly with distance. You could choose to use quadratic attenuation or any other function; it’s just a matter of taste.<br /> <br /><p class='bbc_center'><a href='http://www.flickr.com/photos/gamedevnet/4639752338/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4026/4639752338_7a574740e9.jpg' alt='Posted Image' class='bbc_img' /></span></span></a>&nbsp;&nbsp;</p>The algorithm is very easy: sample a few neighbors around the current pixel and accumulate their occlusion contribution using the formula above. 
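<br /><br />To make the formula concrete, here is a small CPU-side C++ sketch of the contribution of a single occluder. It only illustrates the math and is not the shader itself: “Vec3” and “occlusionTerm” are hypothetical names, and the normal is assumed to be unit length.<br /><br /><pre class='prettyprint lang-auto linenums:0'>#include &lt;algorithm&gt;
#include &lt;cmath&gt;

struct Vec3 { float x, y, z; };

// Occlusion = max( 0, dot( N, V ) ) * ( 1 / ( 1 + d ) ), where V is the unit
// vector from the occludee position p to the occluder position q and d is
// the distance between them. n is the occludee's unit normal.
float occlusionTerm(const Vec3& p, const Vec3& n, const Vec3& q)
{
    const Vec3 diff = { q.x - p.x, q.y - p.y, q.z - p.z };
    const float d = std::sqrt(diff.x * diff.x + diff.y * diff.y + diff.z * diff.z);
    if (d == 0.0f) return 0.0f;   // occluder coincides with the occludee
    const float nDotV = (n.x * diff.x + n.y * diff.y + n.z * diff.z) / d;
    return std::max(0.0f, nDotV) * (1.0f / (1.0f + d));
}</pre><br />For an occludee at the origin with normal (0, 0, 1), an occluder one unit directly above it contributes 0.5, the same occluder three units above contributes 0.25, and an occluder level with the surface or behind it contributes 0.<br /><br />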
To gather occlusion, I use 4 samples (&lt;1,0&gt;,&lt;-1,0&gt;,&lt;0,1&gt;,&lt;0,-1&gt;) rotated at 45º and 90º, and reflected using a random normal texture.<br /><br />Some tricks can be applied to accelerate the calculations: you can use half-sized position and normal buffers, or you can also apply a bilateral blur to the resulting SSAO buffer to hide sampling artifacts if you wish. Note that these two techniques can be applied to any SSAO algorithm.<br /><br />This is the HLSL pixel shader code for the effect that has to be applied to a full screen quad:<br /><br /><pre class='prettyprint lang-auto linenums:0'>sampler g_buffer_norm;
sampler g_buffer_pos;
sampler g_random;
float random_size;
float2 g_screen_size;
float g_sample_rad;
float g_intensity;
float g_scale;
float g_bias;

struct PS_INPUT
{
float2 uv : TEXCOORD0;
};

struct PS_OUTPUT
{
float4 color : COLOR0;
};

float3 getPosition(in float2 uv)
{
return tex2D(g_buffer_pos,uv).xyz;
}

float3 getNormal(in float2 uv)
{
return normalize(tex2D(g_buffer_norm, uv).xyz * 2.0f - 1.0f);
}

float2 getRandom(in float2 uv)
{
return normalize(tex2D(g_random, g_screen_size * uv / random_size).xy * 2.0f - 1.0f);
}

float doAmbientOcclusion(in float2 tcoord,in float2 uv, in float3 p, in float3 cnorm)
{
float3 diff = getPosition(tcoord + uv) - p;
const float3 v = normalize(diff);
const float d = length(diff)*g_scale;
return max(0.0,dot(cnorm,v)-g_bias)*(1.0/(1.0+d))*g_intensity;
}

PS_OUTPUT main(PS_INPUT i)
{
PS_OUTPUT o = (PS_OUTPUT)0;

o.color.rgb = 1.0f;
const float2 vec[4] = {float2(1,0),float2(-1,0),
			float2(0,1),float2(0,-1)};

float3 p = getPosition(i.uv);
float3 n = getNormal(i.uv);
float2 rand = getRandom(i.uv);

float ao = 0.0f;
float rad = g_sample_rad/p.z;

//**SSAO Calculation**//
int iterations = 4;
for (int j = 0; j &lt; iterations; ++j)
{
&nbsp;&nbsp;float2 coord1 = reflect(vec[j],rand)*rad;
&nbsp;&nbsp;float2 coord2 = float2(coord1.x*0.707 - coord1.y*0.707,
			&nbsp;&nbsp;coord1.x*0.707 + coord1.y*0.707);
&nbsp;&nbsp;
&nbsp;&nbsp;ao += doAmbientOcclusion(i.uv,coord1*0.25, p, n);
&nbsp;&nbsp;ao += doAmbientOcclusion(i.uv,coord2*0.5, p, n);
&nbsp;&nbsp;ao += doAmbientOcclusion(i.uv,coord1*0.75, p, n);
&nbsp;&nbsp;ao += doAmbientOcclusion(i.uv,coord2, p, n);
}
ao/=(float)iterations*4.0;
//**END**//

//Do stuff here with your occlusion value "ao": modulate ambient lighting, write it to a buffer for later
//use, etc.
return o;
}</pre><br />The concept is very similar to the image space approach presented in “Hardware Accelerated Ambient Occlusion Techniques on GPUs” [1], the main differences being the sampling pattern and the AO function. It can also be understood as an image-space version of “Dynamic Ambient Occlusion and Indirect Lighting” [2]. Some details worth mentioning about the code:&nbsp;&nbsp;<ul class='bbc'><li>The radius is divided by p.z, to scale it depending on the distance to the camera. If you bypass this division, all pixels on screen will use the same sampling radius, and the output will lose the perspective illusion.</li><li>During the for loop, coord1 are the original sampling coordinates, at 90º. coord2 are the same coordinates, rotated 45º.</li><li>The random texture contains randomized normal vectors, so it is your average normal map. This is the random normal texture I use:<br />
<br />
<a href='http://www.flickr.com/photos/gamedevnet/4639143323/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4003/4639143323_c6bb4a75e3_t.jpg' alt='Posted Image' class='bbc_img' /></span></span></a>&nbsp;&nbsp;<br />
<br />
It is tiled across the screen and then sampled for each pixel, using these texture coordinates:<br />
<br />
<span style='font-size: 12px;'><span style='font-family: Courier New'>g_screen_size * uv / random_size </span></span><br />
<br />
Where “g_screen_size” contains the width and height of the screen in pixels and “random_size” is the size of the random texture (the one I use is 64x64). The normal you obtain by sampling the texture is then used to reflect the sampling vector inside the for loop, thus getting a different sampling pattern for each pixel on the screen (check out “interleaved sampling” in the references section).</li></ul>At the end, the shader reduces to iterating through some occluders, invoking our AO function for each of them and accumulating the results. There are four artist variables in it:&nbsp;&nbsp;<ul class='bbc'><li>g_scale: scales distance between occluders and occludee.</li><li>g_bias: controls the width of the occlusion cone considered by the occludee.</li><li>g_sample_rad: the sampling radius.</li><li>g_intensity: the AO intensity.</li></ul>Once you tweak the values a bit and see how the AO reacts to them, it becomes very intuitive to achieve the effect you want.&nbsp;&nbsp;<br /><br /><h2>Results</h2> <br /><p class='bbc_center'>&nbsp;&nbsp;<a href='http://www.flickr.com/photos/gamedevnet/4639143365/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4045/4639143365_eb4136e969.jpg' alt='Posted Image' class='bbc_img' /></span></span></a><br /><em class='bbc'>a) raw output, 1 pass 16 samples b) raw output, 1 pass 8 samples c) directional light only d) directional light – ao, 2 passes 16 samples each.</em></p><br /><p class='bbc_left'>As you can see, the code is short and simple, and the results show no self-occlusion and very little to no haloing. 
These are the two main problems of all the SSAO algorithms that use only the depth buffer as input. You can see them in these images:</p><br /><p class='bbc_center'><a href='http://www.flickr.com/photos/gamedevnet/4639143389/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4054/4639143389_42b13c5ef6.jpg' alt='Posted Image' class='bbc_img' /></span></span></a> 	<a href='http://www.flickr.com/photos/gamedevnet/4639143415/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4030/4639143415_444cde1085.jpg' alt='Posted Image' class='bbc_img' /></span></span></a></p>The self-occlusion appears because the traditional algorithm samples inside a sphere around each pixel, so in non-occluded planar surfaces at least half of the samples are marked as ‘occluded’. This yields a grayish color to the overall occlusion. Haloing causes soft white edges around objects, because in these areas self-occlusion does not take place. So getting rid of self-occlusion actually helps a lot in hiding the halos.<br /><br />The resulting occlusion from this method is also surprisingly consistent when moving the camera around. If you go for quality instead of speed, it is possible to use two or more passes of the algorithm (duplicate the for loop in the code) with different radii, one for capturing more global AO and another to bring out small crevices. With lighting and/or textures applied, the sampling artifacts are less apparent and because of this, usually you should not need an extra blurring pass.<br /><br /><h2>Taking it further</h2><br />I have described a down-to-earth, simple SSAO implementation that suits games very well. However, it is easy to extend it to take into account hidden surfaces that face away from the camera, obtaining better quality. 
Usually this would require three buffers: two position/depth buffers (front/back faces) and one normal buffer.&nbsp;&nbsp;But you can do it with only two buffers: store depth of front faces and back faces in red and green channels of a buffer respectively, then reconstruct position from each one. This way you have one buffer for positions and a second buffer for normals.&nbsp;&nbsp;<br /><br />These are the results when taking 16 samples for each position buffer:&nbsp;&nbsp;<br /> <br /><p class='bbc_center'><a href='http://www.flickr.com/photos/gamedevnet/4639752478/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4004/4639752478_0645735a87.jpg' alt='Posted Image' class='bbc_img' /></span></span></a><br /><em class='bbc'>left: front faces occlusion, right: back faces occlusion</em>&nbsp;&nbsp;</p><br />To implement it, just add extra calls to “doAmbientOcclusionBack()” inside the sampling loop; they sample the back-face position buffer when searching for occluders. As you can see, the back faces contribute very little and they require doubling the number of samples, almost doubling the render time. You could of course take fewer samples for back faces, but it is still not very practical.<br /><br />This is the extra code that needs to be added:<br /><br />Inside the for loop, add these calls:<br /><br /><pre class='prettyprint lang-auto linenums:0'>ao += doAmbientOcclusionBack(i.uv,coord1*(0.25+0.125), p, n);
ao += doAmbientOcclusionBack(i.uv,coord2*(0.5+0.125), p, n);
ao += doAmbientOcclusionBack(i.uv,coord1*(0.75+0.125), p, n);
ao += doAmbientOcclusionBack(i.uv,coord2*1.125, p, n);</pre><br />Add these two functions to the shader:<br /><br /><pre class='prettyprint lang-auto linenums:0'>float3 getPositionBack(in float2 uv)
{
return tex2D(g_buffer_posb,uv).xyz;
}
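// Optional variant, a sketch only (not part of the original shader): if front- and
// back-face depths are packed into the red and green channels of one buffer, as
// described earlier, the back-face position can be rebuilt from depth instead of
// being read from g_buffer_posb.
// Assumed, hypothetical names: g_buffer_depth (the packed depth sampler) and
// g_frustum_size (x = tan(fovX/2), y = tan(fovY/2)). Depth is assumed to be stored as
// view-space z divided by g_far_clip, in a view space where +z points away from the camera.
float3 getPositionBackFromDepth(in float2 uv)
{
// green channel = back-face depth; scale it back to view-space units
float z = tex2D(g_buffer_depth,uv).g * g_far_clip;
// texture coordinates (0..1, y down) to normalized device coordinates (-1..1, y up)
float2 ndc = float2(uv.x*2.0-1.0, 1.0-uv.y*2.0);
// scale the ray through this pixel by the depth
return float3(ndc*g_frustum_size*z, z);
}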
float doAmbientOcclusionBack(in float2 tcoord,in float2 uv, in float3 p, in float3 cnorm)
{
float3 diff = getPositionBack(tcoord + uv) - p;
const float3 v = normalize(diff);
const float d = length(diff)*g_scale;
return max(0.0,dot(cnorm,v)-g_bias)*(1.0/(1.0+d));
}</pre><br />Add a sampler named “g_buffer_posb” containing the position of back faces (draw the scene with front face culling enabled to generate it).<br /><br />Another small change that can be made, this time to improve speed instead of quality, is adding a simple LOD (level of detail) system to our shader. Replace the fixed number of iterations with this:<br /><br /><span style='font-size: 12px;'><span style='font-family: Courier New'>int iterations = lerp(6.0,2.0,p.z/g_far_clip); </span></span><br /><br />The variable “g_far_clip” is the distance of the far clipping plane, which must be passed to the shader. Now the number of iterations applied to each pixel depends on its distance to the camera. Thus, distant pixels perform a coarser sampling, improving performance with no noticeable quality loss. I have not used this in the performance measurements (below), however.<br /><br /><h2>Conclusion and Performance Measurements</h2><br />As I said at the beginning of the article, this method is very well suited for games using deferred lighting pipelines because it requires two buffers that are usually already available. It is straightforward to implement, and the quality is very good. It solves the self-occlusion issue and reduces haloing, but apart from that it has the same limitations as other screen-space ambient occlusion techniques:<ul class='bbc'><li>Does not take into account hidden geometry (especially geometry outside the frustum).</li><li>The performance is very dependent on sampling radius and distance to the camera, since objects near the front plane of the frustum will use bigger radii than those far away.</li><li>The output is noisy.</li></ul>Speed-wise, it is roughly equal to a 4x4 Gaussian blur for a 16 sample implementation, since it samples only 1 texture per sample and the AO function is really simple, but in practice it is a bit slower. 
Here is a table showing the measured speed in a scene with the Hebe model at 900x650 with no blur applied on an Nvidia 8800GT:<br /> <br /><p style='text-align:center'><table style="width:50%;"><tr><td><strong>Settings</strong></td><td><strong>FPS</strong></td><td><strong>SSAO time (ms)</strong></td></tr><tr><td>High (32 samples front/back)</td><td>150</td><td>3.3</td></tr><tr><td>Medium (16 samples front)</td><td>290</td><td>0.27</td></tr><tr><td>Low (8 samples front)</td><td>310</td><td>0.08</td></tr></table></p><br /><p class='bbc_left'>In these last screenshots you can see how this algorithm looks when applied to different models. At highest quality (32 samples front and back faces, very big radius, 3x3 bilateral blur):</p><br /><p class='bbc_center'><a href='http://www.flickr.com/photos/gamedevnet/4639752508/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm4.static.flickr.com/3043/4639752508_642aafb156.jpg' alt='Posted Image' class='bbc_img' /></span></span></a></p><br /><p class='bbc_left'>At lowest quality (8 samples front faces only, no blur, small radius):</p><br /><p class='bbc_center'><a href='http://www.flickr.com/photos/gamedevnet/4639143469/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4060/4639143469_479dd85cb2.jpg' alt='Posted Image' class='bbc_img' /></span></span></a></p><br />It is also useful to consider how this technique compares to ray-traced AO. 
The purpose of this comparison is to see if the method would converge to real AO when using enough samples.<br /> <br /><p class='bbc_center'><a href='http://www.flickr.com/photos/gamedevnet/4639143501/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm4.static.flickr.com/3386/4639143501_af7880788e.jpg' alt='Posted Image' class='bbc_img' /></span></span></a><br /><em class='bbc'>Left: the SSAO presented here, 48 samples per pixel (32 for front faces and 16 for back faces), no blur. Right: Ray traced AO in Mental Ray. 32 samples, spread = 2.0, maxdistance = 1.0; falloff = 1.0.</em>&nbsp;&nbsp;</p><br />One last word of advice: do not expect to plug the shader into your pipeline and get a realistic look automatically. Despite this implementation having a good performance/quality ratio, SSAO is a time consuming effect and you should tweak it carefully to suit your needs and obtain the best performance possible. Add or remove samples, add a bilateral blur on top, change intensity, etc. You should also consider if SSAO is the way to go for you. Unless you have lots of dynamic objects in your scene, you should not need SSAO at all; maybe light maps are enough for your purpose as they can provide better quality for static scenes.<br /><br />I hope you will benefit in some way from this method. All code included in this article is made available under the <a href='http://www.opensource.org/licenses/mit-license.php' class='bbc_url' title='External link' rel='nofollow external'>MIT license</a>.<br /><br /><span style='font-size: 12px;'><strong class='bbc'>References</strong></span><br /><br />[1] Hardware Accelerated Ambient Occlusion Techniques on GPUs<br />(Perumaal Shanmugam)<br /><br />[2] Dynamic Ambient Occlusion and Indirect Lighting<br />(Michael Bunnell)<br /><br />[3] Image-Based Proxy Accumulation for Real-Time Soft Global Illumination<br />(Peter-Pike Sloan, Naga K. 
Govindaraju, Derek Nowrouzezahrai, John Snyder)&nbsp;&nbsp;<br /><br />[4] Interleaved Sampling<br />(Alexander Keller, Wolfgang Heidrich)&nbsp;&nbsp;<br /> <br /><p class='bbc_center'><a href='http://www.flickr.com/photos/gamedevnet/4639143627/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4060/4639143627_b4bba7bbee.jpg' alt='Posted Image' class='bbc_img' /></span></span></a><br /><em class='bbc'>Crytek's Sponza rendered at 1024x768, 175 fps with a directional light.</em></p><br /><p class='bbc_center'><a href='http://www.flickr.com/photos/gamedevnet/4639752926/'><span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://farm5.static.flickr.com/4047/4639752926_1741d420fe.jpg' alt='Posted Image' class='bbc_img' /></span></span></a><br /><em class='bbc'>The same scene rendered at 1024x768, 110 fps using SSAO medium settings: 16 samples, front faces, no blur. Ambient lighting has been multiplied by (1.0-AO). </em></p><br />The Sponza model was downloaded from <a href='http://www.crytek.com/downloads/technology/' class='bbc_url' title='External link' rel='nofollow external'>Crytek's website.</a>]]></description>
		<pubDate>Tue, 25 May 2010 13:41:25 +0000</pubDate>
		<guid isPermaLink="false">a896144046a1b5bd6e3e034d00b4f73a</guid>
	</item>
	<item>
		<title>Deferred Rendering Demystified</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/deferred-rendering-demystified-r2746</link>
		<description><![CDATA[This article is a design article about implementing deferred rendering. The motive behind it is that while there have been many articles and presentations about the concepts behind deferred rendering (for example, the <a href='http://www.guerrilla-games.com/publications/dr_kz2_rsx_dev07.pdf' class='bbc_url' title='External link' rel='nofollow external'>article about deferred rendering in Killzone 2</a>), there is very little information about how to approach it from a design standpoint. This article aims to do just that. <br /><br />The article is accompanied by code that implements a deferred rendering framework, and is somewhat of a development journal of that framework. You can get the code from the SVN repository of the open source 3D rendering engine, <a href='http://www.ogre3d.org/' class='bbc_url' title='External link' rel='nofollow external'>Ogre</a>. The code is in the sample called DeferredShading.<br /><br /><h1>Deferred Rendering In A Nutshell</h1><br />Deferred rendering is an alternative approach to rendering 3D scenes. The classic rendering approach involves rendering each object and applying lighting passes to it. So, if an object is affected by 6 lights, it will be rendered 6 times, once for each light, in order to accumulate the effect of each light. This approach is often called "forward rendering".<br /><br />Deferred rendering takes another approach: first, all of the objects render their "lighting-related information" to a texture, called the G-Buffer. This includes their colours, normals, depths and any other info that might be relevant to calculating their final colour. 
Afterwards, the lights in the scene are rendered as geometry (sphere for point light, cone for spotlight and full screen quad for directional light), and they use the G-buffer to calculate the colour contribution of that light to that pixel.<br /><br />The motive for using deferred rendering is mainly performance related – instead of having a worst case batch count of num_objects * num_lights (if all objects are affected by all lights), you have a fixed cost of num_objects + num_lights. There are other pros and cons of the system, but the purpose of this article is not to help decide whether deferred rendering should be used, but how to do it if selected.<br /><br /><h1>The Problem</h1><br />The main difficulty with implementing deferred rendering is that you have to do everything on your own. The regular rendering approach involves rendering each object directly to the output buffer (called 'forward rendering'). This means that all of the transform & lighting calculations for a single object happen in a single stage of the process. The graphics API that you are working with (DirectX, OpenGL etc) exposes many options for rendering objects with lights. This is often called the 'fixed function pipeline', where you have API calls that control the fashion in which an object is rendered. Since we are splitting up the rendering to two parts, we can not use these faculties at all, and have to re-implement the basic (and advanced) lighting models ourselves in shaders. Even shaders written for the forward pipeline won't be usable, since we use an intermediate layer (the G-Buffer). They will also need to be modified to write to the G-Buffer / read from the G-buffer (usually the first). In addition to that, the architecture of the rendering pipeline changes – objects are rendered regardless of lights, and then geometric representations of the light's affected area have to be rendered, lighting the scene. 
This is very different from the classic way – since when do we render lights?<br /> <br /><h1>The Goal</h1><br />We would like to create a deferred rendering pipeline that is as unobtrusive as possible – we do not want the users of the engine to have to use it differently because of the way that it is rendered, and we really don't want the artists to change the way they work just because we use a deferred renderer. So, we want an engine that can:<br /><ul class='bbc'><li>Interact with the game engine in the same way that the forward renderer does.</li><li>Use the same art and assets as the forward renderer and generate the same results.</li><li>Be extended by users of the framework to still be as flexible as forward rendering.</li></ul><br /><h1>Starting Point</h1><br />This is not an article about starting a graphics engine from scratch. It assumes that you already have a somewhat high-level engine set up. The code that will be presented here is based on the open source Ogre 3D engine, but can be treated as semi-pseudo-code that can be re-implemented for other engines with similar concepts – materials, render order grouping and render-to-texture (including multiple render targets, referred to as MRT) pipeline control. Here is a short rundown of what would be required from a 3D rendering engine to build a deferred renderer on:<br /><br /><strong class='bbc'>Material System:</strong> The system that stores all of the information that is required to render a single object type besides the geometry. Links to textures, alpha settings, shaders etc. are stored in an object's material. The common material hierarchy includes two levels:<br /><ul class='bbc'><li>Technique – When an object will be rendered, it will use exactly one of the techniques specified in the material. 
Multiple techniques exist to handle different hardware specs (If the hardware has shader support use technique A, if not fall back to technique B), different levels of detail (If object is close to camera use technique 'High', otherwise use 'Low'). In our case, we will create a new technique for objects that will get rendered into the G-buffer.</li><li>Pass – An actual render call. A technique is usually not more than a collection of passes. When an object is rendered with a technique, all of its passes are rendered. This is the scope at which the rendering related information is actually stored. Common objects have one pass, but more sophisticated objects (for example detail layers on the terrain or graffiti on top of the object) can have more.</li></ul>Examples of material systems outside of Ogre are Nvidia's CGFX and Microsoft's HLSL FX.<br /><br /><strong class='bbc'>Render Queues / Ordering System:</strong> When a scene full of objects is about to be rendered, who gets rendered when? That is the responsibility of this system. All engines need some control over render order since semi-transparent objects have to be rendered after the opaque ones in order to get the right output. Most engines will give you some control over this, as choosing the correct order can have visual and performance implications (less overdraw = less pixel shader stress = better performance, for example).<br /><br /><strong class='bbc'>Full Scene / Post Processing Framework:</strong> This is probably the most sophisticated and least common of the three, but is still common. Some rendering effects, such as blur and ambient occlusion, require the entire scene to be rendered differently. We need the framework to support directives such as "Render a part of the scene to a texture", "Render a full screen quad", allowing us to control the rendering process from a high perspective.<br /><br />In the Unreal Development Kit, this is called the "Unreal Post Process Manager". 
When working directly with OpenGL / DirectX, you will have to write a layer like this on your own, or hardcode the pipeline (less recommended if creating an engine that is supposed to suit more than one game). Having a strong framework will also open up other possibilities like <a href='http://graphics.cs.uiuc.edu/%7Ekircher/inferred/inferred_lighting_paper.pdf' class='bbc_url' title='External link' rel='nofollow external'>inferred lighting</a>.<br /><br /><h1>Generating the G-Buffer</h1><br />Now that we know what we want to do, we can start creating a deferred rendering framework on top of the engine. The problem of deferred rendering can be split up into two problems – creating the G-Buffer and lighting the scene using the G-Buffer. We will tackle each of them individually. <br /><br /><h2>Deciding on a Texture Format</h2><br />The first stage of the deferred rendering process is filling up a texture with intermediate data that allows us to light the scene later. So, the first question is, what data do we want? This is an important question – it is the anchor that ties both stages together, so both have to be synchronized with it. The choice has performance (memory requirements), visual quality (accuracy) and flexibility (what doesn't get into the G-Buffer is lost forever) implications.<br /><br />We chose two FLOAT16_RGBA textures, essentially giving us eight 16 bit floating point data members. It's possible to use integer formats as well. The first one will contain the colour in RGB, specular intensity in A. The second one will contain the view-space-normal in RGB (we keep all 3 coordinates) and the (linear) depth in A.<br /><br />Choosing a texture format is not a trivial decision, and has quite a big impact. The factors that come into play when choosing a texture format are:<br /><br /><ol class='bbc'><li>GPU Memory consumption – This buffer will need to be created at the size of the viewport, for each viewport. 
For example, if the viewport is 1024x1024, you are paying one megabyte per byte of information per pixel. So, in our case, we have eight two-byte channels. That’s 16 bytes per pixel, so the G-buffer will require 16 megabytes of texture memory!</li><li>Flexibility – The G-Buffer is the ONLY link between the objects and the lighting process. Data that does not get into the G-buffer is lost forever. For example, we save enough information to light the scene using the standard model, but we do not know what object each pixel came from. Object-specific lighting options (for example, highlight the selected character) need an 'object ID' saved to the G-buffer as well.</li><li>Accuracy – More bits per data = more accuracy. Are 16 bits enough accuracy for depth? Maybe 8 bits per color channel would have been enough? This is usually a direct tradeoff with the memory factor.</li><li>Speed – Also a tradeoff with memory consumption. Some math tricks can be used to save memory. For example, the normal's Z coordinate can be recalculated from the X and Y coordinates if its sign is stored in a single bit (since X^2 + Y^2 + Z^2 = 1 =&gt; |Z| = sqrt(1 - X^2 - Y^2)), but those kinds of calculations take time.</li></ol><br />In this case we chose the FLOAT16_RGBA pixel format mostly for simplicity. The information for the basic lighting model is there, and it's easy to access. This can be changed later of course. <br /><br /><h2>Preparing the objects for G-Buffer rendering</h2><br />We assume that the material system has a scheming/profiling feature. This means that materials can specify multiple techniques that will be used in different scenarios (see 'Starting Point' for further explanation). We will use this system and define a new scheme – the G-Buffer scheme that will output the intermediate contents to the texture. 
Since we didn't want the artists to change the materials that they generate, the techniques for this scheme should be generated programmatically. We will create this technique in all of the materials in our app. There are two ways to do this – either offline (load all the materials, add the GBuffer technique, save them back to disk) or online (when a material is loaded or about to be used, add the technique to its list). The lazy online approach was used in the Ogre sample, but all methods are possible.<br /><br />How do we create the G-Buffer technique?<br /> <br /><strong class='bbc'>A) Inspect the classic technique</strong><br /><br />For each material (described in 'Starting Point'), we look at the technique that would have been used normally. For each pass in this technique, we check its properties and see what it does - does this pass have a texture? A normal map? Is it skinned? Transparent? Etc. The resulting classification should contain all the information required to build a GBuffer technique for an object. In some cases (like semi-transparent objects), it should also be able to flag that a certain object cannot be rendered deferred, so that it will be rendered normally later. In the Ogre sample, the function that does this is GBufferSchemeHandler::inspectPass, which returns a structure called PassProperties, which contains the fields needed to generate a G-Buffer-writing technique.<br /><br /><strong class='bbc'>B) Generate the G-Buffer technique</strong><br /><br />After a pass has been inspected and understood, the next stage is to generate the G-Buffer-writing technique. Since the classification has all the information needed, it is possible to derive the material properties and create the matching shader to write to the MRT. A question that arises here is how should the shader be generated? 
There are two common approaches for this – the ubershader approach and the shader generation approach.<br /><br />The first one involves writing a big shader beforehand with lots of preprocessor directives and compiling the right version of the shader during runtime, and the latter means generating shader code on the fly. For this task, I chose code generation for two reasons:<br /><ul class='bbc'><li>There are quite a few options that affect G-Buffer rendering – diffuse textures, normal / specular / parallax maps, vertex colours, skinning etc. Writing a single ubershader for this will be very hard, because of the many preprocessor definitions. For example: how do you assign texture coordinate indices?</li><li>Debugging is comfortable because you see a simple shader that does exactly what a certain object needs.</li></ul>Here is an example of a generated shader pair (in this case, for a normal map, a texture and a single predefined specularity level):<br /><br /><pre class='prettyprint lang-auto linenums:0'>
void ToGBufferVP(
	float4 iPosition : POSITION,
	float3 iNormal&nbsp;&nbsp; : NORMAL,
	float2 iUV0 : TEXCOORD0,
	float3 iTangent : TANGENT0,

	out float4 oPosition : POSITION,
	out float3 oViewPos : TEXCOORD0,
	out float3 oNormal : TEXCOORD1,
	out float3 oTangent : TEXCOORD2,
	out float3 oBiNormal : TEXCOORD3,
	out float2 oUV0 : TEXCOORD4,

	uniform float4x4 cWorldViewProj,
	uniform float4x4 cWorldView
	)
{
	oPosition = mul(cWorldViewProj, iPosition);
	oNormal = mul(cWorldView, float4(iNormal,0)).xyz;
	oTangent = mul(cWorldView, float4(iTangent,0)).xyz;
	oBiNormal = cross(oNormal, oTangent);
	oViewPos = mul(cWorldView, iPosition).xyz;
	oUV0 = iUV0;
}

void ToGBufferFP(
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float3 iViewPos : TEXCOORD0,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float3 iNormal&nbsp;&nbsp; : TEXCOORD1,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float3 iTangent : TEXCOORD2,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float3 iBiNormal : TEXCOORD3,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float2 iUV0 : TEXCOORD4,

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;out float4 oColor0 : COLOR0,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;out float4 oColor1 : COLOR1,

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;uniform sampler sNormalMap : register(s0),
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;uniform sampler sTex0 : register(s1),
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;uniform float4 cDiffuseColour,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;uniform float cFarDistance,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;uniform float cSpecularity
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;)
{
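&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// First target: diffuse colour in RGB, specular intensity in A
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;oColor0.rgb = tex2D(sTex0, iUV0).rgb;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;oColor0.rgb *= cDiffuseColour.rgb;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;oColor0.a = cSpecularity;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// Second target: expand the normal map sample from the 0..1 range to -1..1
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float3 texNormal = (tex2D(sNormalMap, iUV0).rgb-0.5)*2;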
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float3x3 normalRotation = float3x3(iTangent, iBiNormal, iNormal);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;oColor1.rgb = normalize(mul(texNormal, normalRotation));
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;oColor1.a = length(iViewPos) / cFarDistance;
}
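
// Sketch only, not part of the generated output or the Ogre sample: a light-pass
// shader has to unpack exactly what ToGBufferFP packed above. DecodeGBuffer,
// sGBuffer0, sGBuffer1 and iRay are hypothetical names; iRay is assumed to be the
// interpolated view-space ray from the camera through the current pixel.
void DecodeGBuffer(
	sampler sGBuffer0,
	sampler sGBuffer1,
	float2 iUV,
	float3 iRay,
	float cFarDistance,

	out float3 oColour,
	out float oSpecularity,
	out float3 oViewNormal,
	out float3 oViewPos
	)
{
	float4 a0 = tex2D(sGBuffer0, iUV);
	float4 a1 = tex2D(sGBuffer1, iUV);
	// First target: diffuse colour in RGB, specular intensity in A
	oColour = a0.rgb;
	oSpecularity = a0.a;
	// Second target: view-space normal in RGB, re-normalized after storage
	oViewNormal = normalize(a1.rgb);
	// Alpha stores length(viewPos) / cFarDistance, so walk that far along the ray
	oViewPos = normalize(iRay) * (a1.a * cFarDistance);
}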
</pre><br />Remember to keep the shaders (uber or generated) synchronized with the G-Buffer format you decided on. Here is a screenshot from NVIDIA's PerfHUD tool showing the G-Buffer being built:<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[eb271baf999a453589b70369f3f747b9]' id='ipb-attach-url-15715-0-32539900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15715" title="fig01.png - Size: 77.75K, Downloads: 12"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-68217000-1368496526_thumb.png" id='ipb-attach-img-15715-0-32539900-1369539296' style='width:480;height:378' class='attach' width="480" height="378" alt="Attached Image: fig01.png" /></a><br /></p><br />You can see the G-buffer textures (two of them) on the right side of the picture. Notice that nothing has been written to the back-buffer yet.<br /><br />In the Ogre sample, the GBufferMaterialGenerator class takes care of this.<br /><br /><strong class='bbc'>C) Postpone transparent / other non-deferred objects</strong><br /><br />The pass inspection tells us if a pass can be deferred or not. If not, we want to make sure we can render the non-deferred objects later. Again, material techniques come into play, and a NoGBuffer technique is introduced. When a non-deferred pass is detected, it is copied as-is to the NoGBuffer technique, allowing us to tell the framework to render all non-deferred objects once the deferred composition part is over. We will get to that later.<br /> <br /><h2>Allow overriding the automatic process</h2><br />The shader generation makes it easier to manage assets in a deferred rendering environment since it generates the shaders from the fixed function parameters, but this is not always possible. You might have objects with specialized shaders that don't fit in any generic scheme, but still want to defer them as well. 
It is worthwhile to keep this option open, for better flexibility. The proposed framework does this already, as the programmatic material/shader generation only happens when an object has no technique defined for the GBuffer scheme. This means that if an object has a technique predefined, it will override the automatic process. This makes upkeep harder (you need to synchronize all manual shaders with the MRT format if you change it, for example) but is unavoidable in certain scenarios.<br /> <br /><h2>G-Buffer Generation Summary</h2><br />Using these four stages, we added a hook to the system: when the GBuffer scheme is enabled, existing objects are inspected, classified and have matching G-Buffer-writing materials and shaders generated for them, allowing the existing pipeline to render itself to the G-Buffer without asset modification. Some objects will be postponed and forward rendered later, some objects will be rendered using custom shaders – flexibility and compatibility remain, and the G-Buffer is created! <br /><br /><h1>Lighting the Scene</h1><br />We now have a prepared G-buffer with all the intermediate data we need to light the scene. Our next job is to render each light to the scene, calculating its contribution to the final image. This stage differs from the standard approach in a big way – we render lights! There is a question of who triggers the actual rendering. For that, we use the full scene / post processing framework (see the description in 'Starting Point'). In Ogre, it is called the compositor framework. <br /><br /><h2>Compositing Scenes (a general idea)</h2><br />The classic forward rendering pseudo code is:<br /><br /><tt>for each visible object:<br />	for each light that affects object:<br />		render object with light contribution to main window</tt><br /><br />However, this is not always the case, even before deferred rendering. 
Many post-processing effects such as blurring the scene require rendering the scene to a texture, and then rendering the texture to the final output image using a pixel shader with a full screen quad. A simple motion blur pseudo code might be:<br /><br /><tt>for each visible object:<br />	for each light that affects object:<br />		render object with light contribution to a texture<br /><br />blend texture with 'previous frame' texture to screen<br />		<br />copy texture to 'previous frame' texture</tt><br /><br />A good scene composition framework will allow this kind of pipeline to be defined. Ogre does this with the compositor framework, which allows such pipelines to be defined in scripts. For example, this is the GBuffer generating compositor:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
compositor DeferredShading/GBuffer
{
	technique
	{
		texture GBufferTex target_width target_height PF_FLOAT16_RGBA PF_FLOAT16_RGBA chain_scope
		
		target GBufferTex
		{
			input none
			pass clear
			{
			}
			
			shadows off
			material_scheme GBuffer
			
			pass render_scene
			{
				//These values are synchronized with the code
				first_render_queue 10
				last_render_queue&nbsp;&nbsp;79	
			}
		}
	}
}

</pre><br />Even if you have never heard of Ogre, this script should be self-explanatory. The multiple render target is defined as two FLOAT16_RGBA textures; every frame it is cleared and the scene is rendered to it with the GBuffer material scheme. <br /><br /><strong class='bbc'>Back to our case…</strong><br /><br />This is exactly the type of custom control we need. We will now define a custom composition pass that will render lights as geometry.<br /><br /><pre class='prettyprint lang-auto linenums:0'>
compositor DeferredShading/CompositeScene
{
	technique
	{
		//Reference the main Gbuffer texture
		texture_ref GBuffer DeferredShading/GBuffer GBufferTex
		
		target_output
		{
			input none
			//We will dispatch the shadow texture rendering ourselves
			shadows off
			
			// render skies and other pre-gbuffer objects
			pass render_scene
			{
				first_render_queue 1
				last_render_queue&nbsp;&nbsp;9			
			}
			
			//Render the lights and their meshes
			pass render_custom DeferredLight
			{
				input 0 GBuffer 0
				input 1 GBuffer 1
			}
			
			pass render_scene
			{
				material_scheme NoGBuffer
				first_render_queue 10
				last_render_queue 79
			}
		}
	}
}
</pre><br />Once a proper full scene composition framework is in place, this task becomes pretty simple to design. This compositor uses the result of the GBuffer compositor (see the texture_ref declaration), renders the skies (early render groups), then the light's geometry, then the objects that we skipped earlier (this is where the NoGBuffer scheme comes into play). The only thing left to do is to implement the deferred light composition pass. The main challenge is the fact that in deferred rendering, all the lighting calculations are your responsibility – including basic diffuse/specular lighting.<br /><br />Once again, the question of shader management comes into play. This time around I actually chose to use the uber-shader approach, because the inputs don't change frequently, and there are fewer combinations to manage. Light type and a shadow-casting flag are enough for almost everything, so it doesn't create a mess. There are many papers on how the calculations are made, and the demo code is also a reference, although a basic one. The basic rule of thumb is that the G-buffer gives you view-space position and normal, which should be enough.<br /><br />Creating the geometry representing the light is not a difficult task; creating a quad / sphere / cone programmatically is basic 3D geometry stuff. The attached demo contains code that does that (see the GeomUtils class).<br /><br />There are other minor issues – depending on your framework, you might need to reconstruct the depth buffer for the future objects to be able to depth test against the scene. Since the G-Buffer contains the depth as well, it is possible to rebuild the depth buffer from it, by rendering a quad with a pixel shader that reconstructs view space position, multiplies by the projection matrix and outputs the depth (remember – pixel shaders can output depth). 
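<br /><br />As a rough sketch of such a depth-rebuilding pixel shader (this is not the code from the Ogre sample), assuming the G-Buffer layout used above and a Direct3D-style projection whose depth range is 0 to 1. The names RebuildDepthFP, sGBuffer1, iRay and cProj are hypothetical:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
void RebuildDepthFP(
	float2 iUV : TEXCOORD0,
	float3 iRay : TEXCOORD1,

	out float4 oColor0 : COLOR0,
	out float oDepth : DEPTH,

	uniform sampler sGBuffer1 : register(s0),
	uniform float4x4 cProj,
	uniform float cFarDistance
	)
{
	// The G-Buffer alpha stores length(viewPos) / cFarDistance
	float dist = tex2D(sGBuffer1, iUV).a * cFarDistance;
	// iRay is assumed to be the view-space ray through this pixel, supplied by the vertex program
	float3 viewPos = normalize(iRay) * dist;
	// Project back to clip space and write the post-projection depth
	float4 clipPos = mul(cProj, float4(viewPos, 1));
	oDepth = clipPos.z / clipPos.w;
	// The colour output is left black in this sketch
	oColor0 = float4(0, 0, 0, 0);
}
</pre><br />With an OpenGL-style projection, the value would first need remapping from the -1..1 range to 0..1 before being written.<br /><br />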
In the framework, this is done by the ambient light, which is a 'fake light' that fills the scene with the ambient color and rebuilds depth, and is also rendered during the light composition pass. (Solutions to this problem are also <a href='http://mynameismjp.wordpress.com/2009/03/10/reconstructing-position-from-depth' class='bbc_url' title='External link' rel='nofollow external'>already explained</a> in good detail on the internet.)<br /><br />Here are some screenshots illustrating lighting of the scene using the G-Buffer:<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[eb271baf999a453589b70369f3f747b9]' id='ipb-attach-url-15716-0-32555000-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15716" title="fig02.png - Size: 78.01K, Downloads: 13"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-05006100-1368496527_thumb.png" id='ipb-attach-img-15716-0-32555000-1369539296' style='width:480;height:360' class='attach' width="480" height="360" alt="Attached Image: fig02.png" /></a><br /></p><br />As you can see, this time around the G-Buffer textures are used as input to the light geometry rendering, with the 3rd texture being the shadow map built for the light (it is built during the scene lighting stage to re-use the same shadow texture for all lights).<br /><br />Here is a visualization of the scene after one light has rendered itself to the main buffer:<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[eb271baf999a453589b70369f3f747b9]' id='ipb-attach-url-15717-0-32569300-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15717" title="fig03.png - Size: 199.86K, Downloads: 10"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-02908500-1368496528_thumb.png" id='ipb-attach-img-15717-0-32569300-1369539296' style='width:480;height:362' 
class='attach' width="480" height="362" alt="Attached Image: fig03.png" /></a><br /></p><br />This light is a spotlight that casts shadows. Once all the lights render themselves in this fashion, the scene is fully lit!<br /><br />Undeferred objects will be rendered normally afterwards – a render scene directive will be issued in the full scene rendering framework after the lighting takes place. This is a drawback since you will have to implement forward rendering techniques for them (which might lead to code duplication), but objects that can't be deferred are usually different from normal objects, so they would require special treatment anyway.<br /><br /><h1>Additional post processing effects</h1><br />Another advantage of deferred rendering concerns advanced post-filters (like SSAO), which normally require a full-scene render to gather intermediate information about the scene. If the G-Buffer contains this information, you can apply these effects without another full scene render; just make sure that your framework makes it easy to pass information (such as textures) from different render sequences to each other, and you are practically done. In the case of OGRE, the texture_ref directive (also used in the final scene composition) is all that is needed. The deferred rendering demo that accompanies this paper contains an SSAO postfilter, so you can switch it on and off and see the visual/performance impact.<br /> <br /><h1>Summary</h1><br />My goal in this article was to get into some of the less intuitive details of deferred rendering implementations. I hope that this article will help anyone trying to implement a deferred rendering framework get to their target. Remember that there is full source code of a sample implementation as part of the Ogre SDK, so you can see it in action for full reference. 
For people with direct Ogre experience, there is also an <a href='http://www.ogre3d.org/wiki/index.php/Deferred_Shading' class='bbc_url' title='External link' rel='nofollow external'>article that explains the Ogre usage in the demo</a>. Good luck!]]></description>
		<pubDate>Mon, 01 Mar 2010 16:46:32 +0000</pubDate>
		<guid isPermaLink="false">c1fcffd51eb7c38b7209a6106d66cc84</guid>
	</item>
	<item>
		<title>Cube Map Rendering Techniques for Direct3D10</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/cube-map-rendering-techniques-for-direct3d10-r2735</link>
		<description><![CDATA[I have recently been working on porting the Kourjet engine from Direct3D9 to Direct3D10. One of the things I wanted to improve was the rendering of point light shadow maps. In Direct3D9, you do not have much choice in rendering a cube map. The limitations of the API only allow rendering the six faces of the cube map in six distinct passes. However, new features in Direct3D10, such as geometry shaders and improvements to geometry instancing allow you to implement alternative algorithms for rendering cube maps in a single pass. This article will present three such algorithms and discuss their drawbacks and advantages.<br /><br /> Before we get to each algorithm, we will introduce some notations and conventions that will be used in the article. Our goal in the following algorithms is to render as efficiently as possible an arbitrary cube map located in an arbitrary location in the scene. The cube we are rendering to is aligned on the axes of the world basis (X, Y, Z).<br /><br /> It is assumed in this article that the reader is already familiar with the concept of <a href='http://en.wikipedia.org/wiki/Cube_mapping' class='bbc_url' title='External link' rel='nofollow external'>cube mapping</a> as well as the fundamentals of working with vertex and geometry shaders.<br /><br /> <br /><strong class='bbc'>Algorithm 1 - Multi-Pass Rendering</strong><br /> Multi-Pass Rendering is the simplest algorithm and also the only one that can be used on Direct3D9-class hardware. The idea is to render the scene to each one of the cube faces separately. 
The (naive) algorithm could look like this:<br /><pre>
BaseObjectSet = Select Objects to be Rendered on the Cube Map (1)
For each face F of the cube map
    Set the Render Target corresponding to F
    Set the view and projection matrices corresponding to F
    Foreach Object O in BaseObjectSet
        Render( O )
    End
End
</pre><em class='bbc'>(1) is a step that allows you to select a subset of the scene according to the context of the cube map. For example, if you are rendering a cube map for shadow mapping, you will select here only the objects that are in the light range and cast shadows. If it is a reflection cube map, you could include only objects close enough to the cube position.</em><br /><br /> If BaseObjectSet contains N objects, we will then render exactly 6N objects and make six render target switches.<br /><br /> To improve on this first version of the algorithm, we can notice that we could apply one more culling step: as we are rendering one face at a time, we can test the current object to see if it is in the current view frustum (constructed from the view/projection matrices for the current face). The improved algorithm would then look like this:<br /><pre>
BaseObjectSet = Select Objects to be Rendered on the Cube Map (1)
For each face F of the cube map
    Set the Render Target corresponding to F
    Set the view and projection matrices corresponding to F
    Compute the view frustum VF corresponding to F
    Foreach Object O in BaseObjectSet
        If O is visible in VF Then
            Render( O )
    End
End
</pre><br /> For each face, we can thus skip the objects that are not visible on that face. An object of standard size, such as a character or a car, will usually only be visible on two to three faces at one time. Thus, at worst we still render 6N objects but in practice we will render on average 2N to 4N objects (this of course depends on the exact set of objects in the scene). 
We however still have to switch the render target six times.<br /><br /> <br /><strong class='bbc'>Algorithm 2 - Single-Pass Rendering Using Only a Geometry Shader</strong><br /> Direct3D10 comes with a sample program that renders a cube map in a single pass using a geometry shader. The idea is to pass six view/projection matrices to the geometry shader, and for each object primitive, generate six other primitives that are projected onto their respective cube face. We can thus take our previous algorithm and factor out some operations:<br /><pre>
BaseObjectSet = Select Objects to be Rendered on the Cube Map (1)
Set the six Render Targets corresponding to all the faces as an array
Set the six view and projection matrices corresponding to all the faces as an array
Foreach Object O in BaseObjectSet
    Render( O )
End
</pre><br /> As you can see, we got rid of the cube face loop. Before iterating over the object set, we compute and set the needed variables for all faces at once (matrices and render targets). Similarly, there will be only one draw call per object. The number of API calls has been divided by six here compared to the worst-case scenario! 
Here is what the vertex shader and geometry shader could look like:<br /><pre>
VS_OUTPUT VertexShader( VS_INPUT input )
{
    VS_OUTPUT output = (VS_OUTPUT) 0;
    output.position = mul( float4( input.position, 1), matWorld );  // Position in world space
    return output;
}

[maxvertexcount(18)]
void GeometryShader( triangle VS_OUTPUT input[3], inout TriangleStream&lt;GS_OUTPUT&gt; CubeMapStream )
{
    // For each face of the cube, create a new triangle
    [unroll]
    for( int f = 0; f &lt; 6; ++f )
    {
        GS_OUTPUT output = (GS_OUTPUT) 0;

        // Assign triangle to the render target corresponding to this cube face
        output.RTIndex = f;

        // For each vertex of the triangle, compute screen space position and pass distance
        [unroll]
        for( int v = 0; v &lt; 3; ++v )
        {
            output.position = mul( input[v].position, matViewProj[f] );
            CubeMapStream.Append( output );
        }

        // New triangle
        CubeMapStream.RestartStrip();
    }
}
</pre><br /> The vertex shader simply transforms the vertices from object space to world space. Most of the work is done in the geometry shader. In fact our face loop from the first algorithm can be found here: we have traded CPU cycles for GPU cycles. One problem that seems obvious in this naive implementation is that there is no culling on a face basis. 
This means that the geometry shader will always output six times as many vertices as it receives.<br /><br /> One way to get around this problem is to test the object against each of the faces' respective frusta and set a flag to be used in the geometry shader to skip generation of some primitives. We can then change the algorithm as follows:<br /><pre>
BaseObjectSet = Select Objects to be Rendered on the Cube Map (1)
Set the six Render Targets corresponding to all the faces as an array
Set the six view and projection matrices corresponding to all the faces as an array
Compute the six view frusta corresponding to all the faces
Foreach Object O in BaseObjectSet
    For each face F of the cube map
        Test O against F's frustum
        If O is visible, set the geometry shader flag for F (cubeFaceWriteFlag |= 1 &lt;&lt; F)
    End
    Render( O )
End
</pre><pre>
[maxvertexcount(18)]
void GeometryShader( triangle VS_OUTPUT input[3], inout TriangleStream&lt;GS_OUTPUT&gt; CubeMapStream )
{
    // For each face of the cube, create a new triangle
    [unroll]
    for( int f = 0; f &lt; 6; ++f )
    {
        // Only output primitives to the required cube faces
        [branch]
        if ( ( cubeFaceWriteFlag &amp; ( 1 &lt;&lt; f ) ) != 0 )
        {
            GS_OUTPUT output = (GS_OUTPUT) 0;

            // Assign triangle to the RT corresponding to this cube face
            output.RTIndex = f;

            // For each vertex of the triangle, compute screen space position and pass distance
            [unroll]
            for( int v = 0; v &lt; 3; ++v )
            {
                output.position = mul( input[v].position, matViewProj[f] );
                CubeMapStream.Append( output );
            }

            // New triangle
            CubeMapStream.RestartStrip();
        }
    }
}
</pre><br /> Although we have managed to skip generation of some primitives, the geometry shader is still not optimal. Indeed, geometry shaders are usually optimized for generation of at most four vertices whereas we are generating at most 18.<br /><br /> <br /><strong class='bbc'>Algorithm 3 - Single-Pass Rendering Using a Geometry Shader and Geometry Instancing</strong><br /> Geometry instancing is a concept that has already been around in Direct3D9. However, Direct3D10 and its geometry shaders open some new ways of using it. In Direct3D9, one would pass a vertex buffer and a buffer containing instance data (for example six transformation matrices) and the API will, in a single draw call, render the vertex buffer six times, each with different instance data. Very nice, but useless for us when we need to render to a cube map: there is no way to dynamically select the render target that will be used for a given instance.<br /><br /> This is where the geometry shaders come into the picture. As seen in the previous algorithm, it is possible to indicate per primitive which render target it should be rendered to. The idea will then be to have an instance buffer dynamically filled with the face indices on which the object is visible. 
So here we are with a first version of our third algorithm:<br /><pre>
BaseObjectSet = Select Objects to be Rendered on the Cube Map (1)
Set the six Render Targets corresponding to all the faces as an array
Set the six view and projection matrices corresponding to all the faces as an array
Compute the six view frusta corresponding to all the faces
Foreach Object O in BaseObjectSet
    Reset Dynamic Instance Buffer
    For each face F of the cube map
        Test O against F's frustum
        If O is visible, append F to the instance buffer
    End
    Render one instance of O per entry in our dynamic instance buffer
End
</pre><br /> The algorithm is pretty similar to the previous one: we manage to minimise the number of API calls by rendering to all the cube faces at once. By testing against the face frusta, we only append to the instance buffer the relevant faces, thus not instancing any invisible geometry (as in the previous algorithm). Even though this looks pretty smooth in our pseudo algorithm, there are some hidden complications.<br /><br /> In Direct3D, as we are free to organise the vertex/instance data as we wish, we must tell the API (using vertex layouts) how the data we are providing is organised. Usually the engine will store this per mesh and switch the vertex layout when drawing the object. Indeed, we want to keep the freedom of being able to specify any vertex layout (maybe one for characters, one for terrain, one for sky, etc.). In this algorithm, we are appending instance data and thus have to append some information to the existing vertex layout. In the Kourjet Engine for example, the new vertex layouts are computed on the fly the first time we draw an object and kept in a cache belonging to the render path.<br /><br /> The vertex shader and geometry shader are pretty simple once we have the vertex layout right. We basically use the vertex shader to do all transformations and the geometry shader simply to direct the primitive to the appropriate render target. 
Here is the source code:<br /><pre>
struct VS_INPUT
{
    float3 position : POSITION;
    uint   cubeFace : CUBEFACE;
};

struct VS_OUTPUT
{
    float4 position : SV_POSITION; // Position in screen space
    uint   cubeFace : CUBEFACE;
};

struct GS_OUTPUT
{
    float4 position : SV_POSITION; // Position in screen space
    uint   RTIndex  : SV_RenderTargetArrayIndex;
};

VS_OUTPUT VertexShader( VS_INPUT input )
{
    VS_OUTPUT output = (VS_OUTPUT) 0;
    float4 P = mul( float4( input.position, 1), matWorld );  // Position in world space
    output.position = mul( P, matViewProj[ input.cubeFace ] );
    output.cubeFace = input.cubeFace;
    return output;
}

[maxvertexcount(3)]
void GeometryShader( triangle VS_OUTPUT input[3], inout TriangleStream&lt;GS_OUTPUT&gt; CubeMapStream )
{
    GS_OUTPUT output = (GS_OUTPUT) 0;

    // Assign triangle to the render target corresponding to this cube face
    output.RTIndex = input[0].cubeFace;

    // For each vertex of the triangle
    [unroll]
    for( int v = 0; v &lt; 3; ++v )
    {
        output.position = input[v].position;
        CubeMapStream.Append( output );
    }

    // New triangle
    CubeMapStream.RestartStrip();
}
</pre><br /><strong class='bbc'>Conclusion</strong><br /> <br /><strong class='bbc'>The algorithm pros and cons</strong><br />
<table>
<tr><th></th><th>Pros</th><th>Cons</th></tr>
<tr><td>Algorithm 1</td><td><ul class='bbc'><li>Works on Direct3D9 hardware</li><li>Can do visibility culling on each face's frustum</li></ul></td><td><ul class='bbc'><li>Need to switch render targets six times</li><li>At worst six draw calls per object</li></ul></td></tr>
<tr><td>Algorithm 2</td><td><ul class='bbc'><li>One render target switch</li><li>One draw call per object</li></ul></td><td><ul class='bbc'><li>Geometry shader could be less than optimal depending on the hardware and the vertex format <sup class='bbc'>[1]</sup></li><li>No real culling per face (vertex shader is still run six times)</li><li>Direct3D10 only</li></ul></td></tr>
<tr><td>Algorithm 3</td><td><ul class='bbc'><li>One render target switch</li><li>One draw call per object</li><li>Can do visibility culling on each face's frustum (instancing only geometry if visible on a face)</li></ul></td><td><ul class='bbc'><li>Must maintain a cache of duplicated vertex layouts for instancing</li><li>Direct3D10 only</li></ul></td></tr>
</table>
<br /><strong class='bbc'>Source code and implementation</strong><br /> This article does not come with any particular source code attached. However, an implementation of these algorithms applied to the rendering of shadow maps can be found in the <a href='http://kourjet.svn.sourceforge.net/viewvc/kourjet/trunk/?pathrev=57' class='bbc_url' title='External link' rel='nofollow external'>Kourjet Engine source code</a>, and in particular the following files can be of interest:<br /><br /> <ul class='bbc'><li><a href='http://kourjet.svn.sourceforge.net/viewvc/kourjet/trunk/Modules/gfx/Src/kgfxDefaultPointLightShadowMapRenderPath.cpp?view=markup&pathrev=57' class='bbc_url' title='External link' rel='nofollow external'>Algorithm 1</a> and its corresponding <a href='http://kourjet.svn.sourceforge.net/viewvc/kourjet/trunk/Samples/Data/scene-test-shadowmap-multipass.fx?view=markup&pathrev=57' class='bbc_url' title='External link' rel='nofollow external'>effect file</a> </li><li><a href='http://kourjet.svn.sourceforge.net/viewvc/kourjet/trunk/Modules/gfx/Src/kgfxSinglePassPointLightShadowMapRenderPath.cpp?view=markup&pathrev=57' class='bbc_url' title='External link' rel='nofollow external'>Algorithm 2</a> and its corresponding <a href='http://kourjet.svn.sourceforge.net/viewvc/kourjet/trunk/Samples/Data/scene-test-shadowmap-singlepass.fx?view=markup&pathrev=57' class='bbc_url' title='External link' rel='nofollow external'>effect file</a> 
</li><li><a href='http://kourjet.svn.sourceforge.net/viewvc/kourjet/trunk/Modules/gfx/Src/kgfxInstancingPointLightShadowMapRenderPath.cpp?view=markup&pathrev=57' class='bbc_url' title='External link' rel='nofollow external'>Algorithm 3</a> and its corresponding <a href='http://kourjet.svn.sourceforge.net/viewvc/kourjet/trunk/Samples/Data/scene-test-shadowmap-instancing.fx?view=markup&pathrev=57' class='bbc_url' title='External link' rel='nofollow external'>effect file</a></li></ul><strong class='bbc'>Reference articles & other links</strong><br /> <ul class='bbc'><li>The blog posts about the implementation of these algorithms in Kourjet for shadow mapping (<a href='http://kourjet.sourceforge.net/2009/11/27/shadow-mapping/' class='bbc_url' title='External link' rel='nofollow external'>part 1</a>, <a href='http://kourjet.sourceforge.net/2009/11/29/point-light-shadow-mapping-part-2/' class='bbc_url' title='External link' rel='nofollow external'>part 2</a> and <a href='http://kourjet.sourceforge.net/2009/11/30/point-light-shadow-mapping-part-3-end/' class='bbc_url' title='External link' rel='nofollow external'>part 3</a>) </li><li>In this <a href='http://developer.download.nvidia.com/GPU_Programming_Guide/GPU_Programming_Guide_G80.pdf' class='bbc_url' title='External link' rel='nofollow external'>paper on GPU programming</a>, NVIDIA gives some recommendations for Geometry Shaders, and in particular in section 4.6, page 40, they say:</li></ul> <p class='bbc_indent' style='margin-left: 40px;'><p class='bbc_indent' style='margin-left: 40px;'><em class='bbc'> The performance of a GS is inversely proportional to the output size (in scalars) declared in the Geometry Shader, which is the product of the vertex size and the number of vertices (maxvertexcount). This performance degradation however occurs at particular output sizes, and is not smooth. 
For example, on a GeForce 8800 GTX a GS that outputs at most 1 to 20 scalars in total would run at peak performance, whereas a GS would run at 50% of peak performance if the total maximum output was declared to be between 27 and 40 scalars.<br /><br /> More concretely, if your vertex declaration is:<br /> <br /><br />float3 pos: POSITION; float2 tex: TEXCOORD;<br /> Each vertex is 5 scalar attributes in size.<br /><br /> If the GS defines a maxvertexcount of 4 vertices, then it will run at full speed (20 scalar attributes).<br /> But if you increase the vertex size by adding a float3 normal, the number of scalars will increase by 4 * 3, putting your total at 32 scalar attributes. This will result in the GS running at 50% peak performance.<br /><br /> </em></p></p>]]></description>
		<pubDate>Mon, 01 Feb 2010 09:32:33 +0000</pubDate>
		<guid isPermaLink="false">68310dc294a1c38c7ba636380151daca</guid>
	</item>
	<item>
		<title>Walls and Shadows in 2D</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/walls-and-shadows-in-2d-r2711</link>
		<description><![CDATA[This article will explore a technique for efficiently computing visible and lit regions in a 2D scene.<br /><br /> The algorithms presented are intended for use with large maps, and where a computation time of some appreciable fraction of a second is tolerable. There are faster algorithms, but they involve solving the intersection and union of arbitrary polygons; the reasons for rejecting this approach are discussed below.<br /><br /> Consider a region, such as a cave, dungeon or city, viewed from above, such that floors are shown by areas and walls by lines. Floors are always enclosed by some set of walls. Put another way, all maps are bounded by a continuous wall. We will ignore other map features; for our purposes, the only “important” features are walls and floors.<br /><br /> Lights exist at points, and have a given area of effect, defined by a radius. Everything inside this radius that faces the light is considered lit. Lights can have overlapping areas. Walls cast shadows, making the walls and floors behind them unlit (there is no provision for light attenuation, that is, getting dimmer over distance; things are either lit or unlit).<br /><br /> The observer also exists at a point. The problem is to quickly discover what he can see, where the area seen is defined as every point he has line of sight to (that is, those walls and floors not occluded by other walls) that is also lit. For efficiency reasons, the observer’s vision also has a maximum radius. Anything outside that radius is not visible, even if it is lit and unobstructed. The algorithm can function without this limit; the limitation is an optimization.<br /><br /> In addition to knowing what areas are currently visible, it is desirable to know what areas have been previously visible, and to be able to display those areas differently.<br /><br /> An example is given in Figure 1. The observer is the red hollow circle, standing at the intersection of three passageways. 
White solid circles are lights (the observer, in this case, is carrying one).<br /><br /> Dark green is used for walls and floors that have not been seen yet. Blue walls and medium grey floors mark areas that have been seen previously. White walls and light grey floor denote what’s currently visible. Note that while the observer’s own light doesn’t reach all the way down the west passageway, there’s a light to the south that illuminates part of it, giving a disjoint area of visibility. To the southeast, there’s another light that helps light the southeast corridor, so the whole corridor is visible. Finally, to the northeast, there’s a small window in the wall, allowing the observer to illuminate, and see, a small amount of the room to the east.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig01.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 1 - A simple example of a map</strong><br /><br /> In this technique, the floor is broken into small fragments (triangles in this case, because they are easy to render), which serve as the algorithm’s basic unit of area. When determining what part of the floor is visible, I’m really asking what set of triangles is visible. If the centroid (average of the vertices) of a triangle is visible, the whole triangle is considered visible. This has the side effect of making the bounds of the visibility area slightly jagged, as you can see in the disjoint area down the west corridor. In my own application, this is acceptable, but blending of adjacent triangles could be done in order to get a smooth gradient between visible and invisible areas.<br /><br /> Because walls divide up floor area and walls can run at all sorts of angles, cutting the floor into small triangles often results in additional, smaller, triangles being generated. All told, a map can contain millions of floor triangles and hundreds of walls. 
For the curious, Figure 2 shows the triangles generated for a small section of the map:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig02.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 2 - You are in a maze of twisty triangles, all largely the same</strong><br /><br /> For lighting (and seeing) the walls themselves, I do something similar – walls are broken into segments, and the center point of each segment is checked to see if it is visible. If it is visible, that whole segment is visible. For this reason, segments are kept short and walls can generate thousands of them in a large map.<br /><br /> Because of this, when it comes time to determine which parts of walls and floors are not visible it may be necessary to evaluate millions of points for the floor and thousands of points for wall segments. Conceptually, they all need to be evaluated against every wall to determine if line of sight exists from the observer, and that process has to be repeated for each light as well.<br /><br /> Clearly, a brute force approach will not work in reasonable time. The goal is to move the observer to a new point, or move a light to a new point (often both, since the observer often carries a light), and know as quickly as possible what areas of floor and segments of walls can be seen. Comparing possibly millions of points against hundreds or thousands of walls and doing a line of sight calculation – essentially calculating the intersection of two line segments, one for a line of vision and one for a wall – isn’t acceptably fast.<br /><br /> It turns out that lighting and vision can be handled by the same algorithm, since they are both occluded by walls in the same way. They can both be represented by casting rays out from a given point, and stopping the rays when they hit a wall. 
If there’s no wall along that ray, the ray is cut short by a distance limit instead (this illustrates another difficulty with using polygon intersections – polygons don’t have curved sides, and approximating curves with short straight lines increases the cost of the intersection test).<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig03.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 3 - A "polygon" of light, with curved parts</strong><br /><br /> Since there can be multiple lights, unions of polygons would be required:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig04.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 4 - Union of two lit areas</strong><br /><br /> Since we’ve defined visibility as areas that are both lit and within line of sight of the observer, the intersection of the polygons representing lit areas (itself a union) and the polygon representing the area of sight represents the visible areas. Figure 5 repeats the original example, with yellow lines roughly delineating the lit area, and red lines bounding the area of sight. Areas within both are the visible areas.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig05.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 5 - Vision and Light compute Visibility</strong><br /><br /> The result of the intersection is a (potentially empty) set of polygons. To maintain a history of what’s been visible, a union of the previously visible areas and the currently visible area is also required. The combination can create polygons with holes, and for complex maps, a large number of sides. 
Figure 6 shows the result of a wall, a number of square columns, and a short walk along the north side of the wall by the observer, carrying a light. The union of previously visible areas is shown in darker grey<sup class='bbc'>1</sup>.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig06.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 6 - Columns, a Wall, and A Messy Polygon Union</strong><br /><br /> Doing polygon union and intersection is complex. Naïve implementations of these algorithms run into problems with boundary conditions and, in complex cases, floating point accuracy. There are packages available that solve these problems, and deal elegantly with disjoint polygons and holes, but they are available only under restrictive licenses. I wanted an unencumbered solution, and was willing to trade off some amount of runtime to get it.<br /><br /> But a brute force computation of every floor point vs. every obstructing wall is unacceptable. What is needed is an efficient way to evaluate the many floor and wall points for visibility.<br /><br /> The basic approach amounts to describing the shadows cast by walls. Since each point in the floor has to be tested against these shadows, the algorithms focus on making it as inexpensive as possible to determine if any given point is within a shadow, without loss of accuracy. Since “as inexpensive as possible” is still too expensive, given the sheer number of points to consider, the approach also includes determining which walls are already covered by other walls (in effect, we work to discard walls which don’t change anything we care about). Where walls cannot be discarded, the algorithm attempts to determine what parts of walls contribute to meaningful shadows, and which parts are irrelevant. 
Note that while I talk about shadows here, everything also applies to occluding the observer’s view; since, as noted, a wall stops a line of sight in exactly the same way that it cuts short a ray of light. Finally, I discuss the critical optimizations that make the approach fast enough to use on large and complex maps.<br /><br /> We start by making all points relative to the location of the light in question; in other words, everything is translated so that the light is at the origin. This translation drops terms out of many formulas and provides significant savings.<br /><br /> The first step is to identify when a wall casts a shadow over a point – efficiently. The simplest way to do this is to take the endpoints of a wall and arrange them so that the first point is clockwise from the second point, as seen from the origin. If they are counterclockwise, swap the points. If they are collinear with the light, throw out that wall, because it can’t cast a shadow. I will refer to the endpoints as the left and right endpoints, with the understanding that this is from the perspective of the origin.<br /><br /> A 2D cross product<sup class='bbc'>3</sup> (from the origin, to the start point, to the end point), reveals both the edge-on case and the clockwise or counterclockwise winding of the end points.<br /><br /> Here’s an example:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig07.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 7 - Walls that do, and don't, matter</strong> In figure 7, line C is edge on to the light and casts no shadow of its own, so it gets dropped. B isn’t edge on, so we keep it, with BA as its right endpoint and BC as the left (as seen from the light at the origin). D is also a wall that matters, and DC becomes the right endpoint, while DE becomes the left. 
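<br /><br /> The translate-and-orient step might look something like the sketch below. It is only an illustration under stated assumptions, not the listing from the end of this article: Vec2, cross2, OrientedWall and orientWall are hypothetical names, and the y axis is assumed to point up, so a positive cross product means a counterclockwise turn as seen from the light.<br /><br />

```cpp
#include <utility>

// Hypothetical minimal 2D point/vector type (not the article's Point class).
struct Vec2 { float x, y; };

// z component of the 3D cross product of u and v.
inline float cross2(Vec2 u, Vec2 v) { return u.x * v.y - v.x * u.y; }

// A wall in light-relative coordinates, endpoints named as seen from the light.
struct OrientedWall { Vec2 right, left; };

// Translates a wall so the light sits at the origin and orders its endpoints
// so that 'right' is clockwise from 'left'. Returns false for an edge-on wall,
// which cannot cast a shadow and should be dropped.
// Assumption: y axis points up, so cross2 > 0 means counterclockwise.
inline bool orientWall(Vec2 a, Vec2 b, Vec2 light, OrientedWall& out) {
    Vec2 p{a.x - light.x, a.y - light.y};
    Vec2 q{b.x - light.x, b.y - light.y};
    const float z = cross2(p, q);
    if (z == 0.0f) return false;   // collinear with the light: no shadow
    if (z < 0.0f) std::swap(p, q); // q was clockwise from p: swap them
    out.right = p;
    out.left = q;
    return true;
}
```

With a y-down screen coordinate system the sign test would simply be reversed.<br /><br /> 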
In some applications, in which walls always form continuous loops, A and E can also be dropped because they represent hidden surfaces. The same rules that apply to 3D surface removal apply here – in a closed figure, walls that face away from the origin can be dropped without harm. However, this algorithm works even if walls don’t form closed figures.<br /><br /> Now, cast a ray out from the origin through the right endpoint of a given wall – use B as an example, and cast a ray through BA. Note that any point that is in shadow happens to be to the left of this line (so are many other points, but the point is that all the shadowed ones are). Calculating “to the left of” is cheap: it’s the 2D cross product from the origin, to the right endpoint of the wall, to the point in question; for example, point X in Figure 7. The cross product gives a value that’s negative on one side of the line, positive on the other side, and zero if the point in question is on the line.<br /><br /> Repeat for the ray from the origin to the left end of the line segment, BC. All shadowed points are to the right of this line, which is determined by another 2D cross product. All that remains is to determine if the point is behind the wall or in front of the wall, with respect to the center. This is yet another 2D cross product, this time from the wall’s right endpoint, to the left end point, to the point in question. Point X in Figure 7 would pass all three of these tests, so it is in B’s shadow.<br /><br /> All told, at most three cross products (six multiplies and three subtracts, and three comparisons with 0) tell if a point is shadowed by a wall. In many cases, a single cross product will prove that a point is not shadowed by a wall. But that still leaves the problem of comparing many, many thousands of points against hundreds of walls.<br /><br /> Having established an algorithm to test a point against a wall, we now need to find ways to minimize how often we have to use it. 
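<br /><br /> Before turning to culling, the three-test check can be sketched roughly as follows. This is a minimal illustration, not the listing from the end of this article: Vec2, cross2, ShadowWall and inShadow are hypothetical names, coordinates are assumed to be already translated so the light is at the origin, the y axis is assumed to point up (so a positive cross product means "to the left"), and points lying exactly on a boundary are treated as not shadowed.<br /><br />

```cpp
// Hypothetical minimal 2D point/vector type (not the article's Point class).
struct Vec2 { float x, y; };

// z component of the 3D cross product of u and v.
inline float cross2(Vec2 u, Vec2 v) { return u.x * v.y - v.x * u.y; }

// A wall in light-relative coordinates; 'right' is clockwise from 'left'
// as seen from the light at the origin.
struct ShadowWall { Vec2 right, left; };

// Returns true if p lies strictly inside the shadow cast by w.
// Each test can reject the point early, so many points cost one cross product.
inline bool inShadow(const ShadowWall& w, Vec2 p) {
    // Test 1: p must be to the left of the ray through the right endpoint.
    if (cross2(w.right, p) <= 0.0f) return false;
    // Test 2: p must be to the right of the ray through the left endpoint.
    if (cross2(w.left, p) >= 0.0f) return false;
    // Test 3: p must be behind the wall, on the far side from the origin.
    const Vec2 along{w.left.x - w.right.x, w.left.y - w.right.y};
    const Vec2 toPoint{p.x - w.right.x, p.y - w.right.y};
    return cross2(along, toPoint) < 0.0f;
}
```

For a wall from (5,0) to (5,5), the point (7,2) passes all three tests, while (3,1) passes the first two but fails the third because it sits in front of the wall.<br /><br /> 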
Any wall we can cull results in hundreds of thousands fewer operations! So a first pass at culling is simply to remove any wall which is outside the radius of the light by creating a bounding square around the origin with the “radius” of the light, and a bounding rectangle from each wall’s endpoints. If these don’t overlap, that wall can’t affect lighting, and is discarded.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig08.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 8 - Using rectangles and overlap to discard walls</strong> In Figure 8, F’s rectangle doesn’t overlap the light’s rectangle, so F gets discarded. H and G overlap and so are kept – H is a mistake because it’s not really in the <em class='bbc'>circle</em> of light that matters, but this is a very cheap test that discards most walls in a large map very quickly, and that’s what we want for now. In applications where walls form loops or tight groups, the entire set can be given a single bounding rectangle, allowing whole groups of walls to be culled by a single overlap test.<br /><br /> Whatever walls are left might cast shadows. For each, we calculate the squared distance between the origin and the nearest part of the wall. This is slightly messy, since “nearest” could be either endpoint, or a point somewhere between. Given these distances, sort the list of walls so that the closest rise to the top. In the case of ties, there is generally some advantage in letting the longer wall sort closer to the top. This will usually put the walls casting the largest shadows near the top of the list. This helps performance considerably, but nothing breaks if the list isn’t perfectly sorted. In figure 8, G would be judged closer, by virtue of the northernmost endpoint. 
H’s closest point, near the middle of H, is further off.<br /><br /> Once we’ve dropped all the obviously uninvolved walls and sorted the rest by distance, it’s time to walk through the list of walls, adding them to the (initially empty) set of walls that cast shadows. If a wall turns out to be occluded by other walls in this phase, we cull it. Usually, anyway - in the interest of speed, the algorithm settles for discarding most such walls, but can miss cases. In practice, it misses few cases, so I have not troubled to improve this phase’s culling method.<br /><br /> To explain how this culling is done, we must introduce some concepts. Each wall generates a pair of rays, both starting from the light (origin) and one through each endpoint. As noted before, points to the right of the “left” ray, and to the left of the “right” ray, bound the possible shadow. However, some of that area might be already shadowed by another wall – one wall can partially cover another. In fact, all of that area might be shadowed by other walls – the current wall might be totally occluded. If it’s only partially occluded, what we want to do is “narrow” its rays, pushing the left ray further right and/or the right ray further left, to account for the existing, known shadows in this area. The reason for this is that we don’t want to compare points against multiple walls if we don’t have to, and we have to compare a point against any wall if it lies between the wall’s left and right rays. The narrower that angle becomes, the fewer points have to be checked against that wall.<br /><br /> So when we take in a new wall, the first thing we do is look at the two line segments between the origin and the wall’s endpoints (each in turn). If that line segment intersects a wall we already accepted, then the intersected wall casts a shadow we probably care about in regard to the current wall. The interesting point here is that we don’t care where the intersection occurs. 
An example will show why:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig09.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 9 - Intersecting origin-to-endpoint with other walls</strong> Assume the current wall, W, is half-hidden by an already accepted, closer one, S. Assume that it’s the left half of W that’s covered by the nearer wall, as in the example above. The way we discover this is by checking the line segment from the origin to the left endpoint of W, against the already-accepted walls. It can intersect several. Once we find an intersection, we know immediately that the intersected wall, S, is going to cover at least part of W, and on the left side (ignore the case where S’s left endpoint and S’s right endpoint are collinear with the origin – it doesn’t change anything). Notice that we don’t care where S gets intersected.<br /><br /> So what do we do? We replace W’s left endpoint ray with S’s right endpoint ray. In effect, we push the left endpoint ray of W to the right. Having done that, we check to see if we’ve pushed it so far to the right that it is now at or past W’s own right ray. If so, S completely occludes W and we discard W immediately. If not, we’ve made W “narrower”. In our example, W survived and its shadow got narrower, as marked in grey.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig10.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 10 - Pushing W's left ray</strong> We keep doing this, looking for other walls to play the part of S that intersect W’s left or right end rays. When they do, we update (“nudge”) W’s left (or right) ray by replacing it with S’s right (or left) ray. 
If the endpoint rays of W meet or cross during this “nudging” process, W is discarded.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig11.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 11 - Losing W</strong> In the example above, S has pushed W’s left ray, and J has pushed W’s right ray. A 2D cross product tells us that left and right rays have gotten past each other in the process, so W is judged to be completely occluded, and gets dropped. Otherwise, it survives, with a potentially much narrowed shadow, and is added to the set of kept walls. Note that if J had been longer to the left, it could have intersected <em class='bbc'>both</em> W’s red lines, and it would have pushed both W’s left and right rays by itself, forcing them to cross. This makes sense; it would have completely occluded W all by itself and we’d expect it to cause W to drop out.<br /><br /> Note that this algorithm doesn’t notice the case where a short, close wall casts a shadow over the middle of a long wall a little further off. In this case, both walls end up passing this check, and the long wall doesn’t get its rays changed (the only way to do that would be to split the long wall into two pieces and narrow them individually). This isn’t much of a problem in practice, because when it comes time to check points, we will again check the walls in order of distance from the origin, so the smaller, closer wall is likely to be checked first. Points it shadows won’t have to be checked again, so for those points the longer wall never needs to be checked at all. 
There are unusual cases where points do end up in redundant checks, but they are unusual enough not to be much of a runtime problem.<br /><br /> As we work through the list of candidates, we are generally working outward from the origin, so it’s not uncommon for more and more walls to end up discarded because they are completely occluded. This helps keep this part of the algorithm quick.<br /><br /> It remains to find a good way to detect intersections of line segments. We don’t want round-off problems (this might report false intersections or miss real ones, causing annoying issues), and we don’t care where the intersection itself actually occurs. It turns out that a reasonable way to do this is to take W’s two points, and each potential S’s two points, and arrange them into a quadrilateral. We take the 4 points in this order: S’s left, W’s left, S’s right, W’s right. If the line segments cross, the four points in order form a convex polygon. If they don’t, it is concave. An example serves:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig12.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 12 - Detecting intersections</strong> R and S cross, so the (green) polygon formed by the 4 endpoints, is a convex kite shape. R and L don’t cross, so the resulting (orange) polygon isn’t convex; it’s not even simple.<br /><br /> It turns out that there is a fair amount of misinformation about testing for convex polygons on the ‘Net. Talk about counting changes in sign sounds interesting (and cheap), but I’ve yet to see an implementation of this that works in all cases, including horizontal and vertical lines. What I’ve ended up with is more expensive but gets all cases, even if two points in the quad happen to be coincident. I calculate the 4 2D cross products, going around the quad (in either order). 
If they are all negative OR they are all positive, it’s convex. Anything else is concave or worse. While not cheap (up to 8 multiplies and quite a few subtracts), we can stop as soon as we get a difference in sign. On hardware that can do floating subtracts in parallel, this is not too bad in cost.<br /><br /> By itself, that’s enough to discard unneeded walls in most cases, and minimize the scope of influence of the surviving walls. Just with what we have, it’s possible to determine if points are occluded by walls. But we’d still like it faster. We always want things faster, that’s why we buy computers.<br /><br /> <br /><strong class='bbc'>Making it faster</strong><br /> Make sure all we just discussed makes sense to you, because we’re about to add some complications. There are four optimizations that can be applied to all this, unrelated to each other.<br /><br /> 1. In my maps, walls (except doors) always touch one other wall at their endpoint. (They are really wall surfaces - just as in a 3D model, all the surface polygons are just that, surfaces, always touching neighbor surfaces along edges.) This leads to an optimization, though it is a little fussy to apply. An example serves best.<br /><br /> Imagine you’re a light at the origin, and over at x=5 there’s a wall stretching from (5,0) to (5,5), running north and south. It casts an obvious shadow to the east. We’ll call that wall A. But imagine that at (5,5) there’s another wall, running to (4,6), diagonally to the northwest. Call it B.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig13.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 13 - Extending A's influence</strong> A has the usual left and right rays: the right ray passes through (5,0), the left through (5,5). 
B has its own rays, with a right ray at (5,5) and a left ray at (4,6).<br /><br /> Between them, they cast a single, joined shadow, wider than the shadows they cast individually. The shape of the shadow is complicated, but it’s worth noticing that for any point behind the line of A (that is, with x &gt; 5, noted in green), that point is in shadow if it is between A’s right ray and <em class='bbc'>B’s</em> left ray. This is because B extends A, and (important point) it extends it by turning somewhat towards the light, not away. (It would also work if A and B were collinear, but in my system, collinear walls that share an endpoint become a single wall). There are also points B shadows that have x &lt; 5, but when we are just considering A, it’s fair to say that we can apply B’s left bound instead of A’s left bound when asking about points that are behind the line A makes. A’s ability to screen things, given in light grey, has effectively been extended by the dark grey area.<br /><br /> I don’t take advantage of this when it comes to considering which points are in shadow, because all it does is increase the number of points that are candidates for testing for any given wall, and that doesn’t help. However, I do take advantage of this when determining what walls occlude other walls.<br /><br /> I do this by keeping <em class='bbc'>two</em> sets of vector pairs for each wall. The ones I’ve been calling left and right are the “inner” vectors, named because they tend to move inward, and their goal is to get pushed closer together by other walls, ideally to cross. But there is also a pair of left and right vectors I call the outer pair. They start at the endpoints of the wall like the inner ones do, but they try to grow outward. They grow outward when 1) I find the wall that shares an endpoint and 2) this other wall (in my scheme there can only be one) does not bend away from the light. 
This is an easy check to make – it’s another 2D cross product, from A’s left endpoint, to A’s right, to B’s right. If that comes up counterclockwise, A’s outer right vector gets replaced by a copy of B’s outer right vector (as long as that’s an improvement – it’s important to check that you’re really pushing the outer right vector more to the right.)<br /><br /> And note the trick affects both A and B. If B extends A’s right outer vector, then A is a great candidate for extending B’s left outer vector.<br /><br /> Applied carefully, the extra reach this gives walls helps discard distant walls very quickly in cases where there are long runs of joined walls. I find that for most maps, the difference this makes is not large, and given the work I put into getting it right, I might not have done it if I’d realized how little it helps most maps.<br /><br /> 2. When considering points, it’s important not to waste time testing any given point against a wall that can’t possibly shadow it. Each point, after all, has three tests it has to pass, per wall. Is it to the right of the left vector, to the left of the right vector, and is it behind the line of the wall. On average, half of all points are going to pass that first test, for most walls. That means that a fair amount of the time, the second test is going to be needed for points that are not remotely candidates. And given huge numbers of points, that’s unacceptable.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig14.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 14 - The futility of any one test</strong> Here, P is to the right of the left endpoint-origin line, so it’s a candidate for being shadowed by the wall. But so are Q and R, and they clearly aren’t going to be shadowed. 
It would be helpful, then, if we could only test a point against the walls that have a good shot of shadowing it.<br /><br /> Trigonometric solutions suggest themselves, but trig functions are much too expensive.<br /><br /> What I do is create a list of “aggregate wedges.” Each wall’s shadow, called a wedge, is compared with the other wedges of the other walls. If they overlap, they are added to the same set of wedges, and I keep track of the leftmost and rightmost ray among everything in the same set.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig15.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 15 - Creating sets of shadow wedges</strong> Of course, if you’re in the square room without a door (never mind how you got in), you end up with all the walls in the same set, and the rays that bound the set end up “enclosing” every point on the map! So this trick is useless in these kinds of maps. But in maps of towns, with many freestanding buildings and hence many independent wedges, you can often get whole groups of walls into a number of disjoint sets, and since each set has an enclosing pair of rays that covers all the walls in the set, you can test any given point against the set’s rays: if it’s not between them, you don’t have to test any of the walls in that set.<br /><br /> This sounds pretty, but it can be maddening to get right. You can have two independent sets, and then find a wall that belongs in them both, effectively merging two sets into one big one.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig16.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 16 - Combining sets</strong> You end up doing a certain amount of set management. 
I have a sloppy and dirty way to do this which is reasonably fast, but it’s not pretty.<br /><br /> Another difficulty is knowing when a wall’s vectors overlap an existing set’s bounds. There are several cases. A wall’s vectors could be completely inside the set’s bounds, in which case the wall just becomes a member of the set and the set’s bounds don’t change. Or it can overlap to the left, joining the set and pushing the set’s left vector. Or it can overlap on the right. Or it can overlap on both sides, pushing both set vectors outward. Or it can be disjoint to that set. Keep in mind that a set can have rays at an obtuse angle, enclosing quite a bit of area. It’s <em class='bbc'>surprisingly</em> hard to diagnose all these cases properly. The algorithm is difficult to describe, but the source code is at the end of this document.<br /><br /> When all is said and done, this optimization is very worthwhile for some classes of maps that would otherwise hammer the point test algorithm with a lot of pointless tests. But it was not much fun to write.<br /><br /> 3. This is my favorite optimization because it’s simple and it tends to help a great deal in certain, specific cases. When I’m gathering walls, I’m computing distance (actually, distance squared – no square roots were used in any of this) between the light and each wall. Along the way I keep track of the minimum such distance I see. If, for example, the closest distance to any wall is 50 units, then any point that is closer to the origin than 50 units can never be in shadow and doesn’t have to be checked against any wall. 
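<br /><br /> A rough sketch of this early-out follows. It is an illustration only, not the author's code: Vec2, Segment, distSqToOrigin, safeRadiusSq and triviallyLit are hypothetical names, walls are assumed to be in light-relative coordinates already, and the nearest point is found with a standard clamped projection (which uses one divide per wall but no square roots).<br /><br />

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal types (not the article's Point or Wall classes).
struct Vec2 { float x, y; };
struct Segment { Vec2 a, b; }; // wall endpoints, relative to the light

// Squared distance from the origin (the light) to the nearest point on s.
// "Nearest" may be either endpoint or a point in between, hence the clamp.
inline float distSqToOrigin(const Segment& s) {
    const float dx = s.b.x - s.a.x;
    const float dy = s.b.y - s.a.y;
    const float lenSq = dx * dx + dy * dy;
    float t = 0.0f;
    if (lenSq > 0.0f) {
        // Parameter of the origin's projection onto the wall's line.
        t = -(s.a.x * dx + s.a.y * dy) / lenSq;
        t = std::max(0.0f, std::min(1.0f, t));
    }
    const float px = s.a.x + t * dx;
    const float py = s.a.y + t * dy;
    return px * px + py * py;
}

// Squared radius inside which no wall can shadow a point, capped by the
// light's own squared radius.
inline float safeRadiusSq(const std::vector<Segment>& walls, float lightRadiusSq) {
    float best = lightRadiusSq;
    for (const Segment& w : walls)
        best = std::min(best, distSqToOrigin(w));
    return best;
}

// True if p is closer to the light than every wall and needs no wall tests.
inline bool triviallyLit(Vec2 p, float safeSq) {
    return p.x * p.x + p.y * p.y < safeSq;
}
```

The same per-wall squared distance could also serve as the sort key when ordering walls from nearest to farthest.<br /><br /> 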
(Of course, if the light’s actual radius of effect is smaller than this distance, that value is used instead).<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig17.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 17 - Identifying points unaffected by walls</strong> If the light is close to a wall, this optimization saves very little. If it’s not, this can put hundreds or even thousands of points into “lit” status very quickly indeed. In order to avoid edge cases, I subtract a small amount from the minimum wall distance after I compute it, so there’s no question of points right at a wall being considered lit unduly.<br /><br /> 4. This is my second favorite optimization because it’s simple, dirty and very effective. When I generate floor triangles, I use an algorithm that more or less sweeps through in strips. Because triangles are small, and generated such that neighboring triangles are adjacent in the list of triangles, odds are often very high that if a triangle is shadowed by a wall, the next triangle to consider is going to be shadowed by the <em class='bbc'>same</em> wall.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig18.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 18 - Neighbors often suffer the same fate</strong> So the best optimization of all is to remember which wall last shadowed a triangle, and test the next triangle against that one first, always. After all, if a triangle is shadowed by any wall, it doesn’t have to be tested against any other; we don’t care about finding the “best” shadow, we just want to quickly find one that works. If the light happens to be close to a wall (which ruins optimization 3), this one can be very powerful. 
W, here, is likely to shadow about half the map.<br /><br /> One final trick – not exactly an optimization – has to do with the fact that this code runs on a dual core processor. I cut the triangle list in about half and give each half to a separate thread to run through. (Each thread has its own copy of the pointer used in optimization 4, so they don’t interfere with each other in any way.) This trick doesn’t always cut the runtime in half – it’s not uncommon for one thread to get saddled with most or all of the cases that require more checks – but it helps. Other speedups involve not using STL or boost, and sticking to old-fashioned arrays of data structures – heresy in some C++ circles, but the speed gains are worth any purist’s complaints.<br /><br /> What’s left is trivial. Each floor triangle, and each short wall segment, has a set of bits, one for each possible light, and one for the viewer. If the object is not in shadow, the appropriate light’s bit is set in that object. If any such bit is set, the object is lit. There is also a bit for the observer, which as noted uses the exact same algorithm. If the object is lit, that algorithm is then run for that point for the observer, and if it comes up un-occluded, it is marked visible (and also marked “was once visible”, because I need to keep a history of what has already been seen). Moving a light is a matter of clearing all bits that correspond to that light in all the objects, and then “recasting” the light from its new location. In my application, most lights don’t move frequently, so not much of that happens.<br /><br /> My favorite acid test at the moment is an open town map with 3 million floor triangles and almost 7000 walls. 
With ambient light turned on (which means everything is automatically considered lit, so everything has to be checked for “visible to observer”), and a vision limit of 400 units (so about 336,000 possible triangles in range), my worst case compute times are about 0.75 seconds on a dual core 2 GHz Intel processor. Typical times are a more acceptable 0.4 sec or so. Kinder maps (underground caves, tight packed cities, open fields with few obstructions) manage times of well under 0.1 sec.<br /><br /> Some examples of my implementation:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig19.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 19 - Room in underground city</strong> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig20.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 20 - Ambient light, buildings and rock outcrops</strong> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/wallshadow2d/fig21.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 21 - Limited lights, windows and doors</strong> Notice that lights in buildings shine out windows, and also enable peeks inside the building from outside, forming a number of disjoint areas of visibility.<br /><br />&nbsp;&nbsp;<sup class='bbc'>1</sup>To give a sense of the algorithm’s performance, computing the currently visible area in figure 5, and combining it with the previous area, took 0.003 seconds on a dual core 2 GHz processor. However, keep in mind that figure 5 represents a very small and simple map (containing fewer than 100,000 triangles).<br /><sup class='bbc'>2</sup>GPC, PolyBoolean and LEDA are among these packages. 
Some discussion of the problems of runtime, runspace, and accuracy can be found at <a href='http://www.complex-a5.ru/polyboolean/comp.html' class='bbc_url' title='External link' rel='nofollow external'>http://www.complex-a5.ru/polyboolean/comp.html</a>. Boost’s GTL is promising, but it forces the use of integers for coordinates, and the implementation is still evolving.<br /><sup class='bbc'>3</sup>Formally speaking, there isn’t a cross product defined in 2D; they only work in 3 or 7 dimensions. What’s referred to here is the z value computed as part of a 3D cross product, according to u.x * v.y - v.x * u.y.&nbsp;&nbsp;<br /><strong class='bbc'>Code Listing</strong><br /> What follows gives the general sense of the algorithms’ implementation. Do note that the code is not compiler-ready: support classes like Point, Wall, WallSeg and SimpleTriangle are not provided, but their implementation is reasonably obvious.<br /><br /> This code is released freely and for any purpose, commercial or private – it’s free and I don’t care what happens, nor do I need to be told. It is also without any warranty or promise of fitness, obviously. It works in my application as far as I know, and with some adjustment, may work in yours. The comments will show some of the battles that occurred in getting it to work. 
The code may contain optimizations I didn’t discuss above.<br /><br />&nbsp;&nbsp;enum Clockness {Straight, Clockwise, Counterclockwise}; enum Facing {Colinear, Inside, Outside}; static inline Clockness clocknessOrigin(const Point& p1, const Point& P2) { 	const float a = p1.x * P2.y - P2.x * p1.y; 	if (a &gt; 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return Counterclockwise; // aka left 	if (a &lt; 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return Clockwise; 	return Straight; } static inline bool clocknessOriginIsCounterClockwise(const Point& p1, const Point& P2) { 	return p1.x * P2.y - P2.x * p1.y &gt; 0; } static inline bool clocknessOriginIsClockwise(const Point& p1, const Point& P2) { 	return p1.x * P2.y - P2.x * p1.y &lt; 0; } class LineSegment { public: 	Point begin_; 	Point vector_; //begin_+vector_ is the end point 	inline LineSegment(const Point& begin, const Point& end)&nbsp;&nbsp;&nbsp;&nbsp; 	: begin_(begin), vector_(end - begin) {} 	inline const Point& begin() const {return begin_;} 	inline Point end() const {return begin_ + vector_;} 	inline LineSegment(){} 	//We don't care *where* they intersect and we want to avoid divides and round off surprises. 	//So we don't attempt to solve the equations and check bounds. 	//We form a quadilateral with AB and CD, in ACBD order. This is a convex kite shape if the 	// segments cross. Anything else isn't a convex shape. If endpoints touch, we get a triangle, 	//which will be declared convex, which works for us. 	//Tripe about changes in sign in deltas at vertex didn't work. 	//life improves if a faster way is found to do this, but it has to be accurate. 	
bool doTheyIntersect(const LineSegment &m) const 	{&nbsp;&nbsp; 	Point p[4];&nbsp;&nbsp; 	p[0] = begin();&nbsp;&nbsp; 	p[1] = m.begin();&nbsp;&nbsp; 	p[2] = end();&nbsp;&nbsp; 	p[3] = m.end();&nbsp;&nbsp; 	unsigned char flag = 0;&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float z&nbsp;&nbsp;= (p[1].x - p[0].x) * (p[2].y - p[1].y) -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	(p[1].y - p[0].y) * (p[2].x - p[1].x);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (z &gt; 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	flag = 2;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else if (z &lt; 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	flag = 1;&nbsp;&nbsp; 	}&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float z&nbsp;&nbsp;= (p[2].x - p[1].x) * (p[3].y - p[2].y) -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	(p[2].y - p[1].y) * (p[3].x - p[2].x);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (z &gt; 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	flag |= 2;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else if (z &lt; 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	flag |= 1;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (flag == 3)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	return false;&nbsp;&nbsp; 	}&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float z&nbsp;&nbsp;= (p[3].x - p[2].x) * (p[0].y - p[3].y) -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	(p[3].y - p[2].y) * (p[0].x - p[3].x);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (z &gt; 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	flag |= 2;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else if (z &lt; 
0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	flag |= 1;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (flag == 3)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	return false;&nbsp;&nbsp; 	}&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float z&nbsp;&nbsp;= (p[0].x - p[3].x) * (p[1].y - p[0].y) -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	(p[0].y - p[3].y) * (p[1].x - p[0].x);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (z &gt; 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	flag |= 2;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else if (z &lt; 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	flag |= 1;&nbsp;&nbsp; 	}&nbsp;&nbsp; 	return flag != 3;&nbsp;&nbsp;&nbsp;&nbsp;} 	inline void set(const Point& begin, const Point& end) 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;begin_ = begin;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;vector_ = end - begin; 	} 	inline void setOriginAndVector(const Point& begin, const Point& v) 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;begin_ = begin;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;vector_ = v; 	} 	/* 	Given this Line, starting from begin_ and moving towards end, then turning towards 	P2, is the turn clockwise, counterclockwise, or straight? 	Note: for a counterclockwise polygon of which this segment is a side, 	Clockwise means P2 would "light the outer side" and 	Counterclockwise means P2 would "light the inner side". 	Straight means colinear. 	
*/&nbsp;&nbsp;&nbsp;&nbsp;inline Clockness clockness(const Point& P2) const&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;const float a = vector_.x * (P2.y - begin_.y) - (P2.x - begin_.x) * vector_.y;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (a &gt; 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	return Counterclockwise; // aka left&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (a &lt; 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	return Clockwise;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return Straight;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;inline bool clocknessIsClockwise(const Point& P2) const&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return vector_.x * (P2.y - begin_.y) - (P2.x - begin_.x) * vector_.y &lt; 0;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;//relative to origin&nbsp;&nbsp;&nbsp;&nbsp;inline Clockness myClockness() const {return clocknessOrigin(begin(), end());}&nbsp;&nbsp;&nbsp;&nbsp;inline bool clockOK() const {return myClockness() == Counterclockwise;}&nbsp;&nbsp;&nbsp;&nbsp;//if clockOK(), this is true if p and center are on opposite sides of me 	//if p is on the line, this returns false&nbsp;&nbsp;&nbsp;&nbsp;inline bool outside(const Point p) const 	{&nbsp;&nbsp; 	return clockness(p) == Clockwise;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;inline bool outsideOrColinear(const Point p) const 	{&nbsp;&nbsp; 	return clockness(p) != Counterclockwise;&nbsp;&nbsp;&nbsp;&nbsp;} 	void print() const&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;begin().print(); printf(" to "); end().print(); 	} }; class Wall; /* A wedge is a line segment that denotes a wall, and two rays from the center that denote the relevant left and right bounds that matter when looking at this wall. Initially, the left and right bounds are set from the wall's endpoints, as those are the edges of the shadow. 
But walls in front of (i.e., centerward of) this wall might occlude the end points, and we detect that when we add this wedge. If it happens, we use the occluding wall's endpoints to nudge our own shadow rays. The idea is to minimise the shadow bounds of any given wall by cutting away areas that are already occluded by closer walls. That way, a given point to test can often avoid being tested against multiple, overlapping areas. More important, if we nudge the effective left and right rays for this wall until they meet or pass each other, that means this wall is completely occluded, and we can discard it entirely, which is the holy grail of this code. Fewer walls means faster code. For any point that's between the effective left and right rays of a given wall, the next question is whether it's behind the wall. If it is, it's definitively occluded and we don't need to test it any more. Otherwise, on to the next wall. */ class AggregateWedge; enum VectorComparison {ColinearWithVector, RightOfVector, OppositeVector, LeftOfVector}; static VectorComparison compareVectors(const Point& reference, const Point& point) {&nbsp;&nbsp;&nbsp;&nbsp;switch (clocknessOrigin(reference, point))&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;case Clockwise:&nbsp;&nbsp; 	return RightOfVector;&nbsp;&nbsp;&nbsp;&nbsp;case Counterclockwise:&nbsp;&nbsp; 	return LeftOfVector;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;if (reference.dot(point) &gt; 0)&nbsp;&nbsp; 	return ColinearWithVector;&nbsp;&nbsp;&nbsp;&nbsp;return OppositeVector; } class LittleTree { public:&nbsp;&nbsp;&nbsp;&nbsp;enum WhichVec {Left2, Right1, Right2} whichVector; //(sort tie-breaker), right must be greater than left&nbsp;&nbsp;&nbsp;&nbsp;const Point*&nbsp;&nbsp; 	position;&nbsp;&nbsp;//vector end&nbsp;&nbsp;&nbsp;&nbsp;LittleTree*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;greater; //that is, further around to the 
right&nbsp;&nbsp;&nbsp;&nbsp;LittleTree*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lesser; //that is, less far around to the right&nbsp;&nbsp;&nbsp;&nbsp;LittleTree() {greater = lesser = NULL;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;void readTree(WhichVec* at, int* ip)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	if (lesser)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lesser-&gt;readTree(at, ip);&nbsp;&nbsp; 	at[*ip] = whichVector;&nbsp;&nbsp; 	++*ip;&nbsp;&nbsp; 	if (greater)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;greater-&gt;readTree(at, ip);&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;//walk the tree in order, filling an array&nbsp;&nbsp;&nbsp;&nbsp;void readTree(WhichVec* at)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	int i = 0;&nbsp;&nbsp; 	readTree(at, &i);&nbsp;&nbsp;&nbsp;&nbsp;} }; class VectorPair { public:&nbsp;&nbsp;&nbsp;&nbsp;Point&nbsp;&nbsp;&nbsp;&nbsp;leftVector;&nbsp;&nbsp;&nbsp;&nbsp;Point&nbsp;&nbsp;&nbsp;&nbsp;rightVector;&nbsp;&nbsp;&nbsp;&nbsp;bool 	acute;&nbsp;&nbsp;&nbsp;&nbsp;VectorPair() {}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;VectorPair(const Point& left, const Point& right)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	leftVector = left;&nbsp;&nbsp; 	rightVector = right;&nbsp;&nbsp; 	acute = true;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;bool isAllEncompassing() const {return leftVector.x == 0 && leftVector.y == 0;}&nbsp;&nbsp;&nbsp;&nbsp;void set(const Point& left, const Point& right)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	leftVector = left;&nbsp;&nbsp; 	rightVector = right;&nbsp;&nbsp; 	acute = clocknessOrigin(leftVector, rightVector) == Clockwise;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;void setKnownAcute(const Point& left, const Point& right)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	leftVector = left;&nbsp;&nbsp; 	rightVector = right;&nbsp;&nbsp; 	acute = true;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;void setAllEncompassing()&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	acute = 
false;&nbsp;&nbsp; 	leftVector = rightVector = Point(0,0);&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;bool isIn(const Point p) const&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	if (acute)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return clocknessOrigin( leftVector, p) != Counterclockwise &&&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	clocknessOrigin(rightVector, p) != Clockwise;&nbsp;&nbsp; 	//this accepts all points if leftVector == 0,0&nbsp;&nbsp; 	return clocknessOrigin( leftVector, p) != Counterclockwise ||&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;clocknessOrigin(rightVector, p) != Clockwise;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;//true if we adopted the pair into ourselves. False if disjoint.&nbsp;&nbsp;&nbsp;&nbsp;bool update(const VectorPair& v)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	/* I might completely enclose him - that means no change&nbsp;&nbsp; 	I might be completely disjoint - that means no change, but work elsewhere&nbsp;&nbsp; 	He might enclose all of me - I take on his bounds&nbsp;&nbsp; 	We could overlap; I take on some of his bounds&nbsp;&nbsp; 	--&nbsp;&nbsp; 	We figure this by starting at L1 and moving clockwise, hitting (in some&nbsp;&nbsp; 	order) R2, L2 and R1. 
Those 3 can appear in any order as we move&nbsp;&nbsp; 	clockwise, and some can be colinear (in which case, we pretend a convenient order).&nbsp;&nbsp; 	Where L1 and R1 are the bounds we want to update, we have 6 cases:&nbsp;&nbsp; 	L1 L2 R1 R2 - new bounds are L1 R2 (ie, update our R)&nbsp;&nbsp; 	L1 L2 R2 R1 - no change, L1 R1 already encloses L2 R2&nbsp;&nbsp; 	L1 R1 L2 R2 - the pairs are disjoint, no change, but a new pair has to be managed&nbsp;&nbsp; 	L1 R1 R2 L2 - new bounds are L2 R2; it swallowed us (update both)&nbsp;&nbsp; 	L1 R2 L2 R1 - all encompassing; set bounds both to 0,0&nbsp;&nbsp; 	L1 R2 R1 L2 - new bounds are L2 R1 (ie, update our L)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	If any two rays are colinear, sort them so that left comes first, then right.&nbsp;&nbsp; 	If 2 lefts or 2 rights, order doesn't really matter. The left/right case does because&nbsp;&nbsp; 	we want L1 R1 L2 R2, where R1=L2, to be processed as L1 L2 R1 R2 (update R, not disjoint)&nbsp;&nbsp; 	*/&nbsp;&nbsp; 	//special cases - if we already have the whole circle, update doesn't do anything&nbsp;&nbsp; 	if (isAllEncompassing())&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return true; //v is part of this wedge (everything is)&nbsp;&nbsp; 	//if we're being updated by a full circle...&nbsp;&nbsp; 	if (v.isAllEncompassing())&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;setAllEncompassing(); //we become one&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return true;&nbsp;&nbsp; 	}&nbsp;&nbsp; 	/*&nbsp;&nbsp; 	Now we just need to identify which order the 3 other lines are in, relative to L1.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Not so easy since we don't want to resort to arctan or anything else that risks&nbsp;&nbsp; 	any roundoff. But clockness from L1 puts them either Clockwise (sooner), or&nbsp;&nbsp; 	Straight (use dot product to see if same as L1 or after Clockwise), or&nbsp;&nbsp; 	CounterClockwise (later). 
Within that, we can use clockness between points to sort between them.&nbsp;&nbsp; 	*/&nbsp;&nbsp; 	//get the points R1, L2 and R2 listed so we can sort them by how far around to the right&nbsp;&nbsp; 	// they are from L1&nbsp;&nbsp; 	LittleTree list[3];&nbsp;&nbsp; 	//order we add them in here doesn't matter&nbsp;&nbsp; 	list[0].whichVector = LittleTree::Right1;&nbsp;&nbsp; 	list[0].position = &this-&gt;rightVector;&nbsp;&nbsp; 	list[1].whichVector = LittleTree::Left2;&nbsp;&nbsp; 	list[1].position = &v.leftVector;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	list[2].whichVector = LittleTree::Right2;&nbsp;&nbsp; 	list[2].position = &v.rightVector;&nbsp;&nbsp; 	//[0] will be top of tree; add in 1 & 2 under it somewhere&nbsp;&nbsp; 	for (int i = 1; i &lt; 3; ++i)&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LittleTree* at = &list[0];&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;do {&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	bool IisGreater = list[i].whichVector &gt; at-&gt;whichVector; //default if nothing else works&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	VectorComparison L1ToAt = compareVectors(leftVector, *at-&gt;position);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	VectorComparison L1ToI = compareVectors(leftVector, *list[i].position);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	if (L1ToI &lt; L1ToAt)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;IisGreater = false;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	else if (L1ToI &gt; L1ToAt)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;IisGreater = true;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	else&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (L1ToI != OppositeVector && L1ToI != 
ColinearWithVector)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	//they are in the same general half circle, so this works&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	switch (clocknessOrigin(*at-&gt;position, *list[i].position))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	case Clockwise: IisGreater = true; break;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	case Counterclockwise: IisGreater = false; break;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	//now we know where [i] goes (unless something else is there)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	if (IisGreater)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (at-&gt;greater == NULL)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	at-&gt;greater = &list[i];&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	break; //done searching for [I]'s place&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;at = 
at-&gt;greater;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;continue;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	if (at-&gt;lesser == NULL)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;at-&gt;lesser = &list[i];&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;break; //done searching for [I]'s place&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	at = at-&gt;lesser;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	continue;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} while (true);&nbsp;&nbsp; 	}&nbsp;&nbsp; 	//we have a tree with proper order. Read out the vector ids&nbsp;&nbsp; 	LittleTree::WhichVec sortedList[3];&nbsp;&nbsp; 	list[0].readTree(sortedList);&nbsp;&nbsp; 	unsigned int caseId = (sortedList[0] &lt;&lt; 2) | sortedList[1]; //form ids into a key. 
Two is enough to be unique&nbsp;&nbsp; 	switch (caseId)&nbsp;&nbsp; 	{&nbsp;&nbsp; 	case (LittleTree::Left2 &lt;&lt; 2) | LittleTree::Right2: //L1 L2 R2 R1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return true; //no change, we just adopt it&nbsp;&nbsp; 	case (LittleTree::Right1 &lt;&lt; 2) | LittleTree::Left2: //L1 R1 L2 R2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false; //disjoint!&nbsp;&nbsp; 	case (LittleTree::Right1 &lt;&lt; 2) | LittleTree::Right2: //L1 R1 R2 L2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*this = v;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return true; //we take on his bounds&nbsp;&nbsp; 	case (LittleTree::Right2 &lt;&lt; 2) | LittleTree::Left2: //L1 R2 L2 R1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;setAllEncompassing();&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return true; //now we have everything&nbsp;&nbsp; 	case (LittleTree::Left2 &lt;&lt; 2) | LittleTree::Right1: //L1 L2 R1 R2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rightVector = v.rightVector;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;break;&nbsp;&nbsp; 	default: //(LittleTree::Right2 &lt;&lt; 2) | LittleTree::Right1: //L1 R2 R1 L2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;leftVector = v.leftVector;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;break;&nbsp;&nbsp; 	}&nbsp;&nbsp; 	//we need to fix acute&nbsp;&nbsp; 	acute = clocknessOrigin(leftVector, rightVector) == Clockwise;&nbsp;&nbsp; 	return true;&nbsp;&nbsp;&nbsp;&nbsp;} }; class Wedge { public:&nbsp;&nbsp;&nbsp;&nbsp;//all points relative to center&nbsp;&nbsp;&nbsp;&nbsp;LineSegment wall; //begin is the clockwise, right hand direction&nbsp;&nbsp;&nbsp;&nbsp;Point&nbsp;&nbsp;&nbsp;&nbsp;leftSideVector;&nbsp;&nbsp;//ray from center to this defines left or "end" side&nbsp;&nbsp;&nbsp;&nbsp;Wedge*&nbsp;&nbsp; leftSidePoker; //if I'm updated, who did 
it&nbsp;&nbsp;&nbsp;&nbsp;Point&nbsp;&nbsp;&nbsp;&nbsp;rightSideVector;&nbsp;&nbsp;//ray from center to this defines right or "begin" side&nbsp;&nbsp;&nbsp;&nbsp;Wedge*&nbsp;&nbsp; rightSidePoker;&nbsp;&nbsp;//if I'm updated, who did it&nbsp;&nbsp;&nbsp;&nbsp;Wall*&nbsp;&nbsp;&nbsp;&nbsp;source;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//original Wall of .wall&nbsp;&nbsp;&nbsp;&nbsp;VectorPair&nbsp;&nbsp;outVectors;&nbsp;&nbsp;&nbsp;&nbsp;float&nbsp;&nbsp; 	nearestDistance;&nbsp;&nbsp;//how close wall gets to origin (squared)&nbsp;&nbsp;&nbsp;&nbsp;AggregateWedge* myAggregate;&nbsp;&nbsp;//what am I part of?&nbsp;&nbsp;&nbsp;&nbsp;inline Wedge(): source(NULL), leftSidePoker(NULL), rightSidePoker(NULL), myAggregate(NULL) {}&nbsp;&nbsp;&nbsp;&nbsp;void setInitialVectors()&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	leftSidePoker = rightSidePoker = NULL;&nbsp;&nbsp; 	rightSideVector = wall.begin();&nbsp;&nbsp; 	leftSideVector = wall.end();&nbsp;&nbsp; 	outVectors.setKnownAcute(wall.end(), wall.begin());&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;inline bool testOccluded(const Point p, const float distSq) const //relative to center&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	if (distSq &lt; nearestDistance)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false; //it cannot&nbsp;&nbsp; 	if (clocknessOriginIsCounterClockwise(leftSideVector, p))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false; //not mine&nbsp;&nbsp; 	if (clocknessOriginIsClockwise(rightSideVector, p))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false; //not mine&nbsp;&nbsp; 	return wall.outside(p); //on the outside 	}&nbsp;&nbsp;&nbsp;&nbsp;inline bool testOccludedOuter(const Point p, const float distSq) const //relative to center&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	if (distSq &lt; nearestDistance) //this helps a surprising amount in at least Enya&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false; //it cannot&nbsp;&nbsp; 	
return wall.outside(p) && outVectors.isIn(p); //on the outside 	}&nbsp;&nbsp;&nbsp;&nbsp;inline bool nudgeLeftVector(Wedge* wedge)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	/*&nbsp;&nbsp; 	So. wedge occludes at least part of my wall, on the left side.&nbsp;&nbsp; 	It might actually be the case of an adjacent wall to my left. If so,&nbsp;&nbsp; 	my end() is his begin(). And if so, I can change HIS rightSideVectorOut&nbsp;&nbsp; 	to my right (begin) point, assuming my begin point is forward of (or on)&nbsp;&nbsp; 	his wall. That means he can help kill other walls better.&nbsp;&nbsp; 	*/&nbsp;&nbsp; 	if (wedge-&gt;wall.begin() == wall.end() && !wedge-&gt;wall.outside(wall.begin())) //is it legal?&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;outVectors.update(VectorPair(wedge-&gt;wall.end(), wedge-&gt;wall.begin()));&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;wedge-&gt;outVectors.update(VectorPair(wall.end(), wall.begin()));&nbsp;&nbsp; 	}&nbsp;&nbsp; 	//turning this on drives the final wedge count down, but not very much&nbsp;&nbsp; 	bool okToDoOut = true;&nbsp;&nbsp; 	bool improved = false;&nbsp;&nbsp; 	do {&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (wall.outside(wedge-&gt;wall.begin()))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	break; //illegal move, stop here&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (clocknessOrigin(leftSideVector, wedge-&gt;wall.begin()) == Clockwise)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	leftSideVector = wedge-&gt;wall.begin();&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	leftSidePoker = wedge;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	improved = true;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if 
(okToDoOut)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	okToDoOut = !wall.outside(wedge-&gt;wall.end());&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	if (okToDoOut)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;outVectors.update(VectorPair(wedge-&gt;wall.end(), wall.begin()));&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;wedge = wedge-&gt;rightSidePoker;&nbsp;&nbsp; 	} while (wedge);&nbsp;&nbsp; 	return improved;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;inline bool nudgeRightVector(Wedge* wedge)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	/*&nbsp;&nbsp; 	So. wedge occludes at least part of my wall, on the right side.&nbsp;&nbsp; 	It might actually be the case of an adjacent wall to my right. If so,&nbsp;&nbsp; 	my begin() is his end(). And if so, I can change HIS leftSideVectorOut&nbsp;&nbsp; 	to my left (end) point, assuming my end point is forward of (or on)&nbsp;&nbsp; 	his wall. 
That means he can help kill other walls better.&nbsp;&nbsp; 	*/&nbsp;&nbsp; 	if (wedge-&gt;wall.end() == wall.begin() && !wedge-&gt;wall.outside(wall.end())) //is it legal?&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;outVectors.update(VectorPair(wedge-&gt;wall.end(), wedge-&gt;wall.begin()));&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;wedge-&gt;outVectors.update(VectorPair(wall.end(), wall.begin()));&nbsp;&nbsp; 	}&nbsp;&nbsp; 	//turning this on drives the final wedge count down, but not very much&nbsp;&nbsp; 	bool okToDoOut = true;&nbsp;&nbsp; 	bool improved = false;&nbsp;&nbsp; 	do {&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (wall.outside(wedge-&gt;wall.end()))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	return improved; //illegal move&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (clocknessOrigin(rightSideVector, wedge-&gt;wall.end()) == Counterclockwise)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	rightSideVector = wedge-&gt;wall.end();&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	rightSidePoker = wedge;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	improved = true;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (okToDoOut)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	okToDoOut = !wall.outside(wedge-&gt;wall.begin());&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	if (okToDoOut)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;outVectors.update(VectorPair(wall.end(), wedge-&gt;wall.begin()));&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;wedge = wedge-&gt;leftSidePoker;&nbsp;&nbsp; 	} while (wedge);&nbsp;&nbsp; 	return 
improved;&nbsp;&nbsp;&nbsp;&nbsp;} }; class AggregateWedge { public:&nbsp;&nbsp;&nbsp;&nbsp;VectorPair&nbsp;&nbsp;vectors;&nbsp;&nbsp;&nbsp;&nbsp;AggregateWedge*&nbsp;&nbsp; nowOwnedBy;&nbsp;&nbsp;&nbsp;&nbsp;bool&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	dead;&nbsp;&nbsp;&nbsp;&nbsp;AggregateWedge() : nowOwnedBy(NULL), dead(false) {}&nbsp;&nbsp;&nbsp;&nbsp;bool isIn(const Point& p) const&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	return vectors.isIn(p);&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;bool isAllEncompassing() const {return vectors.leftVector.x == 0 && vectors.leftVector.y == 0;}&nbsp;&nbsp;&nbsp;&nbsp;void init(Wedge* w)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	vectors.setKnownAcute(w-&gt;leftSideVector, w-&gt;rightSideVector);&nbsp;&nbsp; 	w-&gt;myAggregate = this;&nbsp;&nbsp; 	nowOwnedBy = NULL;&nbsp;&nbsp; 	dead = false;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;//true if it caused a merge&nbsp;&nbsp;&nbsp;&nbsp;bool testAndAdd(Wedge* w)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	if (dead) //was I redirected?&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false; //then I don't do anything&nbsp;&nbsp; 	if (!vectors.update(VectorPair(w-&gt;wall.end(), w-&gt;wall.begin())))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false; //disjoint&nbsp;&nbsp; 	AggregateWedge* previousAggregate = w-&gt;myAggregate;&nbsp;&nbsp; 	w-&gt;myAggregate = this; //now I belong to this&nbsp;&nbsp; 	if (previousAggregate != NULL) //then it's a merge&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;vectors.update(previousAggregate-&gt;vectors);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//That means we have to redirect that to this&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;assert(previousAggregate-&gt;nowOwnedBy == NULL);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;previousAggregate-&gt;nowOwnedBy = 
this;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;previousAggregate-&gt;dead = true;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return true;&nbsp;&nbsp; 	}&nbsp;&nbsp; 	return false;&nbsp;&nbsp;&nbsp;&nbsp;} }; class AggregateWedgeSet { public:&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	at;&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	firstValid;&nbsp;&nbsp;&nbsp;&nbsp;AggregateWedge&nbsp;&nbsp;&nbsp;&nbsp;agList[8192];&nbsp;&nbsp;&nbsp;&nbsp;float&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	minDistanceSq;&nbsp;&nbsp;&nbsp;&nbsp;float&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	maxDistanceSq;&nbsp;&nbsp;&nbsp;&nbsp;AggregateWedgeSet() : minDistanceSq(0), maxDistanceSq(FLT_MAX)&nbsp;&nbsp;{}&nbsp;&nbsp;&nbsp;&nbsp;void add(int numberWedges, Wedge* wedgeList)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	at = 0;&nbsp;&nbsp; 	for (int j = 0; j &lt; numberWedges; ++j)&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Wedge* w = wedgeList + j;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;w-&gt;myAggregate = NULL; //none yet&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bool mergesHappened = false;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for (int i = 0; i &lt; at; ++i)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	mergesHappened |= agList[i].testAndAdd(w);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (mergesHappened)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	//some number of aggregates got merged into w-&gt;myAggregate&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	//We need to do fixups on the wedges' pointers&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	for (int k = 0; k &lt; j; ++k)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	
{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;AggregateWedge* in = wedgeList[k].myAggregate;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (in-&gt;nowOwnedBy) //do you need an update?&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	in = in-&gt;nowOwnedBy;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	while (in-&gt;nowOwnedBy) //any more?&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in = in-&gt;nowOwnedBy;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	wedgeList[k].myAggregate = in;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	for (int k = 0; k &lt; at; ++k)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;agList[k].nowOwnedBy = NULL;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (w-&gt;myAggregate == NULL) //time to start a new one&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	agList[at++].init(w);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp; 	} // all wedges in&nbsp;&nbsp; 	minDistanceSq = FLT_MAX;&nbsp;&nbsp; 	for (int j = 0; j &lt; numberWedges; ++j)&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//get nearest approach&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;float ds = wedgeList[j].nearestDistance;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (ds 
&lt; minDistanceSq)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	minDistanceSq = ds;&nbsp;&nbsp; 	}&nbsp;&nbsp; 	minDistanceSq -= 0.25f; //fear roundoff - pull this in a little&nbsp;&nbsp; 	firstValid = 0;&nbsp;&nbsp; 	for (int i = 0; i &lt; at; ++i)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (!agList[i].dead)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	firstValid = i; #if 0 // Not sure this is working? Maybe relates to using L to change bounds?&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	//if this is the only valid wedge and it is all-encompassing, then we can&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	//walk all the wedges and find the furthest away point (which will be some&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	//wall endpoint). Anything beyond that cannot be in bounds.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	if (agList[i].isAllEncompassing())&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;maxDistanceSq = 0;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for (int j = 0; j &lt; numberWedges; ++j)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	float ds = wedgeList[j].wall.begin().dotSelf();&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	if (ds &gt; maxDistanceSq)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;maxDistanceSq = ds;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	ds = wedgeList[j].wall.end().dotSelf();&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	if (ds &gt; 
maxDistanceSq)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;maxDistanceSq = ds;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	} #endif&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	break;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;const AggregateWedge* whichAggregateWedge(const Point p) const&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	for (int i = firstValid; i &lt; at; ++i)&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (agList[i].dead)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	continue;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (agList[i].isIn(p))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	return agList + i;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp; 	}&nbsp;&nbsp; 	return NULL;&nbsp;&nbsp;&nbsp;&nbsp;} }; //#define UsingOuter //this slows us down. Do not use. 
#ifdef UsingOuter #define TheTest testOccludedOuter #else #define TheTest testOccluded #endif class AreaOfView { public:&nbsp;&nbsp;&nbsp;&nbsp;Point&nbsp;&nbsp;&nbsp;&nbsp;center;&nbsp;&nbsp;&nbsp;&nbsp;float&nbsp;&nbsp;&nbsp;&nbsp;radiusSquared;&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;numberWedges;&nbsp;&nbsp;&nbsp;&nbsp;BoundingRect&nbsp;&nbsp;bounds;&nbsp;&nbsp;&nbsp;&nbsp;Wedge&nbsp;&nbsp;&nbsp;&nbsp;wedges[8192];&nbsp;&nbsp;&nbsp;&nbsp;//VERY experimental&nbsp;&nbsp;&nbsp;&nbsp;AggregateWedgeSet ags;&nbsp;&nbsp;&nbsp;&nbsp;inline AreaOfView(const Point& center_, const float radius) :&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;center(center_),&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;radiusSquared(radius * radius),&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;numberWedges(0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bounds.set(center, radius);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;addWalls(); 	}&nbsp;&nbsp;&nbsp;&nbsp; 	void changeTo(const Point& center_, const float radius)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	center = center_;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;radiusSquared = radius * radius;&nbsp;&nbsp; 	bounds.set(center, radius);&nbsp;&nbsp; 	numberWedges = 0;&nbsp;&nbsp; 	addWalls();&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;void recompute() //rebuild the wedges, with existing center and radius&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	bounds.set(center, sqrtf(radiusSquared));&nbsp;&nbsp; 	numberWedges = 0;&nbsp;&nbsp; 	addWalls();&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;inline bool isIn(Point p) const&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	p -= center;&nbsp;&nbsp; 	const float distSq = p.dotSelf();&nbsp;&nbsp; 	if (distSq &gt;= radiusSquared)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false;&nbsp;&nbsp; 	for (int i = 0; i &lt; numberWedges; ++i)&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if 
(wedges[i].TheTest(p, distSq))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	return false;&nbsp;&nbsp; 	}&nbsp;&nbsp; 	return true;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;/* On the theory that the wedge that rejected your last point has a higher than&nbsp;&nbsp;&nbsp;&nbsp;average chance of rejecting your next one, let the calling thread provide&nbsp;&nbsp;&nbsp;&nbsp;space to maintain the index of the last hit&nbsp;&nbsp;&nbsp;&nbsp;*/&nbsp;&nbsp;&nbsp;&nbsp;inline bool isInWithCheat(Point p, int* hack) const&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	p -= center;&nbsp;&nbsp; 	const float distSq = p.dotSelf();&nbsp;&nbsp; 	if (distSq &gt;= radiusSquared)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false;&nbsp;&nbsp; 	if (distSq &lt; ags.minDistanceSq)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return true; //this range is always unencumbered by walls&nbsp;&nbsp; 	if (distSq &gt; ags.maxDistanceSq) //not working. Why?&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false;&nbsp;&nbsp; 	if (numberWedges == 0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return true; //no boundaries&nbsp;&nbsp; 	//try whatever worked last time, first. It will tend to win again&nbsp;&nbsp; 	if (wedges[*hack].TheTest(p, distSq))&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false;&nbsp;&nbsp; 	} #define UseAgg #define UseAggP #ifdef UseAgg&nbsp;&nbsp; 	const AggregateWedge* whichHasMe = ags.whichAggregateWedge(p);&nbsp;&nbsp; 	if (whichHasMe == NULL)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return true; //can't be occluded! 
#endif&nbsp;&nbsp; 	//try everything else&nbsp;&nbsp; 	for (int i = 0; i &lt; *hack; ++i)&nbsp;&nbsp; 	{ #ifdef UseAggP #ifdef UseAgg&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (wedges[i].myAggregate != whichHasMe)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	continue; #endif #endif&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (wedges[i].TheTest(p, distSq))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	*hack = i; //remember what worked for next time&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	return false;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp; 	}&nbsp;&nbsp; 	for (int i = *hack + 1; i &lt; numberWedges ; ++i)&nbsp;&nbsp; 	{ #ifdef UseAggP #ifdef UseAgg //does seem to help speed, but don't work yet&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (wedges[i].myAggregate != whichHasMe)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	continue; #endif #endif&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (wedges[i].TheTest(p, distSq))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	*hack = i; //remember what worked for next time&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	return false;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp; 	}&nbsp;&nbsp; 	return true;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;inline bool isInWithWallExclusion(Point p, const Wall* excludeWall) const&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	p -= center;&nbsp;&nbsp; 	const float distSq = p.dotSelf();&nbsp;&nbsp; 	if (distSq &gt;= radiusSquared)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return false;&nbsp;&nbsp; 	for (int i = 0; i &lt; numberWedges; ++i)&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (wedges[i].source == excludeWall)//this one doesn't 
count&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	continue;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (wedges[i].TheTest(p, distSq ))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 	return false;&nbsp;&nbsp; 	}&nbsp;&nbsp; 	return true;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;void addWall(Wall* w, const float nearestDistance);&nbsp;&nbsp;&nbsp;&nbsp;void addWalls(); }; class AreaRef { public:&nbsp;&nbsp;&nbsp;&nbsp;AreaOfView*&nbsp;&nbsp;&nbsp;&nbsp;a;&nbsp;&nbsp;&nbsp;&nbsp;AreaRef() {a = NULL;}&nbsp;&nbsp;&nbsp;&nbsp;void set(const Point& p, float radius)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	if (a == NULL)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a = new AreaOfView(p, radius);&nbsp;&nbsp; 	else&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a-&gt;changeTo(p, radius);&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;~AreaRef() {delete a;}&nbsp;&nbsp;&nbsp;&nbsp;void empty() {delete a; a = NULL;}&nbsp;&nbsp;&nbsp;&nbsp;AreaOfView* operator-&gt;() const {return a;} }; class WallSet { public:&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;&nbsp; length;&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;&nbsp; at;&nbsp;&nbsp;&nbsp;&nbsp;WallAndDist*&nbsp;&nbsp; list;&nbsp;&nbsp;&nbsp;&nbsp;WallSet() 	{&nbsp;&nbsp; 	at = 0;&nbsp;&nbsp; 	length = 2038;&nbsp;&nbsp; 	list = (WallAndDist*)malloc(length * sizeof(*list));&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;~WallSet() {free(list);}&nbsp;&nbsp;&nbsp;&nbsp;void add(Wall* w, const float distSq)&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	if (at &gt;= length)&nbsp;&nbsp; 	{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;length *= 2;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;list = (WallAndDist*)realloc(list, length * sizeof(*list));&nbsp;&nbsp; 	}&nbsp;&nbsp; 	list[at].wall = w;&nbsp;&nbsp; 	const LineSeg* s = w-&gt;getSeg();&nbsp;&nbsp; 	list[at].lenSq = s-&gt;p[0].distanceSq(s-&gt;p[1]);&nbsp;&nbsp; 	list[at++].distSq = 
distSq;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;inline void sortByCloseness()&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp; 	qsort(list, at, sizeof *list, cmpWallDist);&nbsp;&nbsp;&nbsp;&nbsp;} }; void AreaOfView::addWall(Wall* w, const float nearestDistance) {&nbsp;&nbsp;&nbsp;&nbsp;if (num]]></description>
		<pubDate>Tue, 10 Nov 2009 04:44:27 +0000</pubDate>
		<guid isPermaLink="false">e18b6f179b6a5a068a01655542f9b6de</guid>
	</item>
	<item>
		<title>Cellular Textures, the light speed approach</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/cellular-textures-the-light-speed-approach-r2668</link>
		<description><![CDATA[<br /><strong class='bbc'>Introduction</strong><br /> Hi, my name is Carsten Przyluczky (aka Kastor, formerly known as Jazzoid), and I am a student of computer science. I am interested in game development and a huge fan of the demo scene. Every time I saw a good demo in 64 K or even 4 K, I asked myself: “How the heck do they do that?! How can they put that much content into such a small space?” So a friend and I came up with the idea of writing a complete game in 96 K. During my research, I found out that most of the content is “procedural content”, which means it is calculated at runtime. Yes, they do some math magic, and the results are these good-looking textures, music or even 3D models.<br /><br /> Creating procedural textures is mostly done by taking some basic shapes or patterns and manipulating them. Manipulations can be colour inversion, blending or adding to textures, and so on. Basic shapes are things like circles, rectangles, plasma noise, or cellular patterns / textures.<br /><br /> In this article I am going to explain how to generate these cellular textures. For those who have never seen a cellular texture, they look like this:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/cell1.png' alt='Posted Image' class='bbc_img' /></span></span> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/cell2.png' alt='Posted Image' class='bbc_img' /></span></span> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/cell3.png' alt='Posted Image' class='bbc_img' /></span></span> They are very versatile; you might use them to create spider webs or cracked dirt textures. If you search the internet, you will find some great tutorials (see link section). They mostly involve an easy but slow approach. 
I will explain this basic approach too, and improve it step by step. In the end we will have an algorithm that is much faster and has some nice extra features.<br /><br /> <br /><strong class='bbc'>The basic approach (turtle speed)</strong><br /> Well, the basic idea is easy. Let's assume you want to create a texture that is 256 x 256 pixels large. The first step of the algorithm is to generate a set of random points; let's call it <sup class='bbc'><em class='bbc'>P</em></sup>. The x and y coordinates of each point must lie between 0 and texture size - 1. Then, for each point <sup class='bbc'><em class='bbc'>q</em></sup> of the texture, you have to find the point <em class='bbc'>p</em>∈<sup class='bbc'><em class='bbc'>P</em></sup> that has the smallest distance to <sup class='bbc'><em class='bbc'>q</em></sup>, and store that distance in an array. In the case of <em class='bbc'>q</em>∈<sup class='bbc'><em class='bbc'>P</em></sup> the distance is zero.<br /><br /> The distance we just found will determine the color of the pixel in our texture. Usually, the distance is calculated by using the Euclidean formula. For our example it would look like this:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/eq1.png' alt='Posted Image' class='bbc_img' /></span></span><br /><br /> There are some more possibilities, but let's skip these for now. Once you have calculated all distances, you have to normalise the values in the array to the range of 0 to 255.<br /><br /> These values can easily be transformed into a grey scale value by setting red, green and blue to the normalised value.<br /><br /> That's pretty simple, right? Just calculate the distances and use them to colourize your texture.<br /><br /> As far as I know, this is the most common way to create the nice-looking cellular textures. Well, the main part of our algorithm is to find the closest point of our random set. 
If we want to speed up the algorithm, we need to speed up that part.<br /><br /> But first, it's time to have a look at the algorithm's running time.<br /><br /> Let <sup class='bbc'><em class='bbc'>m</em></sup> be the cardinality of <sup class='bbc'><em class='bbc'>P</em></sup>, our random point set, and let <sup class='bbc'><em class='bbc'>n</em></sup> be the number of points in our texture. The common algorithm needs to calculate m distances for each of the n texture points. This leads to a running time of<br /><br /> O(n * m)<br /><br /> This means that if you want your texture to look finer and use more random points, the time it takes to calculate grows with every point you add.<br /><br /> <br /><strong class='bbc'>The BSP approach (cheetah speed)</strong><br /> The first optimization that came to my mind was to use a binary space partitioning (BSP) tree to store the random points. Once we've created our tree, we can use it to speed up “the quest for the closest point”. In a BSP tree, each inner node holds two points and each leaf holds one. In addition, each inner node has two branches, so we need at least two points to build a tree. The left branch of an inner node holds the points that are closer to the node's first point; the right branch holds the points that are closer to its second point. So, if you search for the closest point of our random set, you start at the root, calculate which of the node's points is closer and follow the corresponding branch.<br /><br /> Ok, lots of information here, so let's take an example:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/bsp1.PNG' alt='Posted Image' class='bbc_img' /></span></span> The red points are our random points. The first point added is A, then B, C and so on. If we add the points in another order, the tree will look different. 
You need to check whether the tree is balanced; in the worst case it degenerates into a regular list, which won't be faster.<br /><br /> This approach increases the speed but needs some preprocessing like rebalancing and so on. For me this is not an option because it needs too much extra code. Nevertheless, the running time of this approach decreases to something like<br /><br /> O(n * log(m))<br /><br /> (average case).<br /><br /> <br /><strong class='bbc'>The grid approach (light speed)</strong><br /> OK, fasten your seatbelts - we are reaching light speed now!<br /><br /> First of all, remember that the color of a texture point follows from the distance to the closest point of our random set <sup class='bbc'><em class='bbc'>P</em></sup>. So, that's our main task. If we want to speed up our algorithm, optimizing this task is a good start.<br /><br /> We can achieve this using a grid. We put the grid over our texture. Let's start with an 8 x 8 grid:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/raster.png' alt='Posted Image' class='bbc_img' /></span></span> As you can see, our 8x8 grid has 64 cells. Actually the grid is just a virtual construct to simplify the problem. The main trick now is to pick one random point for each cell. So in our case we need 64 random points. The x and y values of each point must lie within the boundaries of its cell, like this:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/rast_points.png' alt='Posted Image' class='bbc_img' /></span></span> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/points.png' alt='Posted Image' class='bbc_img' /></span></span> The blue points represent our random set <sup class='bbc'><em class='bbc'>P</em></sup>. 
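<br /><br /> As a rough illustration, picking one random point per grid cell could be sketched in C++ like this. It is only a sketch under assumptions: the names FeaturePoint and makeGridPoints are made up for this example and are not taken from the article's code, and a square texture is assumed.<br /><br />

```cpp
#include <random>
#include <vector>

// Hypothetical helper type for this sketch; not part of the original code.
struct FeaturePoint {
    float x;
    float y;
};

// Picks one random point inside every cell of a cellsPerRow x cellsPerRow
// grid laid over a square texture of textureSize x textureSize pixels.
// The point of cell (cx, cy) is stored at index cy * cellsPerRow + cx.
std::vector<FeaturePoint> makeGridPoints(int textureSize, int cellsPerRow, unsigned seed)
{
    const float cellSize = static_cast<float>(textureSize) / cellsPerRow;
    std::mt19937 rng(seed);
    // Position of the point inside its cell, as a fraction of the cell size.
    std::uniform_real_distribution<float> offset(0.0f, 1.0f);

    std::vector<FeaturePoint> points;
    points.reserve(cellsPerRow * cellsPerRow);
    for (int cy = 0; cy < cellsPerRow; ++cy) {
        for (int cx = 0; cx < cellsPerRow; ++cx) {
            FeaturePoint p;
            p.x = (cx + offset(rng)) * cellSize; // stays within cell column cx
            p.y = (cy + offset(rng)) * cellSize; // stays within cell row cy
            points.push_back(p);
        }
    }
    return points;
}
```

<br /><br /> With these assumptions, makeGridPoints(256, 8, seed) would give the 64 points of the example grid, and the point belonging to cell (cx, cy) sits at index cy * 8 + cx, so no search is needed to find the point of a given cell.<br /><br /> 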
Okay, now look at this:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/9tobechecked_finer.png' alt='Posted Image' class='bbc_img' /></span></span> Let the red point be an arbitrary point q, for which we want to know the closest point <em class='bbc'>p</em>∈<sup class='bbc'><em class='bbc'>P</em></sup>. All we need to do is calculate the distances to the point that shares the same cell as q and to the points in the eight surrounding cells. In the image the corresponding cells are grey. That's it! One of these nine points is the closest; any other point has a greater distance (apart from the rare exceptions described under Flaws).<br /><br /> After only nine distance calculations, we have our closest point <em class='bbc'>p</em>∈<sup class='bbc'><em class='bbc'>P</em></sup>! Since the cardinality of <sup class='bbc'><em class='bbc'>P</em></sup> doesn't matter any more, the running time of this approach is<br /><br /> O(n * 9)<br /><br /> no matter whether you take 16 or 1024 random points!<br /><br /> <br /><strong class='bbc'>Extra features</strong><br /> <br /><strong class='bbc'>Making it tileable</strong><br /> Often you want your textures to be tileable. To achieve this you need to consider the “wrapped” points too. Let's call the cell that contains our arbitrary point <sup class='bbc'><em class='bbc'>q</em></sup> the Q-Cell, and let's assume the Q-Cell is the upper left cell. The Q-Cell has no left neighbor, so we need to wrap and take the upper right cell as the Q-Cell's left neighbor. 
Look at the following image, it should make things clear: <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/9tobechecked_wrap.png' alt='Posted Image' class='bbc_img' /></span></span> In my code I do something like this to perform the wrap (pseudo code):<br /><br /> LeftNeighborIndex = (QCell.X+NumOfCellsPerRow-1) mod NumOfCellsPerRow<br /><br /> I perform a kind of “shift” here, and the -1 “gives” me the left-hand neighbor. In our case the Q-Cell has the (horizontal) index 0 and its left-hand neighbor (the far-right cell) has the index 7. If we put this information into our formula, the result is<br /><br /> (0+8-1) mod 8 = 7 → <em class='bbc'>correct!</em><br /><br /> As another example, assume our Q-Cell is the upper right-hand cell and we want to know its right-hand neighbor (add 1). Our calculation looks like this:<br /><br /> (7+8+1) mod 8 = 16 mod 8 = 0 → <em class='bbc'>correct!</em> <br /><strong class='bbc'>Create different shapes</strong><br /> If you look at the three images in the introduction, you will notice that the left-hand one looks different. I used another distance function here, namely one that uses the distances to the two closest points. As I mentioned before, you can use several distance functions. Play around with it: invert it, square it, multiply the two closest, there are no limits! The following two images are created with Manhattan-noise. 
It is realized by taking the two closest points and subtracting their x and y values:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/eq2.png' alt='Posted Image' class='bbc_img' /></span></span><br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/shape1.png' alt='Posted Image' class='bbc_img' /></span></span> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/cellTxt/shape2.png' alt='Posted Image' class='bbc_img' /></span></span> <br /><strong class='bbc'>Flaws</strong><br /> Well, like all good things, this approach has some flaws. There are some extremely rare cases in which this algorithm does not produce the same results as the naive one. Take a look at this picture: <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://members.gamedev.net/gaiiden/flaw.png' alt='Posted Image' class='bbc_img' /></span></span><br /><br /> The green line indicates the smallest distance. As you can see, the grid algorithm will select another point. This can be fixed by widening the search and checking 5x5 cells instead of 3x3. I can tell you that I did a lot of tests with the 3x3 search and the results were equal, but I still want to point that weakness out.<br /><br /> <br /><strong class='bbc'>Closing Words</strong><br /> So, this is the end, my friend. I hope you had fun reading this article, because it was a lot of fun writing it. Go to my blog to find out more about my work and my procedural content editor (see link section). If you have any questions, feel free to write an email to <a href='mailto:jazzoid@gmx.de' class='bbc_url' title='External link' rel='nofollow external'>jazzoid@gmx.de</a>. 
Cheers,<br />Carsten<br /><br /> <br /><strong class='bbc'>Links</strong><br /> <a href='http://kastor.wordpress.com/' class='bbc_url' title='External link' rel='nofollow external'>http://kastor.wordpress.com/</a><br /><a href='http://graphicdesignertoolbox.com/screenshots' class='bbc_url' title='External link' rel='nofollow external'>http://graphicdesignertoolbox.com/screenshots</a> (look at worley noise)<br /><a href='http://www.blackpawn.com/texts/cellular/default.html' class='bbc_url' title='External link' rel='nofollow external'>http://www.blackpawn.com/texts/cellular/default.html</a><br /><a href='http://petewarden.com/notes/archives/2005/05/testing.html' class='bbc_url' title='External link' rel='nofollow external'>http://petewarden.com/notes/archives/2005/05/testing.html</a><br /><a href='http://www.worley.com/' class='bbc_url' title='External link' rel='nofollow external'>http://www.worley.com/</a>]]></description>
		<pubDate>Mon, 10 Aug 2009 16:34:36 +0000</pubDate>
		<guid isPermaLink="false">354680832fcea7e2b7057a5ac2c489f8</guid>
	</item>
	<item>
		<title>Image Space Lighting</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/image-space-lighting-r2644</link>
		<description><![CDATA[<br /><strong class='bbc'>Introduction</strong><br /> Current graphics hardware has built-in functionality for rendering a constant number of light sources at a time. While for some scenes this constant is large enough, for many scenes it is not, and for those scenes this hardware-determined limit can be frustrating to work around. To increase the number of light sources, most developers end up having to use techniques that only approximate the lighting (like glow effects) or are slow (like looping through all the lights manually per pixel or per vertex). This article focuses on an efficient way to render hundreds of dim light sources accurately using an image space technique that takes advantage of the graphics processor’s fast rasterization. This technique is very useful for scenes with many lampposts or lanterns, towns at night, lights on models, light-emitting particle effects and other scenes that have a large number of ‘small’ light sources. Additionally, since this technique lies in image space, the cost of the algorithm does not depend on the complexity of the geometry in the scene, only on the light sources themselves. The same light sources rendered with this technique have the same additional cost whether the scene has 10K or 100K polygons.<br /><br /> In order to run this algorithm efficiently, the graphics card must support multiple render targets. This means that this algorithm only works for graphics cards that support DirectX 9.0 or OpenGL 2.0 (you can use extensions for MRT in earlier versions). Before reading this article, you should understand how to render to multiple targets and how to program vertex and pixel shaders.<br /><br /> <br /><strong class='bbc'>Lighting</strong><br /> Before we jump into the algorithm itself, let's first briefly explain lighting, and in doing so develop intuition on how the algorithm can work efficiently. 
For this algorithm, we only consider lights that actually have a position and exist in world space (unlike directional lights). To render directional lights, the built-in light sources on the graphics card should suffice (as there usually are not too many directional lights). For simplicity, we will assume all light sources are point light sources, although there are simple extensions to the algorithm to render non-point light sources, which I will briefly discuss later.<br /><br /> Let’s say there are three point lights in a scene.<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/imgSpaceLight/fig1.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 1: Three light sources in a simple scene. The black suns represent the actual light sources, and the white circles around them represent their area of effect.</strong><br /><br /> These light sources cast light in all directions around them; however, the power of the light source attenuates quadratically with distance. This means that the farther away from the light source a point is, the less light it receives. At some distance, the amount of light received from the light source is small enough (less than some delta) that its contribution can be considered zero. This distance defines a sphere around the light source that represents the light source’s area of effect. Everything within this sphere receives light from the light source, and everything outside this sphere receives almost no light from the light source.<br /><br /> When rendering light sources using the hardware’s built-in light sources or manually looping through a list of light sources, this area of effect is ignored, and all pixels will calculate the contribution of every light source, no matter how far away it is. 
If the light source is large and its area of effect covers the entire screen, then no computation is wasted this way. But when rendering thousands of dim light sources (which could have an area of effect of only a few pixels), this approach ends up wasting computational power for all of the pixels where the light’s contribution is negligible (outside of the light’s area of effect).<br /><br /> If you have 1,000 point lights, each of which affects only 100 pixels on the screen (where a screen might have 1,000,000 pixels), then without considering the area of effect, you are computing lighting for 1,000 lights over 1,000,000 pixels, a total of 1,000,000,000 lighting calculations. However, if for every light source you only compute the lighting for the pixels within that light’s area of effect, then you only compute lighting for 100 pixels for each light source, for a total of 100,000 lighting calculations. This is a 10,000x speedup. Obviously, taking the light source’s area of effect into account is a big win when rendering lots of small light sources, and that is the essential idea behind image space lighting and the reason it can operate so efficiently.<br /><br /> <br /><strong class='bbc'>The Area of Effect</strong><br /> Calculating the area of effect of a point light is a simple computation. The contribution of a light source on graphics hardware can be defined by the following:<br /><br /> Contribution = Light Power * Cosine Term / (d ^2 * QUADRATIC_ATTENUATION + d * LINEAR_ATTENUATION + CONSTANT_ATTENUATION)<br /><br /> Here the light power is the power (or brightness) of the light source (a constant for each light source). The cosine term is unknown, as it depends on the normal of the point being lit, but it always lies between zero and one, so we conservatively set it to one. The three attenuation parameters define how the light’s power attenuates over distance (where d is the distance between the light source and the point). 
These are constants for each light source. With the cosine term set to one, we can then rearrange this equation to get the following:<br /><br /> d ^2 * QUADRATIC_ATTENUATION + d * LINEAR_ATTENUATION + CONSTANT_ATTENUATION = Light Power / Contribution<br /><br /> The equation is now quadratic with respect to d. To solve it, we set the Contribution variable to the smallest value we want to include in our area of effect, and solve for d. The positive solution of this equation is the distance at which the contribution is equal to our small delta value; beyond that distance, the contribution only decreases (attenuates), so this distance defines the area of effect of the light source. We can compute this distance once for each light source (and update it if the light source changes). Using this distance, we can quickly determine whether any point is within the area of effect of the light source (by comparing this distance with the distance from the point to the light source).<br /><br /> <br /><strong class='bbc'>Proxy Shapes</strong><br /> We can now loop through all light sources per pixel, and test if that pixel is within the light’s area of effect before computing the lighting. However, this will not greatly increase the speed, as branching on the GPU is slow, and a loop with an if statement for 1000 light sources for every pixel is still very expensive. Intuitively, we want to quickly figure out which pixels in image space are within our light source’s area of effect, and then add the contribution of the light source only for those pixels. Luckily, there is a very fast and optimized algorithm for projecting shapes into screen space on the GPU: rasterization. GPU rasterization can rasterize over a million triangles in real time. 
If we bound the light’s area of effect within a low poly object – say a cube (12 triangles), we can then rasterize a cube for each light source to the screen, and the pixels in that cube represent approximately the pixels within the light source’s area of effect (if it is a bounding shape, it will never under-approximate).<br /><br />&nbsp;&nbsp;<span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/imgSpaceLight/fig2.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 2: Bounding cubes rasterized for each point light, quickly generating pixels within the light’s area of effect. Proxy cubes shown as transparent boxes, transparent circles depict the actual area of effect projected on the ground.</strong><br /><br />&nbsp;&nbsp;If the GPU can render a million triangles and each light source’s proxy shape is a cube (12 triangles), then we can potentially render over 80,000 light sources! Practically, we cannot achieve anywhere near 80,000 light sources, but this shows how using rasterization can easily and efficiently generate the pixels within each light source’s area of effect.<br /><br /> <br /><strong class='bbc'>Rendering and Implementation</strong><br /> Now that we know the theory behind image space lighting, it is time to work out the actual rendering and implementation details. The steps to rendering image space lighting are as follows: <ul class='bbcol decimal'><li>Extract the position, normal and color information of the scene to calculate the lighting at any pixel. This can be done with deferred shading, where the scene is rendered once with a pixel shader that stores the position, normal and color information into the proper render targets. Any effects like normal mapping can be done here – just outputting the final normal to the normal texture. </li><li>Render the scene to the screen normally (as it was rendered off screen previously). 
However, this does not require rendering the scene geometry again, as the per pixel attribute results were already generated in the deferred shading pass. Simply render a quad that fills the entire screen (this is easy to do with a vertex shader), and then retrieve the per pixel position, normal and color information from the deferred shading textures. During this pass, any primary light sources (like directional lights) and effects can be rendered (the contribution to the scene from the point lights will be added later). It is worth noting that the pixel shader only executes once per pixel in this pass, which can improve the speed of the rendering for complex pixel shaders. </li><li>Render the point lights on top of the previously rendered scene as follows: </li><li>Bind a shader that computes the lighting contribution of the light source whose proxy shape is being rendered, evaluated at the point stored in the deferred shading textures at the rasterized pixel position. To save computation, this shader can also discard the fragment if that point is not within the light’s area of effect. This can happen because the proxy shape over-bounds the area of effect, and because the depth test is disabled when rendering the proxy shapes (see below), so the point could be far away in the depth coordinate. </li><li>Set the blending function to 1 * src + 1 * dst. This indicates that the computed lighting at each pixel is added to the previous value. As the lighting equation is additive (summation of the contribution of all light sources), this works as expected. </li><li>Enable face culling, and cull front faces. If the back faces are culled, nothing will be rendered when the camera moves inside the proxy shape (as the front faces will be behind the camera). As long as the proxy shape is convex, there are only two rasterized fragments for each pixel on the shape (one on the front face, and one on the back face). 
If the front faces are culled, only one fragment is rasterized per pixel, which means that no light will contribute lighting twice for any pixel (which would be incorrect). </li><li>Disable the depth test as pixels on the back face of the proxy shape could fail the depth test (thus contributing no light) even when the light source should contribute light to that pixel. However, since the pixel shader can quickly discard pixels outside of the light source’s area of effect, this does not affect performance too much. </li><li>Render all of the proxy shapes. The proxy shapes can be stored in a tree structure to quickly render only the proxy shapes within the view frustum to improve performance. </li><li>Finally, unbind the shader and textures, and then set blending, face culling and depth test back to their previous values.</li></ul> <strong class='bbc'>Code Snippets</strong><br /> Below is an OpenGL code snippet from the game project Aero Empire [2], which renders the point lights following the above implementation.<br /><pre>
Shader* cur_shader;

void renderLight(Light* light){
    //pass light world position to pixel shader
    cur_shader-&gt;setUniformVec3("lightWorldPos", light-&gt;getWorldPosition());
    //pass light color (magnitude of vector is power) to pixel shader
    cur_shader-&gt;setUniformVec3("lightColor", light-&gt;getColor());
    //pass the light's area of effect radius to pixel shader
    cur_shader-&gt;setUniformFloat("lightAoE", light-&gt;getAoERadius());
    light-&gt;render(); //render the proxy shape
}

void renderLights(const Shape* scene){
    //set blend function
    glBlendFunc(GL_ONE, GL_ONE);
    //cull front faces
    glEnable(GL_CULL_FACE);
    glCullFace(GL_FRONT);
    //disable depth testing
    glDisable(GL_DEPTH_TEST);
    glDepthMask(false);
    //bind point light pixel shader
    Shader* shader = shaders[SHADER_LIGHT];
    shader-&gt;bind();
    //bind position, normal and color textures from deferred shading pass
    bindTexture(positionMap, POSITION_UNIT);
    shader-&gt;setUniformInt("positionMap", POSITION_UNIT);
    bindTexture(normalMap, NORMAL_UNIT);
    shader-&gt;setUniformInt("normalMap", NORMAL_UNIT);
    bindTexture(colorMap, COLOR_UNIT);
    shader-&gt;setUniformInt("colorMap", COLOR_UNIT);
    bindTexture(attrMap, ATTR_UNIT);
    shader-&gt;setUniformInt("attrMap", ATTR_UNIT);
    //set pixel shader attribute "camPosition" to the camera pos
    //(as it is needed for Phong shading)
    camera.loadCameraPosition(shader, "camPosition");
    shader-&gt;setUniformFloat("d_sx", 1.0/width);
    shader-&gt;setUniformFloat("d_sy", 1.0/height);
    cur_shader = shader;
    //run renderLight function on all light proxy shapes in scene
    getLights(scene, &renderLight);
    //unbind and reset everything to desired values
    shader-&gt;unbind();
    unbindTexture(POSITION_UNIT);
    unbindTexture(NORMAL_UNIT);
    unbindTexture(COLOR_UNIT);
    unbindTexture(ATTR_UNIT);
    glDisable(GL_CULL_FACE);
    glEnable(GL_DEPTH_TEST);
    glDepthMask(true);
}
</pre>Below is the GLSL pixel shader bound above which computes the lighting (the vertex shader only sets gl_Position = ftransform() ).<br /><pre>
//input parameters
uniform sampler2D positionMap, normalMap, colorMap, attrMap;
uniform vec3 camPosition, lightWorldPos, lightColor;
uniform float lightAoE, d_sx, d_sy;

void main(void) {
    //calculate screen coord
    vec2 coord = vec2(gl_FragCoord.x*d_sx, gl_FragCoord.y*d_sy);

    //get the position from deferred shading
    vec4 position = texture2D(positionMap, coord);

    //vector between light and point
    vec3 VP = lightWorldPos-position.xyz;

    //get the distance between the light and point
    float distance = length(VP);

    //if outside of area of effect, discard pixel
    if(distance &gt; lightAoE) discard;

    //normalize vector between light and point (divide by distance)
    VP /= distance;

    //get the normal from deferred shading
    vec4 normal = texture2D(normalMap, coord);

    //get the color from deferred shading
    vec4 color = texture2D(colorMap, coord);

    //get lighting attributes from deferred shading
    vec4 attributes = texture2D(attrMap, coord);
    float diff_coefficient = attributes.r;
    float phong_coefficient = attributes.g;
    float two_sided = attributes.b;
    float cos_theda = dot(normal.xyz, VP);

    //calculate two sided lighting.
    cos_theda = (cos_theda &lt; 0.0)?-two_sided*cos_theda:cos_theda;

    //calculate diffuse shading
    float diffuse = diff_coefficient*cos_theda;

    //calculate half vector
    vec3 H = normalize(VP+normalize(camPosition - position.xyz));

    //calculate Phong shading
    float phong = phong_coefficient*pow(max(dot(H, normal.xyz), 0.0), 100.0);

    //calculate light contribution with attenuation
    vec3 C = lightColor*(color.rgb*diffuse+phong)/(distance*distance+0.8);
    //all lights have constant quadratic attenuation of 1.0, with a constant
    //attenuation of 0.8 to avoid dividing by small numbers

    gl_FragColor = vec4(C, 1.0); //output color
}
</pre>If you are planning to add lanterns or lamps, adding two-sided lighting allows for point lights inside a lampshade or lantern to contribute light to it. The above shader includes a very simple implementation of two-sided lighting. 
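<br /><br />The <em class='bbc'>lightAoE</em> radius passed to the shader above has to come from somewhere. As a minimal sketch (not taken from the demo source), it could be precomputed once per light on the CPU by taking the positive root of the attenuation quadratic described earlier. The function name <em class='bbc'>areaOfEffectRadius</em> and its parameters are assumptions made for illustration, and the coefficients are assumed to be non-negative.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not part of the article's demo code).
// Returns the distance d at which a light's contribution falls to `threshold`,
// i.e. the positive root of
//   quadratic*d^2 + linear*d + constant = power / threshold.
// Assumes power > 0, threshold > 0 and non-negative attenuation coefficients.
float areaOfEffectRadius(float power, float threshold,
                         float constant, float linear, float quadratic) {
    // Rearranged form: quadratic*d^2 + linear*d + c = 0
    const float c = constant - power / threshold;

    // The light is already at or below the threshold at distance zero.
    if (c >= 0.0f) return 0.0f;

    if (quadratic == 0.0f) {
        // Degenerate cases: purely linear falloff, or no falloff at all.
        if (linear == 0.0f) return std::numeric_limits<float>::infinity();
        return -c / linear;
    }

    // c < 0 and quadratic > 0, so the discriminant is positive and the
    // "+" root is the positive one.
    const float disc = linear * linear - 4.0f * quadratic * c;
    return (-linear + std::sqrt(disc)) / (2.0f * quadratic);
}
```

With the attenuation used in the shader above (quadratic 1.0, linear 0.0, constant 0.8), this reduces to sqrt(power / threshold - 0.8). The result only needs to be recomputed when the light's power or the cutoff threshold changes.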
<br /><strong class='bbc'>Results</strong><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/imgSpaceLight/fig3.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 3: 2175 bright cube lanterns rendered on a flat plane in a night scene.</strong><br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/imgSpaceLight/fig4.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 4: A daytime rendering of a blimp with lanterns from Aero Empire [<a href='http://www.gamedev.net/reference/programming/features/imgSpaceLight/page3.asp#ref' class='bbc_url' title=''>2</a>]. Notice the Phong lighting on the envelope.</strong><br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/imgSpaceLight/fig5.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 5: A nighttime rendering of the same blimp as above.</strong><br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/imgSpaceLight/fig6.png' alt='Posted Image' class='bbc_img' /></span></span><br /><strong class='bbc'>Figure 6: Ship hangar interior from Infinity [<a href='http://www.gamedev.net/reference/programming/features/imgSpaceLight/page3.asp#ref' class='bbc_url' title=''>3</a>]. This rendering includes both point lights and ambient lights (adding point lights that approximate indirect lighting).</strong><br /><br />&nbsp;&nbsp;As you can see, image space lighting creates accurate lighting effects that add to the scene whether daytime or nighttime, interior or exterior. 
The ability to efficiently render a large number of point light sources allows for much more diverse lighting environments, which greatly improves the rendering of the scene while still running in real time.<br /><br /> <br /><strong class='bbc'>Performance</strong><br /> The cost of this algorithm depends solely on the number of point lights rendered and the number of pixels within each light’s area of effect. The number of lights determines the overhead of the rasterization step, which is small (as modern rasterizers can handle a million triangles) but worth noting. Apart from this overhead, in the worst case where the area of effect of every light is the entire screen, the algorithm simply adds the contribution of every light for every pixel (which has the same cost as looping over all lights per pixel). The number of pixels rasterized from the proxy shapes determines the main performance hit for this algorithm. Point lights that are very bright or very close to the camera end up generating a large number of pixels. While one or two point lights filling the entire screen do not impact performance too much, having many such point lights is the main bottleneck of image space lighting. This algorithm performs better when the point lights are not clustered: if the camera is close enough to one point light for it to fill the screen, then the other point lights are farther away and so have a smaller area of effect in screen space.<br /><br /> To get more performance out of this algorithm, reduce point light usage, decrease point light brightness, and spread point lights out more (you can always approximate a cluster of point lights with a single brighter point light).<br /><br /> Below are some performance results of this algorithm. 
All results are from rendering a lantern scene (like Figure 3) at a resolution of 600x400 on a GeForce 9200M GS. The ground is a 2000x2000 unit plane, and lanterns (12 triangles with a point light inside) are scattered in 50x50 unit clusters that are uniformly distributed on the ground plane. The camera is guaranteed to be close to one of those clusters (worst case). The light shader computes two-sided lighting for diffuse surfaces, and the deferred shading pass renders to four 32-bit floating-point textures.<br /><br /> Clusters is the number of clusters generated, lights is the number of lights per cluster generated, ISL # is the fps of the image space lighting algorithm where all lights have an area of effect radius # units (where the area of effect is determined by the brightness of the light and the cutoff threshold), and PPS is the fps of a naive direct loop of all lights per pixel (per pixel sum).<br /><br />&nbsp;&nbsp;<span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/imgSpaceLight/fig7.png' alt='Posted Image' class='bbc_img' /></span></span><br /><br />&nbsp;&nbsp;This shows that if the lights are not very clustered or bright, the image space lighting technique can render hundreds of point lights without too much of a performance hit. In the worst case, where the camera is near a single cluster with many point lights, the image space lighting technique still outperforms the per pixel sum algorithm when the lights are dim, however as the lights get brighter, the image space lighting algorithm slowly converges towards the speed of the per pixel sum algorithm (as it has to compute the lighting for each light source for each pixel).<br /><br /> <br /><strong class='bbc'>Downsides and Possible Solutions</strong><br /> There are three major downsides to Deferred Rendering (the first step to Image Space Lighting). 
All of these downsides come from the fact that you have to store all necessary geometric attributes into textures. The first downside is just the sheer amount of memory required (which can be quite limited, depending on the graphics card). In the demo code, I store the position, normal, diffuse color, diffuse component, specular component, and two-sided lighting component. This ends up being nine 32-bit floating-point values, one 32-bit depth value, and four 8-bit unsigned byte values, for a total of 44 bytes per pixel. For a standard screen size of 1024x768, that is 786,432 pixels and a total of 33 megabytes. When a graphics card only has 128 megabytes of dedicated memory, that’s about a fourth of the memory space used, which can be a problem when you also need to store vertex buffer objects (triangle mesh data stored on the GPU) and textures for the various objects in the scene. The set of textures storing this geometric data is called a G-Buffer (standing for geometric buffer), and there are several tricks to bring its size down, like using 16-bit floating-point values, storing normals in spherical coordinates, storing the position as just a depth value, etc. In general, you want to compress and pack the data in a way that saves a lot of memory; however, this can also cause banding and other artifacts if you aren’t careful.<br /><br /> The second downside is anti-aliasing. Since you only store the geometric attributes per pixel, you lose the hardware anti-aliasing that allows you to blend edges by computing sub pixels. This causes aliased jagged edges that you have probably seen if you have rendered a scene without anti-aliasing. 
There are several ways to add anti-aliasing to a deferred rendering approach, like rendering to a larger buffer and then downsampling (however, this requires even more memory and the computational cost of averaging), and doing edge detection and blurring around the edges (which adds computational cost and does not take into account the sub pixels, so it’s not a very effective form of anti-aliasing). The ideal solution would be to generate and store sub pixels only where needed (like at the edges, which wouldn’t increase the memory cost too much), but this would require hardware support and extensions, as graphics cards typically render only to a framebuffer (2D texture or buffer).<br /><br /> The final downside is alpha blending. As with anti-aliasing, sometimes you need to generate multiple samples for certain pixels, but by rendering to a texture, you lose this information (as you can only have one sample per pixel). With alpha blending, you want a semi-transparent object to blend on top of the object behind it; however, this requires the geometric data of both the foreground and background objects to be stored in the same pixel. Again, with hardware support and extensions for rendering to structures other than framebuffers, you could store all of these samples and easily render scenes with alpha blending. However, without this hardware support, the best way to add alpha blending to deferred rendering scenes is to render the opaque surfaces first, and then render the alpha-blended surfaces on top (however, if they are rendered on top, then they do not get any light contribution from the point lights).<br /><br /> <br /><strong class='bbc'>Other Light Types and Effects</strong><br /> Adding spotlights and non-point lights simply requires calculating that light source’s area of effect, and computing a bounding proxy shape for it. For non-point lights, this is a little more complex, as the area of effect may not be spherical. 
Additionally, the pixel shader would have to handle rendering the different lighting types. Other than that, the algorithm should work the same: for each pixel, add the contribution of all light sources where the point lies within the light source’s area of effect. I have not experimented with non-point lights, but if anyone tries, feel free to share the results. Adding area light sources is also possible by approximating the light with many small point lights (a common approximation for rendering area lights). This allows for light sources that have different shapes (long, square, etc).<br /><br /> Additionally, instead of rendering light sources, you can use this technique to render light patches (for indirect illumination). C. Dachsbacher and M. Stamminger’s paper, “Splatting Indirect Illumination” [<a href='http://www.gamedev.net/reference/programming/features/imgSpaceLight/page3.asp#ref' class='bbc_url' title=''>1</a>], uses rasterization to compute surface patches that receive direct lighting, and then computes the lighting received from those patches for all pixels within each patch’s area of effect (as done in image space lighting). This quickly renders an approximation to one-bounce indirect illumination.<br /><br /> Currently, there are difficulties in casting shadows from these light sources efficiently. Sampling occlusion from a large number of light sources is a tough and expensive problem, and is usually solved by generating shadow maps for clusters of lights. However, since the lights rendered by this algorithm are dim, the fact that they do not cast shadows is not very noticeable.<br /><br /> <br /><strong class='bbc'>Conclusion</strong><br /> Any game that wants to simulate complex lighting effects from a large number of light sources could benefit from this technique. Very few games use more than the hardware’s built-in light sources, and most of the smaller light sources are approximated with glow or light maps. 
This technique is offered as an alternative to glow and light maps that is still efficient, yet computes accurate lighting effects. This technique would be especially impressive for games with night scenes that are lit up by many small point lights. As a real world example of image space lighting, it is implemented in the game Aero Empire (which I am currently developing). The reason for adding it is mainly for rendering lanterns, which will be common within towns and on the blimps. They will allow for operation of blimps at night, towns lit up by lanterns at night, and lights for the interiors of blimps and buildings (which would be dark otherwise due to shadows). This is one example of how image space lighting can be used in a game. It is my hope that other games will use this technique to add more complex and interesting lighting environments to their graphics programs as well.<br /><br /> Infinity: The Quest for Earth is another real world example of a game that has included image space lighting. F. Brebion, developer of Infinity, implemented a technique similar to image space lighting to improve the interiors of the hangars and hulls of the ship, which would otherwise be dark when only including light from the sun. The inclusion of indirect lighting also greatly improved the realism of the rendered scenes. See Brebion’s journal on deferred lighting for more information on this [<a href='http://www.gamedev.net/reference/programming/features/imgSpaceLight/page3.asp#ref' class='bbc_url' title=''>3</a>]. 
Infinity is another great example of how image space lighting can be used to efficiently add more complex lighting environments.<br /><br /> If you have any questions, improvements or results from using <a href='http://downloads.gamedev.net/features/programming/imgSpaceLight/imgSpaceLightDemo.zip' class='bbc_url' title='External link' rel='nofollow external'>this algorithm</a> that you would like to share, feel free to post in the comments or contact me at <a href='mailto:terra0nova@hotmail.com' class='bbc_url' title='External link' rel='nofollow external'>terra0nova@hotmail.com</a>. <br /><br /> <br /><strong class='bbc'>Further Reading / Sources</strong><br /> I have included the source code for rendering a lantern scene at night with the image space lighting technique. Feel free to fiddle with the code and put together more complex scenes as well. This code should be considered free to use and modify, so long as you give proper credit where due.<br /><br /> [1] C. Dachsbacher, M. Stamminger. <a href='http://www.vis.uni-stuttgart.de/~dachsbcn/download/sii.pdf' class='bbc_url' title='External link' rel='nofollow external'>Splatting Indirect Illumination</a>. 2006.<br /><br /> [2] Collaborative Game Project. <a href='http://aeroempire.co.cc/' class='bbc_url' title='External link' rel='nofollow external'>Aero Empire</a>. 2009.<br /><br /> [3] F. Brebion. <a href='http://www.gamedev.net/community/forums/mod/journal/journal.asp?jn=263350&cmonth=4&cyear=2009&cday=3' class='bbc_url' title=''>Deferred Lighting and Instant Radiosity</a>. 2009.<br /><br />]]></description>
		<pubDate>Thu, 11 Jun 2009 14:23:51 +0000</pubDate>
		<guid isPermaLink="false">2ac4692cb4f636b0769d2c291af6aa88</guid>
	</item>
	<item>
		<title>Rendering Water as a Post-process Effect</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/rendering-water-as-a-post-process-effect-r2642</link>
		<description><![CDATA[Water plays an important role in all terrain renderers used in modern games and visualizations to present outdoor areas. This is because it improves general image quality to a greater extent than any other technique and makes the image more photorealistic. Alas, even though it tends to look better and better with every game released, it is still far from being realistic. Besides, deferred shading gains popularity every month. Many newer games and engines make use of it, e.g. Starcraft 2 and Tabula Rasa. Though deferred shading is typically used to reduce lighting-related operations from O(objects_number * lights_number) to O(objects_number), I will show that it can be helpful in many more algorithms.<br /><br />In this article I will describe a technique of realistic and flexible water rendering by using bump mapping in the post-process stage just after deferred shading. The presented technique fits into the concept of deferred shading pretty nicely, i.e. it helps to avoid additional geometry rendering. A description of deferred shading can be found in [<a href='#ref' class='bbc_url' title='External link' rel='nofollow external'>1</a>], [<a href='#ref' class='bbc_url' title='External link' rel='nofollow external'>2</a>] and [<a href='#ref' class='bbc_url' title='External link' rel='nofollow external'>3</a>]. It is possible to implement this technique using forward rendering, but in this case it seems less natural and may require additional work.<br /><br />The presented algorithm eliminates the majority of the flaws of typical techniques used for water rendering, such as hard edges or unrealistic colour extinction. I will briefly discuss common drawbacks in the section “Traditional approaches to water rendering”.<br /> <br /><h2>Theory behind water</h2><br />The theory behind water is very complex and not fully understood. It will suffice to say that there is no adequate formal model describing it so far. 
A survey conducted by Guillot [<a href='#ref' class='bbc_url' title='External link' rel='nofollow external'>4</a>] found that none of the 46 models he analyzed is valid when compared to reality. Existing models are also far too complex and computationally expensive to be used for real time applications, especially games. A game should not spend most of its CPU or GPU processing time just to update and render realistic water.<br /><br />So a completely different solution has to be found.<br /><br />In my opinion the theory of water for real time applications can be divided into two categories:<br /><ul class='bbc'><li>Waves - their animation, propagation and interaction with the rest of the world </li><li>Optics.</li></ul>To simulate wave propagation, FFT (Fast Fourier Transform [<a href='#ref' class='bbc_url' title='External link' rel='nofollow external'>5</a>]) is frequently used. In this article I do not focus on wave propagation and so I will skip the theory behind it. I strongly believe that optics is more important for good looking water, and I will describe it in detail. <br /><br /><h2>Reflection and refraction</h2><br />A light ray hitting the water surface gets reflected and refracted, causing specular highlights, caustics and light shafts to appear. Light shafts and caustics play an important role in underwater scenes; however, for an observer standing above the water surface they do not improve the quality of water that much. A key to describing reflection and refraction is the Fresnel term. The Fresnel equation tells us how much light is reflected and refracted when it crosses a border between different media (in this case water and air). We have to make an assumption that light goes through two materials only, as the term shown later does not apply in a more general case. Besides, both of them have to be homogeneous and one of them has to have a higher density. The “air – water” pair fits this assumption well. 
However "air - window glass” does not as it consists of three materials (air – glass – air). Graphically it is depicted as follows:<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15657-0-39223900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15657" title="fig1.jpg - Size: 34.73K, Downloads: 4"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-85573100-1368304035_thumb.jpg" id='ipb-attach-img-15657-0-39223900-1369539296' style='width:480;height:274' class='attach' width="480" height="274" alt="Attached Image: fig1.jpg" /></a><br /><i>Light ray reflection and refraction</i><br /></p><br />For any given angle of light a and refraction coefficients describing both materials A and B, namely <em class='bbc'>n<sub class='bbc'>A</sub></em> and <em class='bbc'>n<sub class='bbc'>B</sub></em>, we use following notation:<br /><br /><a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15658-0-39240800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15658" title="eq1.jpg - Size: 14.71K, Downloads: 4"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-57707700-1368304106.jpg" id='ipb-attach-img-15658-0-39240800-1369539296' style='width:183;height:136' class='attach' width="183" height="136" alt="Attached Image: eq1.jpg" /></a><br /><br />The Fresnel term could then be defined as follows:<br /><br /><a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15659-0-39256600-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15659" title="eq2.jpg - Size: 27.73K, Downloads: 4"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-39817600-1368304162.jpg" 
id='ipb-attach-img-15659-0-39256600-1369539296' style='width:388;height:100' class='attach' width="388" height="100" alt="Attached Image: eq2.jpg" /></a><br /><br />However, this equation is too expensive to be computed per pixel on the GPU. Therefore, a frequently used approximation is given below:<br /><br /><a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15660-0-39272300-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15660" title="eq3.jpg - Size: 14.11K, Downloads: 3"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-57021800-1368304230.jpg" id='ipb-attach-img-15660-0-39272300-1369539296' style='width:333;height:27' class='attach' width="333" height="27" alt="Attached Image: eq3.jpg" /></a><br /><br />This function is very similar to the one described previously, so the loss of quality is insignificant. <em class='bbc'>R</em>(0) is constant and therefore it should be computed only once and then passed to the pixel shader.<br /><br />The index of refraction <a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15661-0-39290000-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15661" title="eq4.jpg - Size: 4.07K, Downloads: 4"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-17202500-1368304246.jpg" id='ipb-attach-img-15661-0-39290000-1369539296' style='width:80;height:40' class='attach' width="80" height="40" alt="Attached Image: eq4.jpg" /></a> for water is:<br /><br /><em class='bbc'>IOR</em> = 1.33333<br /><br /><h2>Colour extinction</h2><br />Two of the most important aspects of proper water rendering are colour extinction, as light goes deeper and deeper, and light scattering. 
However, rarely does anyone pay attention to that, although these are the two phenomena which cause water to have the colour we see. This colour is called the “apparent one” and can be totally different from so-called true water colour. To find this true colour, one has to filter out all the particles and organic material from water. It is done in laboratories and for us is not usable. One thing to note is that true water colour is bluish (not completely transparent as it was believed some time ago). We will use this information later on. For each component of the light spectrum, speed of extinction is different. This is due to different wavelengths of the components. The visible range of light is exceptional in that attenuation fades slowly compared to infra-red or ultraviolet. For them, absorption rate is so high they fade away entirely at a maximum depth of a few centimetres. In the table underneath, average depths at which light components die out are presented. The table is valid only for very clean waters. In the case of muddy ones like lakes or rivers it has to be modified:<br /><br /><a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15662-0-39305800-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15662" title="fig2.jpg - Size: 39.21K, Downloads: 4"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-88726300-1368304291.jpg" id='ipb-attach-img-15662-0-39305800-1369539296' style='width:387;height:215' class='attach' width="387" height="215" alt="Attached Image: fig2.jpg" /></a><br /><br />We make an assumption that extinction is linear with depth. In general this is not true as water is not homogeneous at its whole depth. Many waters consist of tiers of different temperatures and density which makes them different mediums. 
However, to simulate water in a convincing way it is not necessary to take this into account.<br /><br />Besides, extinction speed ratio and water colour are also influenced by chemical composition of the water area bottom, level of siltation, existence of organic materials (like algae or plankton) and even sky colour!<br /><br />An important conclusion from this deliberation is that the further from the equator the water is the more green its colour becomes.<br /><br /><h2>Traditional approaches to water rendering</h2><br />The traditional approach to the water rendering problem [<a href='#ref' class='bbc_url' title='External link' rel='nofollow external'>6</a>] is based on rendering a plane with reflection and refraction textures applied to it in the level determined by the result of the Fresnel formula. To give water surface the impression of being wavy, the projection texture coordinates are modified (displaced) along a normal vector at a given water point. This normal vector is usually obtained from a normal map. This technique can be used only for lakes, however, as these waves are way too low to represent ocean ones. An alternative to this technique is the use of dense vertices grid (the technique known as the projected grid). Vertices in this technique are transformed in the vertex shader and this way it is possible to achieve realistic waves (not an imitation as in the previous example). In the pixel shader, reflection and refraction textures are applied as previously.<br /><br />A modification of this technique in turn is to use the vertex texture's fetch mechanism [<a href='#ref' class='bbc_url' title='External link' rel='nofollow external'>7</a>], allowing sampling of the texture in the vertex shader. Thanks to that at this stage it is possible to use a height-map. 
This mechanism, introduced in the 3.0 shader model, gives much better results but still they are far from being realistic and are not yet used that often for water rendering.<br /><br />As you can see, the majority of existing and popular techniques focus primarily on modelling the appearance of the surface and less on the optics. I hope you remember me claiming that optics are very important.<br /><br />Although these approaches are the most popular techniques for rendering water up to now (despite the ever-increasing power and the capabilities of GPUs) there are several <strong class='bbc'>problems</strong> associated with them:<br /><ul class='bbc'><li>The transition between the shore and the water is very hard and thus unrealistic. The resulting water often resembles metallic substances like mercury. In nature, water is free from sharp edges – they are always very soft and smooth. Even liquids with a very high density (like oils) do not have them.</li><li>To obtain good quality waves it is required to have an adequately dense mesh, otherwise the waves will have sharp edges.</li><li>It is difficult to obtain the correct colour extinction with depth, as well as the effect of the transition from the shallow water to the depths. This is due to the fact that usually in the process of rendering water we have no information about the depth of water at a specific point of the scene. Thus, the water is simplified, and has incorrect colour. Some implementations make use of a so-called depth map for that – a static texture describing the depth of the water area. 
However, even the appearance of very large objects on the bottom (for instance a ship wreck) will not change water colour whereas it should.</li><li>The solutions are rather inflexible - a share of them can be used only for rendering oceans, some for lakes.</li><li>It is recommended that some form of LOD techniques will be used in order not to process too much information about distant parts of the water from the observer.</li><li>Rendering multiple different areas within a single scene is often difficult.</li></ul><br /><h2>Presented approach</h2><br />The presented approach is based on the simple fact that water is drawn in the phase of image post-processing and is not associated with any geometry. In addition, I make an assumption of the use of deferred shading since implementation in this case is simpler and seems to be more natural. Thanks to these assumptions at this stage we have the geometry buffer filled with data. In particular, we have easy access to information about the position of each vertex visible from the camera position point of the scene. Since the geometry of the buffer is nothing more than the geometry stored in the image space, modifying it to impact the actual geometry of the scene is possible. This simple fact allows us to completely change the approach to rendering displacements and bumps. It becomes possible to move many algorithms from the geometry stage to the post-processing stage. However, I will only show how to make water this way. At the same time, we can easily get rid of the disadvantages of the traditional techniques of rendering water, which were mentioned in the previous section. The presented approach does not, however, focus on the animation of waves propagation in the water. 
The decision to use static height-maps, as in the example, or dynamic ones modified in accordance to the FFT, is up to you.<br /><br /><h2>Modifying existing geometry</h2><br />Notice that if you make modifications to the texture storing the scene point positions it becomes possible to achieve a water surface with convincing waves. We can think of this position-storing texture as of depth of water at any given point. If you know the position of the water surface <strong class='bbc'>L</strong> and the position <strong class='bbc'>P</strong> of the scene pixel, then <strong class='bbc'>depth</strong> = <strong class='bbc'>L</strong> - <strong class='bbc'>P</strong>. <strong class='bbc'>Depth</strong> = 0 corresponds to the water surface and <strong class='bbc'>depths</strong> &lt; 0 correspond to the points located above the surface and so they can be skipped as they do not require further processing.<br /><br />In order to obtain waves, a traditional height-map in greyscale is used to alter this depth.<br /><br />The algorithm to create waves is relatively simple. It relies on tracing the ray from the position <strong class='bbc'>P</strong> of the scene to the observer and extruding waves in this direction. Several iterations are done. In each of them we sample our height-map and bias the water surface level by this value.<br /><br /><ol class='bbc'><li>For each scene point <strong>P</strong>:<ol class='bbc'><li>Current level of water surface L = level of water surface.</li><li>If <strong>P.y</strong> &gt; <strong>L</strong> + <strong>H</strong> (maximum wave height), then end, because point <strong>P</strong> is above the water surface. </li><li>Calculate the eye vector <strong>E</strong> as a difference between the pixel P and the observer position: <strong>E</strong> = <strong>P</strong> – <strong>Observer Position</strong>. 
</li><li>Normalize <strong>E</strong>: <strong>E</strong> = Normalize (<strong>E</strong>)</li><li>For the current level of water surface <strong>L</strong>, find a point of intersection with the vector <strong>E</strong>. Mark it as the position of water point <strong>S</strong>. In other words, find the point of intersection of the plane <strong>W</strong> = (0, 1, 0,-<strong>L</strong>) with the vector <strong>E</strong>.</li><li>Perform <i>n</i> iterations<ol class='bbc'><li>Sample the height-map at the point defined by <strong>S</strong> and in the direction of the vector <strong>E</strong>. The result is bias <strong>B</strong>.</li><li>Multiply <strong>B</strong> by the maximum wave height <strong>H</strong>: <strong>B</strong> = <strong>B</strong> * <strong>H</strong></li><li>Set new <strong>L</strong> as: <strong>L</strong> = <strong>L</strong> + <strong>B</strong></li><li>Find new <strong>S</strong> on the basis of new <strong>L</strong> value</li></ol></li></ol></li><li>Calculate the amount of accumulated water along the ray as <strong>A</strong> = length (<strong>P</strong> - <strong>S</strong>)</li><li>Calculate the depth of water at this point as <strong>D</strong> = <strong>S.y</strong> - <strong>P.y</strong></li><br /></ol>Note that the results, especially the amount of accumulated water <strong class='bbc'>A</strong> and depth <strong class='bbc'>D</strong> will be used later to simulate several aspects of water optics. <br /><br /><h2>The computation of normal vectors</h2><br />To have realistic and convincing water it is essential to compute normal vectors. While in the case of a vertices grid, normal vectors are known already at the stage of processing geometry in the vertex shader, in the case of the presented technique, they must be calculated entirely in the pixel shader as there is no real geometry. Luckily, a simplified way of computing normal vectors known from terrain rendering applies here as well. 
In order to calculate normal vectors, the height-map has to be sampled at the four points adjacent to the processed one, that is:<br /><br /><strong class='bbc'>N</strong> = {<strong class='bbc'>W</strong> - <strong class='bbc'>E</strong>, 2<strong class='bbc'>d</strong>, <strong class='bbc'>S</strong> – <strong class='bbc'>N</strong>}<br /><br /><strong class='bbc'>W</strong>, <strong class='bbc'>E</strong>, <strong class='bbc'>S</strong> and <strong class='bbc'>N</strong> on the right-hand side are the four neighbouring height values, sampled directly from our height-map.<br /><br />Although these normal vectors could be used directly in the lighting and shading calculations, much better results can be achieved using an additional normal map. This is because the normal vectors obtained so far are far too smooth, whereas good-looking water needs fine detail. That detail can be obtained only by using normal vectors with much higher resolution.<br /><br />To do this we are going to use the traditional normal mapping technique. Since there is no geometry, the normal vectors had to be computed in the shader (which we have just done), and the same is true of the tangent and binormal vectors that normal mapping also requires. There is an approximate method of computing them in the pixel shader, fully described by Schuler [<a href='#ref' class='bbc_url' title='External link' rel='nofollow external'>8</a>]. Thanks to that, constructing the matrix necessary to carry out normal mapping is possible and pretty easy. The code is as follows:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
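// Builds a per-pixel tangent frame from screen-space derivatives (after Schuler [8]).
float3x3 compute_tangent_frame(float3 Normal, float3 Position, float2 UV)
{
	// Screen-space derivatives of the position and of the texture coordinates
	float3 dp1 = ddx(Position);
	float3 dp2 = ddy(Position);
	float2 duv1 = ddx(UV);
	float2 duv2 = ddy(UV);
	
	// Solve the linear system for the tangent and binormal directions
	float3x3 M = float3x3(dp1, dp2, cross(dp1, dp2));
	float2x3 inverseM = float2x3(cross(M[1], M[2]), cross(M[2], M[0]));
	float3 Tangent = mul(float2(duv1.x, duv2.x), inverseM);
	float3 Binormal = mul(float2(duv1.y, duv2.y), inverseM);
	
	// Rows of the result: tangent, binormal, normal
	return float3x3(normalize(Tangent), normalize(Binormal), Normal);
}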
</pre><br />where: <br /><ul class='bbc'><li><strong class='bbc'>Normal</strong> - normal vector, in our case, the vector set a moment ago.</li><li><strong class='bbc'>Position</strong> - position in the world space,</li><li><strong class='bbc'>UV</strong> - texture coordinates.</li></ul>Having this matrix we can now sample our normal map texture in a traditional way. The overhead is only a dozen or so arithmetic instructions associated with matrix construction above but believe me the result is worth its cost:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
float3x3 tangentFrame = compute_tangent_frame(normal, eyeVecNorm, texCoord);
// Unpack the normal map sample from [0, 1] to [-1, 1] and transform it by the tangent frame
normal = normalize(mul(2.0f * tex2D(normalMap, texCoord).xyz - 1.0f, tangentFrame));
</pre><br />Just be aware that the normal vectors have to change over time. If not, the water will resemble a rigid material like plastic. Therefore, the normal map has to be sampled several times with texture coordinates varying over time. This way it is possible to achieve interfering waves of different sizes, which really look fantastic. <br /><br /><h2>Optics</h2><br />As already mentioned several times in this article, optics play a key role in the convincing appearance of water. In this section we are going to focus on its individual aspects. <br /><br /><h2>Reflection and refraction of light</h2><br />The proposed technique does not introduce any new way of rendering reflections, so they should be rendered the traditional way. This means that the whole scene must be rendered to a texture with an altered view matrix (or world one). The picture below presents the concept:<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15663-0-39321400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15663" title="fig3.jpg - Size: 29.84K, Downloads: 3"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-30305000-1368304782_thumb.jpg" id='ipb-attach-img-15663-0-39321400-1369539296' style='width:480;height:204' class='attach' width="480" height="204" alt="Attached Image: fig3.jpg" /></a><br /><i>The idea of rendering the scene to the reflection texture</i><br /></p><br />The location of the observer is reflected with respect to the water surface. The forward vector is flipped, and the up vector has to be recomputed to match the others.<br /><br />The water surface is also set as the user clipping plane to avoid rendering geometry above it, as that would cause the reflection texture to contain too much data and therefore be invalid. For DirectX 10 and newer, the SV_ClipDistance semantic should be used instead. 
Performing this step at the pixel shader level is not a good idea, as it comes too late: the pixel shader would still be run for every pixel, even one discarded at its very beginning, so more work would be done than is really necessary.<br /><br />In contrast, in the case of refraction we can make some simplifications that most users will not notice. Many post-process effects read the current contents of the frame buffer and modify or operate on it. Since water is treated in this article as a post-process effect, we can benefit from this as well. This way the scene does not have to be rendered to yet another texture, and the final result remains satisfactory. This solution also fits better with the idea of deferred shading, i.e. not rendering the same geometry many times. All that has to be done is to put the frame buffer content on the screen and modify it slightly to create the impression of movement with the waves. In my implementation I simply offset the texture coordinates of the screen-space quad using time and the sine function.<br /><br /><h2>Specular</h2><br />Another important factor affecting the quality of the water effect is specular highlighting, in some implementations also called glare or sun glow. Water is characterized by high shininess. In this article we take into account only the specular caused by global light, i.e. sunlight. Local lights influence water to a lesser extent, so they can be skipped without sacrificing too much quality. The calculation of sun glare may be done in a number of ways. In my opinion, the best results can be achieved using this snippet I found some time ago:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
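// Eye vector mirrored about the surface normal
half3 mirrorEye = (2.0 * dot(eyeVecNorm, normal) * normal - eyeVecNorm);
// Alignment of the mirrored eye vector with the sun direction, remapped to [0, 1]
half dotSpec = saturate(dot(mirrorEye.xyz, -lightDir) * 0.5 + 0.5);
specular = (1.0 - fresnel) * saturate(-lightDir.y) * ((pow(dotSpec, 512.0)) * (shininess * 1.8 + 0.2));
// Extra boost for shinier water
specular += specular * 25 * saturate(shininess - 0.05);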
</pre><br />The key here is the first line of code, which reflects the eye vector about the normal so that the angle of incidence equals the angle of reflection. To do so, it uses the dot product of the normal and the normalized eye vector. The remaining lines are just a slightly modified specular calculation. The constants can be changed, but after testing several values I think these give the best results. For the shininess parameter I suggest values in the range 0.5 to 0.7. <br /><br /><strong class='bbc'>Colour extinction</strong><br /><br />In many implementations of the water effect, light extinction is ignored. If it is implemented, it is often simplified to multiplying the water colour by some bluish shade. However, as I wrote in the section “Theory behind water”, it is one of the most important factors affecting the apparent colour of the water. In the proposed solution light extinction is divided into two phenomena:<br /><ul class='bbc'><li>Colour extinction with depth, which makes all objects look bluish from a certain depth onwards</li><li>Colour extinction with increasing distance from the observer, the so-called horizontal transparency of water.</li></ul>Depending on the depth at the given point and the distance from the observer, water will have a different colour. We first define the vector of colour extinction, which is responsible for the rate of extinction of the r, g, b components of light. 
On the basis of the table from the theory section this vector can look this way:<br /><br /><em class='bbc'>extinction</em> = [4.5;75.0;300.0]<br /><br />Then the first attempt to compute proper water colour can be as follows:<br /><br /><a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15664-0-39337600-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15664" title="eq5.jpg - Size: 13.31K, Downloads: 3"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-02328800-1368304975.jpg" id='ipb-attach-img-15664-0-39337600-1369539296' style='width:340;height:51' class='attach' width="340" height="51" alt="Attached Image: eq5.jpg" /></a><br /><br />where:<br /><ul class='bbc'><li><strong class='bbc'>refraction</strong> – colour of refracted scene,</li><li><strong class='bbc'>D</strong> - the depth value determined at the end of “Modifying existing geometry” section</li></ul>Unfortunately this formula has a few flaws:<br /><ul class='bbc'><li>The colour fades away only with depth, however there is no decay along the eye vector. This makes water pixels far from the observer have the same colour as ones near it, whereas they should be much darker.</li><li>Moreover, control over water colour is limited and very difficult. While the clean waters are relatively easy to obtain, muddy waters like lakes are much harder. It is because in the above formula, water colour is not used explicitly.</li><li>The above calculation does not take into account the true colour of water, which is bluish. 
In this model it is white.</li></ul>We have to therefore modify our computations:<br /><br /><a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15665-0-39353400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15665" title="eq6.jpg - Size: 43.78K, Downloads: 5"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-75889000-1368305067_thumb.jpg" id='ipb-attach-img-15665-0-39353400-1369539296' style='width:480;height:104' class='attach' width="480" height="104" alt="Attached Image: eq6.jpg" /></a><br /><br />We introduced several new quantities to our formula:<br /><ul class='bbc'><li>The surface water colour (<strong class='bbc'>surfaceColor</strong>), which can be treated as a true colour of the water</li><li>The colour of deep water (<strong class='bbc'>depthColor</strong>)</li><li>Amount of accumulated water <strong class='bbc'>A</strong> as mentioned in the section “Modifying existing geometry”</li><li>Introduction of the horizontal visibility (<strong class='bbc'>visibility</strong>). The smaller the value of this parameter, the less transparent water will be.</li></ul>In the particular case for <strong class='bbc'>surfaceColor</strong> = 1 and <strong class='bbc'>depthColor</strong> = 0 final colour of the water will be very similar to that from the previous solution. What we get is the final refraction colour. Then using the value from Fresnel term we blend reflection with refraction and add specular to the result. This way, however, water will have hard shores and that was what we strove to avoid. Therefore, we once more blend the result with refraction to an extent determined by the input parameter specifying <strong class='bbc'>shore hardness</strong> (1 by default) multiplied by parameter A (water accumulation). 
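<br /><br />A minimal HLSL sketch of these blending steps might look as follows. Because the formula above is shown only as an image, the order of the two interpolations is an assumption based on the description of the quantities, and the function name <strong class='bbc'>compose_water_colour</strong> and its parameter names are hypothetical rather than taken from the accompanying shader:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
// Sketch only. All names are hypothetical; the nesting of the two lerps is an assumption.
float3 compose_water_colour(float3 refraction, float3 reflection, float3 specular,
                            float fresnel, float A, float D,
                            float3 surfaceColor, float3 depthColor,
                            float3 extinction, float visibility, float shoreHardness)
{
	// Fade towards the surface colour with the amount of water along the ray (A)...
	float3 shallow = lerp(refraction, surfaceColor, saturate(A / visibility));
	// ...and towards the deep-water colour, per channel, with the vertical depth (D)
	float3 refracted = lerp(shallow, depthColor, saturate(D / extinction));
	
	// Blend reflection with refraction using the Fresnel term, then add specular
	float3 colour = lerp(refracted, reflection, fresnel) + specular;
	
	// Soft shores: fall back to the refracted colour where little water has accumulated
	return lerp(refracted, colour, saturate(A * shoreHardness));
}
</pre><br />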
The shore hardness parameter still lets us obtain hard edges when they are wanted (they can be useful, especially in non-photorealistic rendering, NPR).<br /><br /><h2>Possible improvements</h2><br />One add-on that takes little work yet significantly improves the realism of the water effect is foam. It arises where the water hits the shore and at the tops of choppier waves. The first kind, coastal foam, can be obtained by assuming that foam begins at the edge (that is, at a depth equal to 0), remains constant down to a certain depth <em class='bbc'>H</em><sub class='bbc'>1</sub>, and then dies out completely over a further range of depths.<br /><br />The latter kind can be computed using the following formula:<br /><br /><a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15666-0-39369300-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15666" title="eq7.jpg - Size: 13.24K, Downloads: 14"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-58268900-1368305165.jpg" id='ipb-attach-img-15666-0-39369300-1369539296' style='width:264;height:56' class='attach' width="264" height="56" alt="Attached Image: eq7.jpg" /></a><br /><br />where:<br /><ul class='bbc'><li><em class='bbc'>H</em> – current height,</li><li><em class='bbc'>H</em><sub class='bbc'>0</sub> - height at which foam appears</li><li><em class='bbc'>H</em><sub class='bbc'>max</sub> - height at which foam dies out.</li></ul>This tells you how much foam is visible. To actually get it on the screen, however, you have to sample a foam texture and multiply the sampled colour by the value found above. A photo of coastal foam works well for that. Another possible extension is to add interaction between world objects and water. 
The basic form of interaction comes for free if you use foam as described above. Since foam is closely tied to the water depth at a given point, it will appear wherever the depth is small. As a result it will appear whenever an object falls into the water.<br /><br />However, the possibilities of interaction between water and the rest of the scene are practically unlimited. For instance, by modifying the height-map either on the CPU or in the pixel shader you can easily introduce local disturbances.<br /><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15667-0-39386900-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15667" title="fig4.jpg - Size: 213.79K, Downloads: 20"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-92192900-1368305245_thumb.jpg" id='ipb-attach-img-15667-0-39386900-1369539296' style='width:480;height:361' class='attach' width="480" height="361" alt="Attached Image: fig4.jpg" /></a><br /><i>Even though I consider foam an “improvement”, it greatly improves general image quality and makes it warmer. Therefore you should consider using it as often as possible.</i><br /></p><br />&nbsp;&nbsp;<br /><h2>Disadvantages of the presented technique</h2><br />Although the presented technique removes most of the defects of existing water rendering techniques listed in the section “Traditional approaches to water rendering”, there is still room for improvement due to several drawbacks of the current algorithm:<br /><ul class='bbc'><li>It increases fill-rate, as it is done in post-process and at the same time is based on deferred shading. Fortunately, in most cases the water should not occupy more than half of the screen. On the other hand, often there is no need for LOD, since most of the screen pixels will be near the observer.</li><li>Water does not look that good at oblique angles. 
In order to get rid of this problem, it is possible to apply a different bump mapping technique. The alternative is to move the normal vector towards the observer for distant waves, which will improve their quality, as their back faces will be invisible.</li><li>In the presented approach local lights do not affect the colour of water. You can try to move water rendering before the lighting phase (we can then modify the normal data). This, however, will require modification to the existing rendering chain.</li></ul><br /><h2>Final results and summary</h2><p style='text-align:center'><br /><a class='resized_img' rel='lightbox[76fe3a501ed44a14b184b0442db05e94]' id='ipb-attach-url-15668-0-39403400-1369539296' href="http://www.gamedev.net/index.php?app=core&module=attach&section=attach&attach_rel_module=ccs&attach_id=15668" title="fig5.jpg - Size: 196.15K, Downloads: 21"><img src="http://uploads.gamedev.net/monthly_05_2013/ccs-8549-0-04486200-1368305345_thumb.jpg" id='ipb-attach-img-15668-0-39403400-1369539296' style='width:480;height:361' class='attach' width="480" height="361" alt="Attached Image: fig5.jpg" /></a><br /><i>You can achieve many different kinds of water using the presented technique, from tropical warm seas to cold lakes.</i><br /></p><br />The algorithm presented in this paper is a good alternative to the popular techniques used for rendering water effects. By basing the calculations on observation of the phenomenon rather than on existing mathematical models, I have managed to achieve a realistic effect which, with properly selected parameter values, is comparable with the results of much more expensive models. It is also a flexible solution: the appearance of water can be controlled through a number of attributes. This makes it suitable for a wide range of effects, whether lakes or oceans. 
You can even have several areas of radically different waters at the very same time.<br /><br />Besides, as it does not rely on geometry it eliminates the most common flaws of the popular techniques discussed in more detail in the section “Traditional approaches to water rendering” of this article.<br /><br /><h2>A word on implementation</h2><br />The <a href='http://downloads.gamedev.net/features/programming/ppWaterRender/water_shader.zip' class='bbc_url' title='External link' rel='nofollow external'>provided shader</a> has been implemented using HLSL and DirectX 9. At present it requires an SM 3.0 capable GPU to run. It can still be heavily optimized, as my main concern was to make it as readable and simple as possible. Several textures are used in the code:<br /><ul class='bbc'><li>heightMap – height-map used for waves generation as described in the section “Modifying existing geometry”</li><li>backBufferMap – current contents of the back buffer</li><li>positionMap – texture storing scene position vectors</li><li>normalMap – texture storing normal vectors for normal mapping as described in the section “The computation of normal vectors”</li><li>foamMap – texture containing foam – in my case it is a photo of foam converted to greyscale</li><li>reflectionMap – texture containing reflections rendered as described in the section “Reflection and refraction of light”</li></ul>Note that positionMap is part of the G-Buffer, as deferred shading is used in my implementation. In the case of forward rendering a depth map should be used instead. In my case positionMap stores data in view space, whereas in the accompanying shader all calculations are done in world space to simplify things a bit. Therefore position data has to be multiplied by the inverse view matrix. As water is rendered as a post-process effect, a full-screen quad has to be rendered on the screen with the water shader applied. 
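<br /><br />As an illustration, the surface search from the section “Modifying existing geometry” might be sketched in HLSL as follows. This is a simplified sketch rather than the code from the accompanying shader: the names <strong class='bbc'>waterLevel</strong>, <strong class='bbc'>maxWaveHeight</strong>, <strong class='bbc'>waveScale</strong>, <strong class='bbc'>intersect_level</strong> and <strong class='bbc'>find_water_surface</strong> are hypothetical, the mapping from world-space xz to height-map coordinates is an assumption, the height-map is sampled at <strong class='bbc'>S</strong> only, and the eye vector is assumed not to be parallel to the water plane:<br /><br /><pre class='prettyprint lang-auto linenums:0'>
// Sketch only; every name below is hypothetical.
sampler2D heightMap;
float waterLevel;      // L: base level of the water surface
float maxWaveHeight;   // H: maximum wave height
float2 waveScale;      // assumed mapping from world-space xz to height-map UV

// Intersect the ray from the observer along E with the plane y = level
float3 intersect_level(float3 observer, float3 E, float level)
{
	float t = (level - observer.y) / E.y;   // assumes E.y != 0
	return observer + E * t;
}

// Returns the water surface point S seen along the ray towards scene point P
float3 find_water_surface(float3 P, float3 observer, int n)
{
	float3 E = normalize(P - observer);
	float L = waterLevel;
	float3 S = intersect_level(observer, E, L);
	for (int i = 0; i &lt; n; ++i)
	{
		// Bias B from the height-map, scaled by the maximum wave height
		float B = tex2Dlod(heightMap, float4(S.xz * waveScale, 0, 0)).r * maxWaveHeight;
		L += B;
		S = intersect_level(observer, E, L);
	}
	return S;
}
</pre><br />The caller would first skip points with <strong class='bbc'>P.y</strong> &gt; <strong class='bbc'>L</strong> + <strong class='bbc'>H</strong>, and afterwards compute <strong class='bbc'>A</strong> = length(<strong class='bbc'>P</strong> - <strong class='bbc'>S</strong>) and <strong class='bbc'>D</strong> = <strong class='bbc'>S.y</strong> - <strong class='bbc'>P.y</strong> as described in that section.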
<br /><br /><br /><h2>Further reading</h2><br />[1] Toman W., “Deferred shading as an effective lighting technique”, IGK’2008<br />[2] Placeres F. P., “Fast Per-Pixel Lighting with Many Lights”, “Game Programming Gems 6”, Charles River Media, 2006, Boston<br />[3] Hargreaves S., “Deferred Shading”, GDC, 2004, <a href='http://www.talula.demon.co.uk/DeferredShading.pdf' class='bbc_url' title='External link' rel='nofollow external'>available on-line</a><br />[4] Guillot B., “A reappraisal of what we have learnt during three decades of computer simulations on water”<br />[5] Jensen L. S., Golias R., “Deep-Water Animation and Rendering”, available on-line <a href='http://www.gamasutra.com/gdce/2001/jensen/jensen_01.htm' class='bbc_url' title='External link' rel='nofollow external'>http://www.gamasutra.com/gdce/2001/jensen/jensen_01.htm</a><br />[6] Johanson C., “Real-time water rendering - projected grid concept”, available on-line <a href='http://graphics.cs.lth.se/theses/projects/projgrid/' class='bbc_url' title='External link' rel='nofollow external'>http://graphics.cs.lth.se/theses/projects/projgrid/</a><br />[7] Kryachko Y., “Using Vertex Texture Displacement for Realistic Water Rendering”, GPU Gems 2<br />[8] Schuler C., “Normal mapping without precomputed tangents”, ShaderX 5<br /><a name='ref'></a>]]></description>
		<pubDate>Thu, 04 Jun 2009 19:08:55 +0000</pubDate>
		<guid isPermaLink="false">845375903f6dbadda379558e905089f2</guid>
	</item>
	<item>
		<title>Moving Noise</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/moving-noise-r2640</link>
		<description><![CDATA[<strong class='bbc'>Introduction</strong><br /> In July 1985, Ken Perlin introduced his much-used algorithm for producing organic randomness, now known as Perlin noise [<a href='http://portal.acm.org/citation.cfm?id=325247' class='bbc_url' title='External link' rel='nofollow external'>1</a>]. That function consists of four important subroutines: <ul class='bbc'><li>Random gradient vector lookup. </li><li>S-curve evaluation. </li><li>Grid-point dot product evaluation. </li><li>Interpolation across all dimensions.</li></ul> One can find out the basics of this algorithm from the oft-visited tutorial website "Making Noise" [<a href='http://www.noisemachine.com/talk1/index.html' class='bbc_url' title='External link' rel='nofollow external'>2</a>]. Assuming that you are familiar with the Perlin noise function, you may have been excited to see the 2002 Ken Perlin paper "Improving Noise" [<a href='http://mrl.nyu.edu/%7Eperlin/paper445.pdf' class='bbc_url' title='External link' rel='nofollow external'>3</a>]. That brief report on "improved noise" gave a very elegant implementation of the classic noise algorithm useful to so many procedural methods. Other optimizations and extensions have been made to the Perlin noise algorithm for use in various settings. 
Among these is "simplex noise" [<a href='http://mrl.nyu.edu/%7Eperlin/homepage2006/simplex_noise/index.html' class='bbc_url' title='External link' rel='nofollow external'>4</a>][<a href='http://webstaff.itn.liu.se/%7Estegu/simplexnoise/simplexnoise.pdf' class='bbc_url' title='External link' rel='nofollow external'>5</a>], offered by Perlin in 2001 - a straightforward approach to quickly computing high-dimensional noise functions - and "modified noise" [<a href='http://www.cs.umbc.edu/%7Eolano/papers/mNoise.pdf' class='bbc_url' title='External link' rel='nofollow external'>6</a>], by Marc Olano - which produces certain desirable noise properties efficiently on GPU systems.&nbsp;&nbsp;<span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/movNoise/noise0.png' alt='Posted Image' class='bbc_img' /></span></span><br />Graphics using the 2002 improved noise algorithm (D Nielsen)<br /><br />&nbsp;&nbsp;I developed the following "dynamic noise" algorithm in March 2004. In this context, the word "dynamic" refers to the inherently mutable nature of this noise function. In other words, it moves. While rereading my senior design project from the year before (a hardware-description-language implementation of improved noise with minor novelties), I noticed I had obliquely mentioned that the noise algorithm could likely be made dynamic. With a bit of tinkering, this algorithm emerged. I do not know whether something similar has arisen any place in the field, but the function has proven useful, so it seemed time to make certain it was passed on. (All other attempts to do so have seemed unusually thwarted by the universe.) Okay, enough about myself.<br /><br /> The general enhancement of this algorithm is noise movement without need of slicing through a higher dimension, so much less computation is required. 
The general drawbacks are slightly augmented memory usage as compared to traditional noise due to scaling of the gradient vectors, and use of an iterative timestepping approach that makes backtracking to a previous noise state or changing flow velocity more difficult than in the traditional approach of slicing through a higher dimension.<br /><br /> <br /><strong class='bbc'>Method</strong><br /> We should begin by describing a 2D implementation of improved noise. Perlin provides code for the 3D case accompanying his 2002 brief, but leaves other cases out. We can use our own sensibilities to create a 2D case. The main difference is the set of possible gradient direction vectors. We will place them within the set {[1,2], [1,-2], [-1,2], [-1,-2], [2,1], [2,-1], [-2,1], [-2,-1]}. These vectors do not align with either the grid lines or the grid diagonals, as Perlin suggests they should not.<br /><br />&nbsp;&nbsp;<span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/movNoise/noise2.png' alt='Posted Image' class='bbc_img' /></span></span><br /><br />&nbsp;&nbsp;Generating dynamic noise is very similar to making improved noise, with one primary difference: all 3-bit gradient direction vectors are scaled by an unsigned value. This produces a slightly less dense noise function, but a more controllable one. For the 2D case, we will scale the gradient vectors with a 3-bit unsigned value. The 3-bit [0..7] range simply emerged from guesswork and testing. It seems to be the smallest bit-width that produces mostly decent results, at least with a constant change factor in scale. The change in gradient vector scale per time step is stored in a 1-bit value, representing +1 or -1 change. When changing, the gradient vectors just scale up and down, up and down: 0, 1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, ... The gradient vector scale change is made based on a coin toss: true then change, else leave it alone. 
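<br /><br />The per-time-step scale update described above can be sketched as follows. This is a minimal illustration in Python, not the author's C++ code: the names step_gradient, DIRECTIONS_2D and MAX_SCALE are hypothetical, and picking a fresh random direction whenever the scale reaches zero is only one assumed way of changing direction while the gradient is invisible.<br /><br />

```python
import random

# The eight 2D gradient directions listed in the article.
DIRECTIONS_2D = [(1, 2), (1, -2), (-1, 2), (-1, -2),
                 (2, 1), (2, -1), (-2, 1), (-2, -1)]
MAX_SCALE = 7  # 3-bit scale, range [0..7]


def step_gradient(scale, delta, direction, rng):
    """Advance one gradient entry by one time step.

    scale     -- current 3-bit scale in [0..7]
    delta     -- +1 or -1, the stored direction of scale change
    direction -- index into DIRECTIONS_2D
    rng       -- object with getrandbits() and randrange(), e.g. random.Random
    Returns the updated (scale, delta, direction).
    """
    # Coin toss: tails leaves the entry alone for this time step.
    if not rng.getrandbits(1):
        return scale, delta, direction
    # Bounce at the ends of the range: 0, 1, ..., 7, 6, ..., 0, 1, ...
    if scale == 0:
        delta = 1
    elif scale == MAX_SCALE:
        delta = -1
    scale += delta
    # At zero scale the gradient contributes nothing, so its direction
    # can be replaced without any visible discontinuity (assumed policy).
    if scale == 0:
        direction = rng.randrange(len(DIRECTIONS_2D))
    return scale, delta, direction
```

<br /><br />Calling step_gradient once per table entry per time step, with rng set to a random.Random instance, gives the up-and-down scale sequence described above, with each entry advancing on roughly half of the steps.<br /><br />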
I believe I had also tried other nonlinear methods like bit-shifting scale and changing the scale sinusoidally to get higher-order continuity over time, but they did not work out so well.<br /><br /> All this yields the following values in the gradient LUT for the 2D case:<br /><br /> <ul class='bbc'><li>3 bits for scale, indicating a value in range [0..7]. </li><li>3 bits for direction, indicating [1,2], [1,-2], [-1,2], [-1,-2], [2,1], [2,-1], [-2,1], or [-2,-1]. </li><li>1 bit for scale delta, indicating +1 or -1.</li></ul> For the 3D case, four bits are needed for direction, giving 4b (direction) + 3b (scale) + 1b (scale change) = 1B. The magic is that, when gradient vectors are scaled by zero, we can swap around the direction vectors without any noticeable effect. Then, when they are rescaled, the noise function takes on the new form determined by the gradient direction vectors. As gradient vectors change scale, new vectors are always reaching zero scale, and so gradient vector swapping occurs every time step. If you are familiar with the improved noise function, this may sound pretty simple. And it is!<br /><br /> This approach also implies the constraint that gradient direction vectors must be stored in writable memory, of course, not constants.<br /><br /> The 1D case is trivial, and the 3D case can be extrapolated from the Perlin improved noise code and an understanding of the dynamic noise approach. 
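<br /><br />As a rough illustration of the 2D gradient table entry listed above, the following Python sketch packs the three fields into 7 bits and evaluates a scaled grid-point dot product. The bit layout and the names pack_entry, unpack_entry and scaled_grid_dot are assumptions made for this example; the article does not specify which bits hold which field.<br /><br />

```python
# 2D gradient directions, indexed by the 3-bit direction field.
DIRECTIONS_2D = [(1, 2), (1, -2), (-1, 2), (-1, -2),
                 (2, 1), (2, -1), (-2, 1), (-2, -1)]


def pack_entry(scale, direction, delta):
    """Pack scale [0..7], direction index [0..7] and delta (+1/-1) into 7 bits.

    Assumed layout (not from the article):
    bit 6 = delta, bits 5-3 = direction, bits 2-0 = scale.
    """
    delta_bit = 1 if delta == 1 else 0
    return (delta_bit << 6) | (direction << 3) | scale


def unpack_entry(entry):
    """Inverse of pack_entry: returns (scale, direction, delta)."""
    scale = entry & 0b111
    direction = (entry >> 3) & 0b111
    delta = 1 if (entry >> 6) & 1 else -1
    return scale, direction, delta


def scaled_grid_dot(entry, dx, dy):
    """Dot product of the scaled gradient with the offset (dx, dy)."""
    scale, direction, _ = unpack_entry(entry)
    gx, gy = DIRECTIONS_2D[direction]
    return scale * (gx * dx + gy * dy)
```

<br /><br />An entry whose scale field is zero always yields a zero dot product, which is why its direction field can be rewritten without a visible change.<br /><br />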
Higher dimensions are not presented.<br /><br />&nbsp;&nbsp;<span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/movNoise/noise1.png' alt='Posted Image' class='bbc_img' /></span></span><br />Graphics using dynamic noise (D Nielsen)<br /><br />&nbsp;&nbsp;<br /><strong class='bbc'>Economy</strong><br /> The dynamic noise implementation presented here requires the following measures over static noise:<br /><br /> <ul class='bbc'><li>The table holding permutation values must be independent of that holding gradient lookup values. </li><li>The dot products evaluated at grid points must be scaled. </li><li>Time changes iteratively, not as a function parameter. </li><li>Gradient values must be updated whenever a time step occurs.</li></ul> The true cost of these new requirements is small and is further diminished due to constraints and provisions already inherent in target systems. For example, a hardware implementation of the static noise function is well-suited to pipelining; therefore, it may be advantageous to place the permutation table in a processing stage independent of the table holding gradient vectors, in order to load-balance stages in the pipeline. Notice also that the gradient direction values require fewer bits per entry than do the permutation values, so that a separate gradient direction table does not require much additional memory.<br /><br /> As an approximation, we can think of the dynamic noise function as being half the computational complexity of a comparable improved noise function. A static improved noise function on a regular grid in N-space is usually computed as an interpolation of two noise functions in (N-1)-space (except with grid-point evaluations using vectors in N-space). 
By reducing the dimensional requirement by one, fewer than half of the computations are required for a dynamic noise evaluation.<br /><br /> It is expected that dynamic noise benefits from GPU vector hardware operations, because such operations could be used to perform the grid-point dot products and scaling, and that matrix operations might be used to perform groups of vector operations for even better savings.<br /><br /> <br /><strong class='bbc'>Conclusion</strong><br /> This algorithm is quite simple and useful, and can produce convincing results for most noisy effects, although the particular implementation given here is likely not suitable for very high-end visuals due to coarse timestepping and integer math chosen for efficiency. (For a critique of Perlin noise in general, see [<a href='http://portal.acm.org/citation.cfm?id=1073204.1073264' class='bbc_url' title='External link' rel='nofollow external'>7</a>].) The enhancements that dynamic noise is believed to provide are a reduction in computational complexity of over half, better localized noise density conservation (so there is more of a sense of flow), and no question of slice orientation or ordered holes appearing from use of a higher dimension. This method also provides smoothly variable flow velocity when rotating a slice through a higher dimension. While independence between the gradient table and the permutation table is required in this dynamic noise implementation (unlike the reusable table of static noise), a pipelined hardware implementation calls for this independence anyway.<br /><br /> A set of dynamic noise C++ classes (1D, 2D, and 3D) can be found in source code below, along with a separate Windows demonstration binary. The executable displays layered dynamic noise in various places around a scene. These files are provided for your convenience. There is no guarantee about any behavior on any system - use and modification are allowed at your own risk. 
<a href='http://downloads.gamedev.net/features/programming/movNoise/noise.zip' class='bbc_url' title='External link' rel='nofollow external'>noise.zip</a><br /><br /> Thanks and happy coding!<br /><br /><br /> <br /><strong class='bbc'>References</strong><br /> [1] Perlin, K. An image synthesizer. 1985.<br /><a href='http://portal.acm.org/citation.cfm?id=325247' class='bbc_url' title='External link' rel='nofollow external'>http://portal.acm.org/citation.cfm?id=325247</a><br /><br /> [2] Perlin, K. Making noise. 2000.<br /><a href='http://www.noisemachine.com/talk1/index.html' class='bbc_url' title='External link' rel='nofollow external'>http://www.noisemachine.com/talk1/index.html</a><br /><br /> [3] Perlin, K. Improving noise. 2002.<br /><a href='http://mrl.nyu.edu/%7Eperlin/paper445.pdf' class='bbc_url' title='External link' rel='nofollow external'>http://mrl.nyu.edu/~perlin/paper445.pdf</a><br /><br /> [4] Perlin, K. A sheet of simplex noise. 2006.<br /><a href='http://mrl.nyu.edu/%7Eperlin/homepage2006/simplex_noise/index.html' class='bbc_url' title='External link' rel='nofollow external'>http://mrl.nyu.edu/~perlin/homepage2006/simplex_noise/index.html</a><br /><br /> [5] Gustavson, S. Simplex noise demystified. 2005.<br /><a href='http://webstaff.itn.liu.se/%7Estegu/simplexnoise/simplexnoise.pdf' class='bbc_url' title='External link' rel='nofollow external'>http://webstaff.itn.liu.se/~stegu/simplexnoise/simplexnoise.pdf</a><br /><br /> [6] Olano, M. Modified noise for evaluation on graphics hardware. 2005.<br /><a href='http://www.cs.umbc.edu/%7Eolano/papers/mNoise.pdf' class='bbc_url' title='External link' rel='nofollow external'>http://www.cs.umbc.edu/~olano/papers/mNoise.pdf</a><br /><br /> [7] Cook, RL, et al. Wavelet noise. 2005.<br /><a href='http://portal.acm.org/citation.cfm?id=1073204.1073264' class='bbc_url' title='External link' rel='nofollow external'>http://portal.acm.org/citation.cfm?id=1073204.1073264</a>]]></description>
		<pubDate>Wed, 27 May 2009 10:07:26 +0000</pubDate>
		<guid isPermaLink="false">b673ed011cfb3c810010abed6f3a034b</guid>
	</item>
	<item>
		<title>Dynamic 3D Scene Graphs</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/dynamic-3d-scene-graphs-r2590</link>
		<description><![CDATA[<span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/dynamic3Dsg/fig01.jpg' alt='Posted Image' class='bbc_img' /></span></span> <br /><strong class='bbc'>Introduction</strong><br /> A scene graph is a very common structure used in computer graphics to organize the visible objects of a scene. By maintaining a hierarchy of objects in a scene, it becomes very intuitive to work with an object and its children. For example, if a vehicle object whose children are its wheels is added to a scene, then by moving the vehicle we automatically move the wheels as well. This automatic movement of the wheels is possible because the wheels are positioned relative to their parent, the vehicle. Relative positioning is a powerful concept, and is fundamental to the infinite scene graphs discussed in this article. Essentially, we must remember that whenever we position something in a 3D space, it is always positioned relative to something else, whether that something else is a vehicle, a planet, or a game universe.<br /><br /> Most scene graphs used in computer games today are restricted to being scene trees: there is a root scene node and multiple generations of children underneath it, but never any cycles. For example, a common situation might be to have a city as the root node, with buildings and roads as its children. The children of the buildings will be the rooms, and the children of the rooms will be furniture. It is very uncommon in computer game scene graphs to see a scene graph node referencing itself or one of its parents, as it is harder to see an immediate application for this. 
In our city, an example of a scene node referencing its parents would be a snow globe on a table inside a room in the city, containing a city whose children are roads and buildings, and so on.<br /><br /> Extending a scene tree to a scene graph allows us to create fractal-like objects with infinite details. In games, this means objects like trees or plants can be defined that have infinite detail. It allows the construction of unique worlds that would have otherwise been impossible to construct, such as worlds that contain themselves all over again. Some of the implementation details required for a true scene graph to be feasible will also be useful in their own right, allowing for massive scenes including objects of vastly differing magnitudes. Imagine zooming out from a leaf, to a tree, to a landscape, to a planet, to a solar system, and then to the entire galaxy, seamlessly.<br /><br /> This article will explain how to achieve an implementation of a true scene graph, as opposed to a scene tree. It will discuss how the scene graph idea itself, as well as the implementation details necessary to support it, can be applied to solve common and not so common problems in video games and other domains.<br /><br /> <br /><strong class='bbc'>Defining the Scene Graph</strong><br /> In order to begin discussing implementation details, we must first determine how to describe a scene graph. Imagine a man with his arms out, holding in each hand another man with his arms out, holding in each hand another man with his arms out, ad infinitum. If we were to explore a 3D scene like this, it might appear that there is an infinite amount of detail, so how can we hope to describe all of it? This is not quite true though. 
Just like how a recursive function is defined in a finite amount of code, we can also define this man scene in a finite amount of information.<br /><br />&nbsp;&nbsp;<span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/dynamic3Dsg/fig02.jpg' alt='Posted Image' class='bbc_img' /></span></span><br />Figure 1: A scene graph where a node references itself<br /><br />&nbsp;&nbsp;First, let's formalize the information in a few nodes of a standard scene tree. Imagine a scene consisting of a garage, with a car inside of that, and four wheels on the car. We might write this as follows:<br /><br /> Garage -&gt; (GarageModel, M<sub class='bbc'>1</sub>), (Car, M<sub class='bbc'>2</sub>)<br />Car -&gt; (CarModel, M<sub class='bbc'>3</sub>), (Wheel, M<sub class='bbc'>4</sub>), (Wheel, M<sub class='bbc'>5</sub>), (Wheel, M<sub class='bbc'>6</sub>), (Wheel, M<sub class='bbc'>7</sub>)<br />Wheel -&gt; (WheelModel, M<sub class='bbc'>8</sub>)<br /><br /> Where the line “X -&gt; (Y<sub class='bbc'>1</sub>, M<sub class='bbc'>1</sub>), (Y<sub class='bbc'>2</sub>, M<sub class='bbc'>2</sub>), ...” reads: for all i, Y<sub class='bbc'>i</sub> is a child of X, positioned relative to X by the linear transformation represented by the matrix M<sub class='bbc'>i</sub>. In our example, Garage, Car and Wheel are the internal scene tree nodes, and GarageModel, CarModel and WheelModel are the leaf scene graph nodes, or terminal nodes. The terminal nodes have no children, and these are the nodes that actually result in a model being rendered when traversed during the scene graph pass.<br /><br /> In other words, drawing a garage involves drawing a GarageModel and a Car (which involves drawing a CarModel and four Wheels (which involves drawing a WheelModel)). 
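<br /><br />The garage definition above can be written down as a small data structure and traversed to list what actually gets rendered. The following Python sketch is only an illustration: the dictionary layout and the function name draw are hypothetical, the strings "M1" to "M8" are placeholders for the article's matrices, and the traversal assumes a tree (it would not terminate on a definition with cycles).<br /><br />

```python
# Scene definition: node name -> list of (child name, matrix label).
# Names that do not appear as keys are terminal nodes (models).
SCENE = {
    "Garage": [("GarageModel", "M1"), ("Car", "M2")],
    "Car": [("CarModel", "M3"), ("Wheel", "M4"), ("Wheel", "M5"),
            ("Wheel", "M6"), ("Wheel", "M7")],
    "Wheel": [("WheelModel", "M8")],
}


def draw(name, path=()):
    """Return (model, matrix path) pairs in depth-first traversal order.

    `path` is the chain of matrix labels accumulated from the start node
    down to the current node; a real renderer would multiply matrices here.
    """
    children = SCENE.get(name)
    if children is None:
        # Terminal node: this is the point where a model is rendered.
        return [(name, path)]
    drawn = []
    for child, matrix in children:
        drawn.extend(draw(child, path + (matrix,)))
    return drawn
```

<br /><br />Calling draw("Garage") yields one GarageModel, one CarModel and four WheelModel entries, each paired with the chain of matrices that positions it relative to the garage.<br /><br />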
Drawing any of the terminal nodes, in this example, implies rendering the model of the object (e.g., a car mesh loaded from 3D Studio Max).<br /><br /> Now, in order to allow for the definition of a true scene graph as opposed to a scene tree, we simply allow any of the Y<sub class='bbc'>i</sub>'s in the line “X -&gt; (Y<sub class='bbc'>1</sub>, M<sub class='bbc'>1</sub>), (Y<sub class='bbc'>2</sub>, M<sub class='bbc'>2</sub>), ...” to be X itself, or a parent of X. We discussed the example of a man holding himself above, and now we shall formally describe it:<br /><br /> Man -&gt; (ManModel, I), (Man, M<sub class='bbc'>1</sub>), (Man, M<sub class='bbc'>2</sub>)<br /><br /> Where I is the identity matrix, and M<sub class='bbc'>1</sub> and M<sub class='bbc'>2</sub> are linear transformations placing the unit cube into the ManModel's hands. Figure 2 shows a graphical representation of the above Man definition, as seen in the provided implementation tool.<br /><br />&nbsp;&nbsp;<span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/dynamic3Dsg/fig03.jpg' alt='Posted Image' class='bbc_img' /></span></span><br />Figure 2: A graphical view of the "Man holding himself" scene definition in the SceneComposer <a href='http://downloads.gamedev.net/features/programming/dynamic3Dsg/Dynamic3dSceneGraphs.zip' class='bbc_url' title='External link' rel='nofollow external'>sample application</a><br /><br />&nbsp;&nbsp;We now see that this seemingly complicated scene is actually very simply described.<br /><br /> <br /><strong class='bbc'>Implementation Details</strong><br /> Now for the fun part. We will first discuss the difference between the Scene Graph definition, which is described in the section above, and the Scene Graph instance, which is a tree. 
We will then discuss how to terminate the recursion so that the loops introduced by a scene node referencing itself or its parent will not cause our rendering loop to go on forever. Finally, we will discuss how to maintain smooth transitions from very large objects, such as a galaxy, to very small objects, such as leaves on a tree.<br /><br /> <br /><strong class='bbc'>Scene Definition vs. Scene Instance</strong><br /> Before moving on, we should discuss the difference between a scene graph definition and a scene graph instance. While the definition of a scene graph is finite (for example, the “man holding himself” scene definition consists of only one line), the instance of a scene graph used for rendering need not be. The scene graph instance will always be a tree. The instance of a scene graph will be the tree formed by the following algorithm, whose input is the node in the scene definition to start instancing the scene at:<br /><br />
SceneInstanceNode CreateSceneGraphInstance(SceneDefinitionNode instanceStart):<br />
&nbsp;&nbsp;&nbsp;&nbsp;// Instance the current node<br />
&nbsp;&nbsp;&nbsp;&nbsp;SceneInstanceNode instancedNode = new SceneInstanceNode(instanceStart);<br />
&nbsp;&nbsp;&nbsp;&nbsp;// Instance the children of the current node<br />
&nbsp;&nbsp;&nbsp;&nbsp;ForEach child of instanceStart:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;instancedNode.AddChild(<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CreateSceneGraphInstance(GetSceneDefinitionNode(child)),<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;GetMatrix(child));<br />
&nbsp;&nbsp;&nbsp;&nbsp;return instancedNode;<br /><br />
The application of this algorithm on the “Man holding himself” scene definition is shown graphically in figure 3. The transformation from a scene definition to a scene instance is important because, since the scene instance is a tree, we can now simply process (i.e., render) it using standard “scene graph” processing techniques. 
Obviously, if the Scene Definition graph has loops in it, then the algorithm presented here will produce an infinite scene instance. This article takes its title from the fact that the scene tree which results from the instancing process is dynamically constructed.&nbsp;&nbsp;<span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/dynamic3Dsg/fig04.jpg' alt='Posted Image' class='bbc_img' /></span></span><br />Figure 3: Example of CreateSceneGraphInstance algorithm applied to "Man holding man" scene<br /><br />&nbsp;&nbsp;Instancing of the scene definition can be implicit or explicit. Explicit scene instantiation is described above, and involves formally processing the scene definition into a scene instance, and then processing the resulting scene instance like a normal scene tree. Implicit scene instantiation would be to process the scene definition as it is traversed. That is, no “Scene Instance” data is explicitly created.<br /><br /> Explicit scene instancing can allow for a more powerful and robust system, since every instantiated scene node can now be treated as its own object. That is, if a person scene node happens to be instanced twice, each of these instances can now maintain its own, separate properties, such as animation and state, even though they originated from only one scene definition node. The downside to explicit instantiation, though, is that it requires more memory.<br /><br /> <br /><strong class='bbc'>Terminating the Scene Graph Instancing</strong><br /> As with all recursive algorithms, there must be a base case or else we will recurse forever. By restricting the linear transformations of the children of a node in the scene graph to be contractions (that is, the transformation has a component that scales the object down in size), we ensure that all of the children of a given node are smaller than that node. 
This implies that if we traverse down the nodes of a scene graph, we will arrive at smaller and smaller nodes. So, to terminate our scene instancing algorithm, we should stop recursing when our scene nodes are “small enough”.<br /><br /> So when is a scene node “small enough” to stop recursing? A good method for determining this is to use the screen-space size of the node about to be rendered. This can easily be approximated by determining the maximum screen-space size of the bounding box for the given scene node. In order for us to calculate the screen-space size, we must have access to the model-perspective transform, and so this must be passed down our scene instancing algorithm. At each node, we should apply this transformation to that node's bounding-box, giving the screen-space points of the bounding-box. Since an axis-aligned bounding box is easier to deal with, we create a 2D axis aligned bounding rectangle enclosing the 8 screen-space points of the box. Now it is simply a matter of determining the area of this axis aligned bounding rectangle (i.e., width * height) to give an upper bound on the screen-space area of the node.<br /><br /> With the screen-space area of a scene node available to us, we can check whether it is below a rendering threshold or not. If the screen-space area of the scene node is below the threshold, then we return early, and neither this node nor any of its (possibly infinite) descendants ends up in the scene instance tree. Since each recursion of the scene instancing algorithm takes us to smaller and smaller nodes, we are guaranteed to eventually arrive at nodes that are below the instancing threshold, guaranteeing that the algorithm will always terminate. 
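<br /><br />The termination test described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation from the sample application: matrices are plain nested lists using a column-vector convention, every bounding box is assumed to lie in front of the camera (so w is positive), and all names (mat_mul, project, screen_area, create_instance, MAN_SCENE) are hypothetical.<br /><br />

```python
from itertools import product


def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def project(m, point):
    """Transform a 3D point by m and apply the perspective divide."""
    x, y, _, w = (sum(m[i][k] * c for k, c in enumerate((*point, 1.0)))
                  for i in range(4))
    return x / w, y / w


def screen_area(mvp, box_min, box_max):
    """Area of the 2D axis-aligned rectangle around the 8 projected corners."""
    corners = product(*zip(box_min, box_max))
    xs, ys = zip(*(project(mvp, c) for c in corners))
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def create_instance(definition, name, mvp, threshold):
    """Instance `name` and its children, skipping nodes below the threshold."""
    box_min, box_max, children = definition[name]
    if screen_area(mvp, box_min, box_max) < threshold:
        return None  # "small enough": stop recursing here
    node = {"name": name, "children": []}
    for child_name, child_matrix in children:
        child = create_instance(definition, child_name,
                                mat_mul(mvp, child_matrix), threshold)
        if child is not None:
            node["children"].append(child)
    return node


def scale(s):
    """Uniform scale matrix; translation is omitted to keep the sketch short."""
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]


UNIT = ((-0.5, -0.5, -0.5), (0.5, 0.5, 0.5))
# "Man holding himself": each child Man is a contraction by one half.
MAN_SCENE = {
    "Man": (*UNIT, [("ManModel", scale(1)),
                    ("Man", scale(0.5)), ("Man", scale(0.5))]),
    "ManModel": (*UNIT, []),
}


def count(node):
    """Number of nodes in an instanced tree."""
    return 1 + sum(count(c) for c in node["children"])
```

<br /><br />With an identity model-projection matrix and a threshold of 0.01, create_instance(MAN_SCENE, "Man", scale(1), 0.01) stops after four generations of Man nodes, because each generation covers a quarter of the screen-space area of the previous one.<br /><br />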
What we are left with is only a part of the (possibly infinite) scene instance tree, but this is the part that is most relevant to us, since everything that was cut occupies very little screen space.<br /><br /> Frustum culling can also be effectively applied at this stage to further reduce the size of the resulting instance tree. This is easily done since we already calculate the screen-space bounding cube points of each node.<br /><br /> <br /><strong class='bbc'>Infinite Zoom</strong><br /> <br /><strong class='bbc'>The Zooming Problems</strong><br /> While enough has been explained already to produce a working true scene graph rendering system, its usefulness is currently limited to allowing for objects whose level of detail increases the closer you come to them (for example, a plant defined recursively will come into more detail the more screen-space it occupies). While this is already a nice improvement to existing scene trees, we can make things a lot more interesting by allowing the user to zoom infinitely closer to items in the scene.<br /><br /> Our first attempt might be to allow zooming by simply decreasing the camera's field of view. Since we are passing down the model-projection matrix to our scene instancing algorithm, the lowered field of view will be taken into account and the level of detail will be adjusted accordingly. Unfortunately, it becomes very difficult to navigate around an object of focus we have zoomed in to because the slightest rotation of the camera will target a completely different space. Also, we run into floating point accuracy problems when zooming in too much.<br /><br /> Stepping back from the problem a bit (no pun intended), we realize that another way to achieve “zooming” is to simply move the camera closer to the object. Due to the mechanics of the perspective projection matrix, this has the effect of making objects bigger (or smaller when you move away from them). 
Using this method allows us to maintain the same field of view, so camera rotations have the same degree of effect when “zoomed in”. This method still leaves us with two problems though. The first problem is that if we continue to move our camera with the same speed we always had, it will now be moving much faster relative to our new, smaller object of focus. This is a problem because moving our camera 1 unit relative to a galaxy will move our camera an enormous number of units relative to a leaf, so clearly we must find a way to manipulate the speed of our camera. The second problem is that we still suffer from floating point accuracy problems, since floating point numbers do not allow us to represent an infinite amount of zoom to an object.<br /><br /> <br /><strong class='bbc'>The Zooming Solution</strong><br /> The problem of floating point accuracy and the problem of unsuitable camera speed when viewing smaller objects have a common solution. We simply need to realize that we can render the exact same scene under completely different coordinate systems, as long as everything is still positioned relative to each other. The observer of your world won't know if a leaf is 1 unit big or 100 units big, as long as it is still one fiftieth the size of the tree it is attached to. What we need is the ability to dynamically select different scene nodes as the scene's reference node, the node that every other node, as well as the current model-view transformation, is relative to.<br /><br /> An Alice in Wonderland game, for example, might start with the world being rendered with the frame of reference being a large scene containing a landscape with a rabbit hole in it, and all of the geometry consisting of the interior of the rabbit hole, with a desk at the bottom of it that has a little bottle with the words “DRINK ME” on it. Within this scene graph might also be a much tinier room with a tiny, locked door in front of it. 
This smaller room will have its own scene graph children in it as well. When Alice drinks the contents of the bottle, the game engine can then make the smaller room the new reference, everything else being made relative to it. As long as the new frame of reference is not scaled on an entirely different order of magnitude from the old frame of reference, the viewer will not notice when the reference frame switch is made, as the camera's position and orientation will also have changed to be relative to the new frame of reference.<br /><br /> By taking the transformation that positions an object relative to the current frame of reference (calculated by accumulating the matrices in the scene graph leading from the frame of reference to the object in question), and inverting it, we obtain a transformation that positions the current frame of reference relative to the object in question. The following pseudocode shows how to calculate the matrix used to change the frame of reference:<br /><br />
Matrix FindRelationMatrix(sourceNode, destNode):<br />
&nbsp;&nbsp;&nbsp;&nbsp;// Helper function for getting the list of all a node's ancestors<br />
&nbsp;&nbsp;&nbsp;&nbsp;// In the returned list, the original node will be the first item<br />
&nbsp;&nbsp;&nbsp;&nbsp;// and the root will be the last<br />
&nbsp;&nbsp;&nbsp;&nbsp;NodeList GetAncestorList(node):<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ancestorList = [];<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;curNode = node;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;while curNode:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ancestorList.Append(curNode);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;curNode = curNode.parent;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return ancestorList;<br />
&nbsp;&nbsp;&nbsp;&nbsp;// Accumulate the transform matrices leading from the source node to<br />
&nbsp;&nbsp;&nbsp;&nbsp;// the destination node<br />
&nbsp;&nbsp;&nbsp;&nbsp;Matrix sourceToRoot = Identity;<br />
&nbsp;&nbsp;&nbsp;&nbsp;Matrix rootToDest = Identity;<br />
&nbsp;&nbsp;&nbsp;&nbsp;For node in GetAncestorList(sourceNode):<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sourceToRoot = sourceToRoot * node.GetTransform().Inverse();<br />
&nbsp;&nbsp;&nbsp;&nbsp;For node in GetAncestorList(destNode):<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rootToDest = node.GetTransform() * rootToDest;<br />
&nbsp;&nbsp;&nbsp;&nbsp;return rootToDest * sourceToRoot;<br /><br />
By applying this transformation to the current model-view transformation, we obtain a new view transformation that is relative to the object we wish to re-focus on. If the new frame of reference is smaller than the old one, we have just made the entire universe bigger, but since the view matrix has been adjusted to account for this as well, the universe appears unchanged. Now, though, we can begin to show even smaller objects that would have been problematic due to floating point accuracy in the previous frame of reference. By continually changing the frame of reference, we can continue to zoom in (or out) of objects infinitely. Notice also that since the view matrix is continually being modified, so is the velocity of the camera, so that when we switch to a smaller frame of reference, the camera automatically begins to move slower. This is desirable, since it means the camera will now move at a reasonable speed to allow the viewer to analyze the detail of a much smaller object.<br /><br /> <br /><strong class='bbc'>Rendering Scenes With Big, Distant, Detailed Objects</strong><br /> When viewing a very small object up close, it is possible for the viewer to be also looking at a very large object in the background. Think of the camera looking at the tip of a leaf, while behind the leaf is a planet, and behind the planet is a sun. Even though the planet and sun are far away, they're also really big, so they still occupy considerable screen-space area. 
We can't render a leaf, a planet, and a sun in the same scene though, because the z-buffer would provide very poor quality for the distant objects, resulting in ugly z-fighting artifacts.<br /><br /> One solution to this problem is to render the scene instance tree in a breadth first fashion, starting at the root. This has the effect of drawing the largest objects first. A “rendering reference frame” is created, starting with the root (biggest) node. The camera is re-positioned relative to this reference frame, and rendering begins. When recursing down towards the current object of reference (i.e., towards the viewer), we track how much the scale has changed from the root node. When it is detected that the scale has been reduced past a given threshold, we clear the current z-buffer, reset the “rendering reference frame” to the current scene node, adjust the camera according to the new reference frame, and continue rendering into this new “layer”.<br /><br /> The layering approach works because we are traversing the instance scene tree towards the camera, so every time we enter a child that also contains the camera, we are rendering objects closer to the viewer. A drawback of this solution is that there will be rendering order artifacts if the bounding boxes of two scene node children overlap. 
In this case, it would be possible to flag that the particular node is “required to be rendered with the same z-buffer as its children”, so the z-buffer will never be cleared when jumping from that node to its children.<br /><br />&nbsp;&nbsp;<span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/dynamic3Dsg/fig05.jpg' alt='Posted Image' class='bbc_img' /></span></span><br />Figure 4: A man in a city on a planet, with another solar system in his galaxy, all rendered in the same scene<br /><br />&nbsp;&nbsp;<br /><strong class='bbc'>Applications</strong><br /> To put what has been described so far into perspective, this section describes how the system might be used in a game like Spore. I do not know how Spore implemented its transitions from planet-side views to galactic views; I only discuss here how they might be implemented using the dynamic scene graph system.<br /><br /> In Spore, the player can obtain a spaceship with which they can zoom out to a view of the entire galaxy, flying around from one to the other. The player can fly to a solar system in a galaxy, and then a planet within the solar system. When the player flies towards a planet, the camera zooms in to the planet surface where the player can manage the local situation. This cannot be implemented relative to a single coordinate space, because the size of a planet surface is microscopic compared to the size of the galaxy, so there would be large floating point errors, resulting in inaccurate graphics and collisions on the planet surface.<br /><br /> We do not wish to generate the entire galaxy and every planet within it, since if we allow for an arbitrary amount of detail on all scales, we will run out of memory. 
Furthermore, in the context of a game with a changing game-state, it is less trivial to apply the dynamic scene graph generation strategy, because we want the game to never forget that we built a city on planet A, even if we're on the other side of the galaxy, and planet A is now too small to be in our scene graph instance. With these challenges overcome, we would have a reasonable framework within which to implement a Spore-like game. We will discuss these topics now.<br /><br /> <br /><strong class='bbc'>Galaxies of Arbitrary Size and Detail</strong><br /> In order to support galaxies of arbitrary size and detail, we need a method of only fetching the detail currently relevant to us, while ignoring the rest. Suppose the planets in the galaxy and everything on them are created procedurally. When we go to instance the solar system scene graph nodes, they should be defined in such a way that the instancer is instructed to randomly position child planet nodes. The planet nodes will then be instanced as required, where they should instruct the instancer to randomly generate a landscape and populate it with cities. The cities may themselves be child nodes which are instructed to procedurally generate buildings and streets, and so on. Since the creation processes only happen during instancing, and we only instance scene graph nodes that are visible to us, we are able to construct only the part of the universe that we are currently concerned with.<br /><br /> This same technique can be extended to support artist-defined universes. In this case we can load in the art assets for the player's starting point and then, during scene graph instancing, for every node that contains an art asset, we determine its children and whether or not to instance them. If we decide it's relevant to instance an art asset's children, then we load those child assets off the disk and store them in memory. 
Art assets should be removed in a Least Recently Used fashion, so that memory for art that is no longer relevant is continually freed. Once again, this method can be used to achieve an artist-defined universe with arbitrary detail that is completely connected, yet only the portions of that universe relevant to the player will be loaded into memory. It should be noted here that in this case, the art assets are actually nodes of the scene definition, so the actual scene definition will also be dynamically extended as new assets are loaded. This technique may also be useful for applications like Google Earth.<br /><br /> In order for these more sophisticated instancing methods to function properly, we cannot blindly re-instance all our objects from the scene definition. We must remember the objects we have instanced, and only re-instance objects that have just become relevant. This is discussed next.<br /><br /> <br /><strong class='bbc'>Persisting Instanced Nodes</strong><br /> Before we proceed, it should be recognized that in a game like Spore, it would be convenient to deal with a planet and everything in it as a whole, as it is difficult to represent a spherical planet with mountains on it as a scene graph. Instead, when the planet is instanced, we can also instance the randomly generated terrain using standard terrain generation techniques, and initialize a spatial partitioning structure to deal with cities and the critters moving around the planet. Everything instanced as part of the planet will operate within the planet's frame of reference, which is given meaning by the planet node's position within the scene graph instance.<br /><br /> When a planet is instanced, then, we would want to initialize many items inside of it. Furthermore, since the player can affect the state of an instanced planet, we would like to remember the changes they made to the planet's state. 
To accomplish this, we should mark the planet instance as “unforgettable” to the scene graph system when the player makes a change. In doing so, the planet should never be de-instanced, even if it is considered no longer relevant (at a given moment). In remembering a planet, we are forced to remember all of the planet's parents as well, otherwise the system has no way of recognizing where the planet fits in with the rest of the scene graph. This is not unreasonable in terms of memory though, since the height of the instanced scene tree is logarithmic with respect to the number of nodes in the tree.<br /><br /> It is true that a player could then create a “worst case” by jumping from planet to planet making slight changes, forcing our system to remember more changes than we have room for. This is an inevitable limitation however, as the computer is asked to literally remember more things than it has memory for. A possible solution would be to assume that if the computer can't remember all the player's changes, then the player probably can't either, and we can apply a Least Recently Used scheme to forget changes that the user has made to the game world.<br /><br /> In order for instance remembering to work, our scene instancing algorithm should be augmented with logic to check whether a scene element has already been instanced and, if so, re-use the existing instance instead of re-instancing. 
Pseudocode is given for this style of instancing, where the algorithm will take the root of an existing scene instance tree and instance any nodes that have not yet been instanced, while deleting nodes that no longer need to exist.<br /><br /><pre>
CreateSceneGraphInstance(SceneInstanceNode instanceStart):
    // Check if this node's children have already been instanced
    if instanceStart.HasChildren():
        if IsRelevant(instanceStart):
            // Simply recurse in to the children, in this case
            ForEach child of instanceStart:
                CreateSceneGraphInstance(child);
        else:
            // Children are no longer needed, get rid of them
            instanceStart.DeleteChildren();
    else:
        // Children have not been instanced.
        if IsRelevant(instanceStart):
            // And they are relevant, so instance them
            instanceStart.InstanceChildren();
</pre><br />A <a href='http://downloads.gamedev.net/features/programming/dynamic3Dsg/Dynamic3dSceneGraphs.zip' class='bbc_url' title='External link' rel='nofollow external'>sample implementation</a> is provided. It includes two programs, Scene Composer and Scene Explorer. Scene Explorer can be used to quickly view a scene created by Scene Composer. Scene Composer allows you to import Wavefront Obj model files and then compose them together to create a scene graph definition. 
The sample scene composer allows you to specify more than one production with the same symbol on the left side (e.g., “Planet -&gt; RedPlanet” and “Planet -&gt; Green Planet”). In this case, the scene instancer will randomly choose one of these productions when it instances a Planet symbol. Since it is randomly choosing a production to use here, we must remember the choice so that when we instance the next frame, we do not randomly choose another production. This is implemented by keeping the instance tree available for the next frame, at which time the instancer will traverse the old instance tree, instancing new symbols only when they have not already been instanced. Symbols that become irrelevant are destroyed.<br /><br /> Some example scenes are provided. The “Fern” example attempts to re-create the popular 2D fern fractal in 3D. It uses a green cylinder as the only model, representing the fern's “stem”. It then uses the fern symbol recursively to represent the fern's branches. The final scene resembles a plant with infinite detail which can be zoomed in to or out of. This example demonstrates how the infinite scene graph system can be used to create objects of infinite detail.<br /><br /> The “Odyssey” example features a galaxy of solar systems of planets of cities of buildings with men in them. The buildings in the city have random heights, accomplished by having the “building” production randomly choose between four building heights when instanced. The men in the scene recursively feature galaxies in their eyes, so the viewer can zoom in or out as much as they want. This example shows how one can implement an explorable Spore-like universe (although the instanced scene nodes maintain very little state separate from the scene definition).<br /><br /> The sample application is packaged as a Windows installer, which includes the above examples and others. 
The installer will place a README file in your start menu explaining how to use the SceneExplorer and the SceneComposer.<br /><br /> The source code is also available as a separate zip download. It is built using SCons and requires some Boost libraries. I have not tested it on any compiler besides MSVC 2005 and MSVC 2008.<br /><br /> <br /><strong class='bbc'>Conclusion</strong><br /> The full system described here combines a number of techniques which complement each other. Each of these techniques provides for interesting applications in its own right. These techniques can be applied together or by themselves to solve common problems that arise in video games, as well as to enable new creative game-play ideas.<br /><br /> With a scene graph that can contain loops, we allow objects to have an infinite amount of detail, where the amount of visible detail is determined by the screen-space area that the object occupies. Objects such as plants can be defined recursively, similar to how the famous fern fractal is defined. Applying this technique can give a plant that requires very little detail (and therefore graphics processing time) when far away, but is continuously refined as the viewer comes closer. It also allows an artist to describe an object in as much detail as they wish, letting the run-time engine determine how much of that detail to display at any time. A fern-like object constructed with the system is shown in figure 5.<br /><br />&nbsp;&nbsp;<span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/dynamic3Dsg/fig06.jpg' alt='Posted Image' class='bbc_img' /></span></span><br />Figure 5: A view of a fern object. 
A green cylinder is the only model used<br /><br />&nbsp;&nbsp;Using the system that allows for a change in frame of reference, we enable scene graphs where the very large objects, such as galaxies, can co-exist in the same data structure with the very small objects, such as leaves. This is a very powerful system, since a viewer can seamlessly move from a close up of the tip of a leaf, back to a wide-shot view of an entire galaxy. As long as the frame of reference is continually switched to a bigger (or smaller) one as the user zooms out (or in), one gigantic scene can describe an entire galaxy and everything inside it with intricate detail.<br /><br /> These techniques can also be applied to create previously unimaginable artistic continuous environments where the user can continuously zoom in to objects as far as they desire, cycling the user back to previous environments as needed. It opens the door to creative new ideas, and previously unimaginable scenes.<br /><br /> <br /><strong class='bbc'>References</strong><br /> The model of the man was created in Blender by following the tutorial from <a href='http://en.wikibooks.org/wiki/Blender_3D:_Noob_to_Pro/Modeling_a_Simple_Person' class='bbc_url' title='External link' rel='nofollow external'>this site</a><br /><br />]]></description>
		<pubDate>Wed, 10 Dec 2008 10:00:08 +0000</pubDate>
		<guid isPermaLink="false">21a38ed2ee0c2c081c31286d29db2acc</guid>
	</item>
	<item>
		<title>Computation of Bounding Primitives on the GPU</title>
		<link>http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/computation-of-bounding-primitives-on-the-gpu-r2582</link>
		<description><![CDATA[<br /><strong class='bbc'>Introduction</strong><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/gpuBoundingSphere/fig1.png' alt='Posted Image' class='bbc_img' /></span></span>Finding the bounds for a set of triangles is one of the most frequently-performed computations in games. It is fundamental to collision detection, scene culling, and ray tracing. Performing it on the GPU offloads work from the CPU, and can be an order of magnitude faster. The axis-aligned bounding box, or AABB, can be represented as 6 planes:<br /><br /><pre>
struct AABB {
    float Left;
    float Right;
    float Top;
    float Bottom;
    float Near;
    float Far;
};
</pre><br />Perhaps a more common alternative is simply storing two points for the corners:<br /><br /><pre>
struct AABB {
    Vector3 MinCorner;
    Vector3 MaxCorner;
};
</pre><br />CPU pseudocode to compute the bounding box over a set of points is trivial:<br /><br /><pre>
AABB aabb;
aabb.MinCorner = (MaxFloat, MaxFloat, MaxFloat);
aabb.MaxCorner = (-MaxFloat, -MaxFloat, -MaxFloat);
for each vertex in buffer {
    aabb.MinCorner.x = Min(vertex.x, aabb.MinCorner.x);
    aabb.MinCorner.y = Min(vertex.y, aabb.MinCorner.y);
    aabb.MinCorner.z = Min(vertex.z, aabb.MinCorner.z);
    aabb.MaxCorner.x = Max(vertex.x, aabb.MaxCorner.x);
    aabb.MaxCorner.y = Max(vertex.y, aabb.MaxCorner.y);
    aabb.MaxCorner.z = Max(vertex.z, aabb.MaxCorner.z);
}
</pre><br />If you wish to leverage D3DX, simply use the D3DXComputeBoundingBox function to perform the above operation. 
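<br /><br />For reference, the CPU loop above can be sketched in standard C++ roughly as follows. This is only a minimal illustration: the Vector3 and AABB structs and the ComputeBounds function name are assumptions made for this sketch, not D3DX types or part of the article's sample code.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical minimal types for illustration; not the D3DX equivalents.
struct Vector3 { float x, y, z; };

struct AABB {
    Vector3 MinCorner;
    Vector3 MaxCorner;
};

// Computes the axis-aligned bounding box of a set of points.
// For an empty set the box stays "inverted" (MinCorner > MaxCorner),
// which a caller can use to detect that no points were supplied.
AABB ComputeBounds(const std::vector<Vector3>& points) {
    const float big = std::numeric_limits<float>::max();
    AABB box{{big, big, big}, {-big, -big, -big}};
    for (const Vector3& p : points) {
        box.MinCorner.x = std::min(box.MinCorner.x, p.x);
        box.MinCorner.y = std::min(box.MinCorner.y, p.y);
        box.MinCorner.z = std::min(box.MinCorner.z, p.z);
        box.MaxCorner.x = std::max(box.MaxCorner.x, p.x);
        box.MaxCorner.y = std::max(box.MaxCorner.y, p.y);
        box.MaxCorner.z = std::max(box.MaxCorner.z, p.z);
    }
    return box;
}
```

Calling ComputeBounds on the vertex positions of a mesh yields the two corners directly. Each of the six component-wise min and max operations is independent of the others, which is the property the GPU approaches below rely on.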
<br /><strong class='bbc'>Naïve GPU Approach: Leverage the Depth Buffer</strong><br /> The aforementioned loop is embarrassingly parallelizable and therefore well-suited to the GPU. Using a general-purpose computation language like NVIDIA’s CUDA technology would make this easy, but it is often desirable to stay within the confines of the graphics API itself. How could we use the graphics API to perform a computation like this? The divide-and-conquer philosophy tells us to approach the problem by solving its sub-problems first. It can be helpful to think about only one dimension at a time. If we can simply compute the near plane, then we should be able to compute the other five planes via transformations.<br /><br /> Finding the “nearest” point is something fundamental to real-time graphics. Anyone familiar with OpenGL or D3D knows that where there is 3D rendering, a depth buffer lurks beneath:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/gpuBoundingSphere/fig2.png' alt='Posted Image' class='bbc_img' /></span></span><br /><br /> Note that we are interested in only <em class='bbc'>one</em> depth value, while the depth buffer represents <em class='bbc'>many</em> depth values. The trick is to render to a 1x1 viewport. 
It’s difficult to visualize rendering an entire model into a 1x1 area, so it may help to think of it as the end of a continuously shrinking viewport:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/gpuBoundingSphere/fig3.png' alt='Posted Image' class='bbc_img' /></span></span><br /><br />&nbsp;&nbsp;&nbsp;&nbsp;<br /><strong class='bbc'>Aside: Depth Buffer Visualization with Direct3D 10</strong><br /> To produce a grayscale visualization of the depth buffer similar to the pictures in this article, one approach using D3D10 is the following: <ul class='bbcol decimal'><li>Create a texture that has a color format equivalent to the depth buffer format. For example, if the depth buffer is DXGI_FORMAT_D32_FLOAT, then the texture should have a format of DXGI_FORMAT_R32_FLOAT. The texture should have the same width and height as the render target. </li><li>After rendering the scene as you normally would, un-bind the depth buffer by passing null into the last argument of ID3D10Device::OMSetRenderTargets(). </li><li>Use ID3D10Device::CopyResource() to copy the values from the real depth buffer into the grayscale texture from step 1. </li><li>Bind the grayscale texture to a pixel shader resource and draw it to the screen using a full-screen quad. </li><li>An example of this procedure can be found in the sample code attached to this article.</li></ul>&nbsp;&nbsp;&nbsp;&nbsp;Since the nearest point always trumps the points that render behind it, the depth buffer of our 1x1 render target should hold the nearest Z value. 
Computing values for the other five planes can be achieved by simply rotating the model appropriately:<br /><br /> <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/gpuBoundingSphere/fig4.png' alt='Posted Image' class='bbc_img' /></span></span><br /><br /> There are some caveats to using the depth buffer in this way:<br /><br /> <ul class='bbc'><li>Be sure to use a high-precision depth buffer. In the case of D3D10, use DXGI_FORMAT_D32_FLOAT when creating your depth texture. Remember that the viewport is only 1x1, so the performance implication of this is relatively unimportant. </li><li>Because we are rendering to a 1x1 viewport, each triangle takes up a screen area that is only a fraction of a pixel. This can lead to rendering only one interior point per triangle, which results in an incorrect bounding box. To bypass this issue, use point primitives rather than triangles. In the case of D3D10, set your primitive topology to D3D10_PRIMITIVE_TOPOLOGY_POINTLIST.</li></ul> It seems cumbersome to re-render the object in 6 different orientations. One idea is to use only three rotations, then set the graphics API to use reverse testing for each “Max” plane, as depicted here: <span rel='lightbox'><span rel='lightbox'><img class='bbc_img' src='http://images.gamedev.net/features/programming/gpuBoundingSphere/fig5.png' alt='Posted Image' class='bbc_img' /></span></span><br /><br /> To achieve this with D3D10, set your DepthFunc to GREATER_EQUAL rather than the default LESS. Also be sure to change your clear value appropriately. 
So, your effect file would change as follows:<br /><br /><pre>
DepthStencilState MyDepthState {
    DepthEnable = TRUE;
    DepthFunc = <del class='bbc'>LESS</del> GREATER_EQUAL;
    DepthWriteMask = ALL;
};
</pre><br />And your clear call would change like this:<br /><br /><pre>
device-&gt;ClearDepthStencilView(depthView, D3D10_CLEAR_DEPTH, <del class='bbc'>1</del> 0, 0);
</pre><br />Unfortunately, this approach is not much of an improvement as it still requires a total of 6 rendering passes. How can we reduce the number of passes down to 1 or 2? <br /><strong class='bbc'>Exploit the Blending Hardware</strong><br /> When rendering, we typically render to a color buffer, not just a depth buffer alone. Each pixel in our 1x1 render target has slots for red, green, and blue that we’ve been ignoring. The secret sauce lies in the blending hardware. Typically, blending is used to compute a simple weighted sum, as in:<br /><br />&nbsp;&nbsp;<em class='bbc'>αc</em><sub class='bbc'>src</sub> + (1-α)<em class='bbc'>c</em><sub class='bbc'>dest</sub><br /><br />However, graphics APIs typically allow more than just addition, as seen in the following table.<br /><br /><table>
<tr><td><strong class='bbc'>DirectX 9</strong></td><td><strong class='bbc'>DirectX 10</strong></td><td><strong class='bbc'>OpenGL</strong></td></tr>
<tr><td>D3DBLENDOP_ADD</td><td>D3D10_BLEND_OP_ADD</td><td>GL_FUNC_ADD</td></tr>
<tr><td>D3DBLENDOP_SUBTRACT</td><td>D3D10_BLEND_OP_SUBTRACT</td><td>GL_FUNC_SUBTRACT</td></tr>
<tr><td>D3DBLENDOP_REVSUBTRACT</td><td>D3D10_BLEND_OP_REV_SUBTRACT</td><td>GL_FUNC_REVERSE_SUBTRACT</td></tr>
<tr><td>D3DBLENDOP_MIN</td><td>D3D10_BLEND_OP_MIN</td><td>GL_MIN</td></tr>
<tr><td>D3DBLENDOP_MAX</td><td>D3D10_BLEND_OP_MAX</td><td>GL_MAX</td></tr>
</table><br />Those last two rows should catch your eye!<br /><br /> Before we can use these blending operations, we should make sure that we’re rendering into a floating-point render target. In D3D10, this can be done with DXGI_FORMAT_R32G32B32A32_FLOAT. 
Despite the fact that we’re exploiting the hardware’s blending stage, we actually will <em class='bbc'>not</em> be using the alpha channel. We specify alpha in our format only because it is typically required by the API when specifying a render target. So, by simply writing X Y Z into our R G B channels while using MIN blending for the first pass, and MAX blending for the second pass, we can compute a complete bounding box! The D3D10 HLSL for this technique follows.<br /><br /><pre>
struct VS_OUTPUT {
    float4 Position : SV_POSITION;
    float4 Pretransformed : POSITIONT;
};

VS_OUTPUT FindBoundsVS(float4 Position : POSITION) {
    VS_OUTPUT Output;
    Output.Position = float4(0, 0, 0, 1);
    Output.Pretransformed = Position;
    return Output;
}

float4 FindBoundsPS(VS_OUTPUT In) : SV_TARGET {
    return In.Pretransformed;
}
</pre><br />Since we’re rendering into a 1x1 viewport, the vertex shader sets the outgoing position to (0, 0, 0), then writes the pretransformed position out to the pixel shader, which simply writes it out into the floating-point render target. To read the values of our bounding box out from the render target, we’ll need to copy the values into a “staging area” that can be accessed from the CPU. Because this copy is slow, it is best to actually render to a 2x1 texture and shift the 1x1 viewport between passes, allowing us to copy only once instead of twice. Summarizing the technique for D3D10: <ul class='bbcol decimal'><li>Create two floating-point 2x1 textures with the following parameters: <ul class='bbc'><li>BindFlags = RENDER_TARGET, CPUAccessFlags = 0, Usage = DEFAULT </li><li>BindFlags = 0, CPUAccessFlags = READ, Usage = STAGING</li></ul> </li><li>Bind the output merger stage to the 2x1 render target texture. </li><li>Set the viewport to 1x1 with TopLeft at (0, 0). 
</li><li>Set the blending state to use BlendOp = MIN, SrcBlend = SRC_COLOR, DestBlend = DEST_COLOR. </li><li>Render the geometry using the pass-through shaders described above. </li><li>Move the viewport’s TopLeft to (1,0) </li><li>Change the BlendOp to MAX. </li><li>Render the geometry again. </li><li>Unbind the output merger stage from the 2x1 render target. </li><li>Call ID3D10Device::CopyResource() to copy data from the render target texture to the staging texture. </li><li>Call ID3D10Texture2D::Map() on the staging texture to read the values of the two corners from the color buffer.</li></ul><strong class='bbc'>Conclusion</strong><br /> If you’ve got your API state set up correctly, computing a bounding-box can be as simple as re-rendering the object into a 1x1 viewport and reading back the color values. The GPU represents an enormous amount of computing power that is now typical in every home PC. By thinking outside the box (no pun intended), it’s easy to find new and interesting ways to offload work from the CPU.]]></description>
		<pubDate>Wed, 12 Nov 2008 09:00:39 +0000</pubDate>
		<guid isPermaLink="false">b9531e7d2a8f38fe8dcc73f58cae9530</guid>
	</item>
</channel>
</rss>