| Computation of Bounding Primitives on the GPU | |
|
Computation of Bounding Primitives on the GPU
Introduction Finding the bounds for a set of triangles is one of the most frequently-performed computations in games. It is fundamental to collision detection, scene culling, and ray tracing. Performing it on the GPU offloads work from the CPU, and can be an order of magnitude faster.
The axis-aligned bounding box, or AABB, can be represented as 6 planes:
struct AABB
{
float Left;
float Right;
float Top;
float Bottom;
float Near;
float Far;
};
Perhaps a more common alternative is simply storing two points for the corners:
struct AABB
{
Vector3 MinCorner;
Vector3 MaxCorner;
};
CPU pseudocode to compute the bounding box over a set of points is trivial:
AABB aabb;
aabb.MinCorner = (MaxFloat, MaxFloat, MaxFloat);
aabb.MaxCorner = (-MaxFloat, -MaxFloat, -MaxFloat);
for each vertex in buffer
{
aabb.MinCorner.x = Min(vertex.x, aabb.MinCorner.x);
aabb.MinCorner.y = Min(vertex.y, aabb.MinCorner.y);
aabb.MinCorner.z = Min(vertex.z, aabb.MinCorner.z);
aabb.MaxCorner.x = Max(vertex.x, aabb.MaxCorner.x);
aabb.MaxCorner.y = Max(vertex.y, aabb.MaxCorner.y);
aabb.MaxCorner.z = Max(vertex.z, aabb.MaxCorner.z);
}
If you wish to leverage D3DX, simply use the D3DXComputeBoundingBox function to perform the above operation.
Naïve GPU Approach: Leverage the Depth BufferThe aforementioned loop is embarrassingly parallelizable and therefore well-suited to the GPU. Using a general-purpose computation language like NVIDIA’s CUDA technology would make this easy, but it is often desirable to stay within the confines of the graphics API itself. How could we use the graphics API to perform a computation like this?The divide-and-conquer philosophy tells us to approach the problem by solving its sub-problems first. It can be helpful to think about only one dimension at a time. If we can simply compute the near plane, then we should be able to compute the other five planes via transformations. Finding the “nearest” point is something fundamental to real-time graphics. Anyone familiar with OpenGL or D3D knows that where there is 3D rendering, a depth buffer lurks beneath:
Note that we are interested in only one depth value, while the depth buffer represents many depth values. The trick is to render to a 1x1 viewport. It’s difficult to visualize rendering an entire model into a 1x1 area, so it may help to think of it as the end of a continuously shrinking viewport:
Since the nearest point always trumps the points that render behind it, the depth buffer of our 1x1 render target should hold the nearest Z value. Computing values for the other five planes can be achieved by simply rotating the model appropriately:
There are some caveats to using the depth buffer in this way:
To achieve this with D3D10, set your DepthFunc to
DepthStencilState MyDepthState
{
DepthEnable = TRUE;
DepthFunc =
And your clear call would change like this:
device->ClearDepthStencilView(depthView, D3D10_CLEAR_DEPTH,Unfortunately, this approach is not much of an improvement as it still requires a total of 6 rendering passes. How can we reduce the number of passes down to 1 or 2? Exploit the Blending HardwareWhen rendering, we typically render to a color buffer, not just a depth buffer alone. Each pixel in our 1x1 render target has slots for red, green, and blue that we’ve been ignoring. The secret sauce lies in the blending hardware. Typically, blending is used to compute a simple weighted sum, as in:αcsrc+(1-α) cdest
Those last two rows should catch your eye! Before we can use these blending operations, we should make sure that we’re rendering into a floating-point render target. In D3D10, this can be done with
struct VS_OUTPUT
{
float4 Position : SV_POSITION;
float4 Pretransformed : POSITIONT;
};
VS_OUTPUT FindBoundsVS(float4 Position : POSITION)
{
VS_OUTPUT Output;
Output.Position = float4(0, 0, 0, 1);
Output.Pretransformed = Position;
return Output;
}
float4 FindBoundsPS(VS_OUTPUT In) : SV_TARGET
{
return In.Pretransformed;
}
Since we’re rendering into a 1x1 viewport, the vertex shader sets the outgoing position to (0, 0, 0), then writes the pretransformed position out to the pixel shader, which simply writes it out into the floating-point render target.
To read the values of our bounding box out from the render target, we’ll need to copy the values into a “staging area” that can be accessed from the CPU. Because of this copy, which is slow, it is best to actually render to a 2x1 texture, and shift the 1x1 viewport between passes, allowing us to copy only once instead of twice. Summarizing the technique for D3D10:
ConclusionIf you’ve got your API state set up correctly, computing a bounding-box can be as simple as re-rendering the object into a 1x1 viewport and reading back the color values. The GPU represents an enormous amount of computing power that is now typical in every home PC. By thinking outside the box (no pun intended), it’s easy to find new and interesting ways to offload work from the CPU.Discuss this article in the forums
See Also: © 1999-2009 Gamedev.net. All rights reserved. Terms of Use Privacy Policy
|