Baking Occlusion Maps


I'm working on baking ambient occlusion for my static objects. I have them unwrapped to a light-map texture with no shared or overlapping UVs.

For each of those pixels I compute occlusion only: no shadows, no light color, no emission. Just occlusion. This includes self-occlusion as well as occlusion from any other objects in the scene.

I'm wondering what people's techniques are and how fast they are. My current technique can generate a 512x512 occlusion map in 2.9 minutes, and I'm still working to speed this up; I think I can potentially get it down to about 1 minute. 512 is a relatively high resolution for an object's occlusion map, I would think, since it will just get blurred anyway.

The results are fine in terms of texture quality; I just want to know how fast other algorithms are, because I have a ton of objects to bake and want to know whether I should switch to a CPU ray caster. Does anyone have timings they've measured for baking a 512-sized texture?

Current algorithm on GPU:

Render the object's world positions and world normals into the ambient occlusion texture.
Read the normals and positions back to the CPU.
For each pixel in the occlusion texture that has a normal (i.e., a triangle is mapped to it and wrote a normal):
--->Set a camera in the world at the position from the texture.
--->Set the camera orientation based off the normal.
--->Cull the scene against the camera and render it. This is the depth buffer from the pixel's point of view on the object.
--->Take all the pixels in the depth buffer and compare them to generate an occlusion value for this pixel.

I render the depth buffer to a 4x4 texture (so 16 depth samples per pixel). I could do 8x8, but it really doesn't matter after the blur is applied.
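
For illustration, here's a rough sketch of that loop in C++. Every type and call in it (GBuffer, MakeCamera, RenderSceneDepth, OcclusionFromDepth, and so on) is a hypothetical placeholder for engine-specific code, not a real API:

void BakeOcclusion(Scene& scene, Object& obj, Image& aoMap) // e.g. 512x512
{
    // Steps 1-2: positions/normals rendered in lightmap UV space, read back to the CPU.
    GBuffer gb = RenderPositionNormalGBuffer(obj, aoMap.width, aoMap.height);

    for (int y = 0; y < aoMap.height; ++y)
    for (int x = 0; x < aoMap.width; ++x)
    {
        if (!gb.HasNormal(x, y)) // no triangle mapped to this texel
            continue;

        // Camera at the texel's surface position, looking along its normal.
        Camera cam = MakeCamera(gb.Position(x, y), gb.Normal(x, y));

        // Tiny depth target: 4x4 = 16 depth samples for this texel.
        Depth4x4 depth = RenderSceneDepth(scene, cam);

        // Collapse the 16 depths into one occlusion value, e.g. a distance
        // falloff per sample, then an average.
        float occ = 0.0f;
        for (float d : depth.samples)
            occ += OcclusionFromDepth(d);
        aoMap.Set(x, y, 1.0f - occ / 16.0f);
    }
}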



I have a project I've been working on at home that uses Intel's embree ray-tracing library to pre-bake various kinds of GI. I fired it up with Crytek's version of Sponza (~262K triangles) and baked the equivalent of a 730x730 AO map using 625 rays per sample point, and it completed in about 18 seconds using all 4 cores of my Core i7 2600. So yeah, I think you can do better than 3 minutes. :P

The way you're baking AO definitely works and is fairly easy to get going, but it can be really slow, since there's a lot of overhead in setting up the GPU and reading back the results for every sample point that you bake. The GPU also never really gets fully utilized, since it has to keep serializing. A ray-tracer, on the other hand, can be really fast for baking occlusion, since it's trivial to parallelize. Embree is very, very fast (especially if you only want to know whether a ray is occluded instead of finding the closest intersection point), and GPU ray-tracers can be extremely fast as well. At work we use OptiX to bake GI, and we can bake multiple passes for huge scenes in a minute or two on a beefy GPU.

I should also point out that it sounds like you're not quite baking AO correctly, based on your description. To get a proper AO bake for a sample point, you need to sample occlusion in all directions in the hemisphere surrounding the normal. With your approach you'll only get directions within the frustum, which only covers about 90 degrees, so you'll miss the sides. Typically when using a rasterization approach you'll render to a "hemicube" by rendering in 5 directions for each sample point. You'll also want to make sure that you weight each occlusion sample by the cosine of the angle between the sample direction and the surface normal (N dot R), otherwise it won't look right. With a ray-tracer using Monte Carlo integration you can actually "bake" the cosine term into your distribution of sample ray directions as a form of importance sampling, which lets you skip the dot product and also gives higher-quality results.
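
For reference, here's a minimal sketch of the cosine-weighted sampling MJP describes, with the cosine folded into the sample distribution. The Vec3 type and the occluded callback are placeholders, not taken from any particular library:

#include <cmath>
#include <random>

struct Vec3 { float x, y, z; };
static Vec3 operator*(const Vec3& v, float s) { return { v.x * s, v.y * s, v.z * s }; }
static Vec3 operator+(const Vec3& a, const Vec3& b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }

// Any orthonormal tangent frame around the (normalized) normal will do.
static void BuildBasis(const Vec3& n, Vec3& t, Vec3& b)
{
    t = std::fabs(n.z) < 0.999f ? Vec3{ -n.y, n.x, 0.0f } : Vec3{ 1.0f, 0.0f, 0.0f };
    const float len = std::sqrt(t.x * t.x + t.y * t.y + t.z * t.z);
    t = t * (1.0f / len);
    b = { n.y * t.z - n.z * t.y, n.z * t.x - n.x * t.z, n.x * t.y - n.y * t.x };
}

// Cosine-weighted direction in tangent space (z = up). Because the pdf is
// cos(theta)/pi, the cosine cancels out of the Monte Carlo estimator and
// the AO value is simply the fraction of unoccluded rays.
static Vec3 SampleCosineHemisphere(float u1, float u2)
{
    const float r = std::sqrt(u1);
    const float phi = 6.2831853f * u2;
    return { r * std::cos(phi), r * std::sin(phi), std::sqrt(1.0f - u1) };
}

template <typename OccludedFn>
float ComputeAO(const Vec3& pos, const Vec3& n, int numRays, OccludedFn occluded)
{
    std::mt19937 rng(1234);
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);
    Vec3 t, b;
    BuildBasis(n, t, b);

    int visible = 0;
    for (int i = 0; i < numRays; ++i)
    {
        const Vec3 s = SampleCosineHemisphere(uni(rng), uni(rng));
        const Vec3 dir = t * s.x + b * s.y + n * s.z; // tangent -> world
        if (!occluded(pos, dir)) // occluded() is whatever visibility query you use
            ++visible;
    }
    return float(visible) / float(numRays); // 1 = fully open, 0 = fully occluded
}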

I can change the FOV to get most of the hemisphere (it's a bit warped, of course), but it works. I also don't read anything back on the CPU: I bind the depth texture and render a 1x1-pixel point to the final occlusion texture, using a shader to compute the occlusion. But as I said, even doing that is slow.

I wanted to test this to see if it would be faster than a CPU approach, since it sounded like it might be, until I got 3 minutes.


Is there an article on how to use Embree for baking occlusion? I built the code, but I'm not sure how to extract what I need into my own engine to bake occlusion.


Not that I know of. If you download the embree source code you'll see there are four main projects in there: common, renderer, rtcore, and viewer. You're really only interested in rtcore, which is the core kernel used for casting rays and getting intersections. It depends on common, however, so you need that as well. Basically what I did was build the rtcore and common libs for debug and release, copy them to a Lib folder that I made, and then copy all of the headers from common and rtcore into an Include folder that I also made. Then I just linked to common.lib and rtcore.lib in my own project.

In your code, you'll want to create an acceleration structure from your scene meshes that you can cast rays into. To do this you call rtcCreateAccel and pass it arrays of embree::BuildTriangle and embree::BuildVertex, containing the triangles and vertices of your scene. There are a lot of different options for creating the acceleration structure; I currently use BVH4 since it builds quickly. There are examples in the docs, but this is what I use for the parameters:

rtcCreateAccel("bvh4.spatialsplit", "default", bvhData.Triangles.data(), totalNumTriangles, vertices.data(), totalNumVertices, empty, false);
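
For context, here's roughly how you might fill those arrays. The BuildVertex(x, y, z) and BuildTriangle(i0, i1, i2) constructor signatures are my assumption from the embree headers (double-check the ones you copied), and Mesh/Vec3/TransformPoint are placeholder scene types:

std::vector<embree::BuildVertex>   vertices;
std::vector<embree::BuildTriangle> triangles;

for (const Mesh& mesh : meshes)
{
    const int baseVertex = (int)vertices.size();

    // Embree builds one flat acceleration structure, so bake each mesh's
    // transform into its vertices (i.e. world space) before adding them.
    for (const Vec3& p : mesh.positions)
    {
        const Vec3 wp = TransformPoint(mesh.worldMatrix, p);
        vertices.push_back(embree::BuildVertex(wp.x, wp.y, wp.z));
    }
    for (const Tri& t : mesh.triangles)
        triangles.push_back(embree::BuildTriangle(baseVertex + t.i0,
                                                  baseVertex + t.i1,
                                                  baseVertex + t.i2));
}

// Same trailing arguments as the call above; "empty" comes from the embree headers.
auto bvh = rtcCreateAccel("bvh4.spatialsplit", "default",
                          triangles.data(), triangles.size(),
                          vertices.data(), vertices.size(), empty, false);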


Once you have your acceleration structure, you can query it for an intersector like this:

bvhData.Intersector = bvhData.BVH->queryInterface<Intersector>();


And that's it, you're ready to cast rays. For occlusion you can just call intersector->occluded and pass it an embree::Ray to get a bool telling you if the ray is occluded. To get the full intersection you call intersector->intersect.
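
As a concrete sketch, an occlusion query wrapped in a helper might look like the following. The Ray constructor taking (origin, direction, tnear, tfar) and the Vec3f type are my assumptions from embree's common math headers:

bool IsOccluded(embree::Intersector* intersector,
                const embree::Vec3f& origin, const embree::Vec3f& dir)
{
    // A small tnear pushes the ray off the surface it starts on, so it
    // doesn't immediately self-intersect.
    embree::Ray ray(origin, dir, 0.001f, 1e30f);

    // occluded() only needs to find *any* hit, so it's cheaper than
    // intersect(), which must find the closest one.
    return intersector->occluded(ray);
}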
I bake AO "inside out" (sky visibility) by positioning hundreds of directional lights outside the object, pointing in. Each light renders the object to a 512x512 depth buffer, and uses that to apply a small amount of light to a 1024x1024 AO texture (by rendering the object in lightmap-UV space). This produces blurry results and doesn't work for indoor areas/cavities, but it only takes a few seconds on a GeForce 7, so I'm able to do it for a whole level during the loading screen (where the GPU was idle anyway).
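
In outline form (hypothetical helper calls standing in for the actual GPU passes Hodgman describes), that technique looks something like:

void BakeSkyVisibility(Object& obj, Texture& aoMap)
{
    const int numLights = 400; // "hundreds" of directions around the object
    ClearToBlack(aoMap);

    for (const Vec3& dir : DirectionsAroundObject(numLights))
    {
        // Depth-only pass: render the object from a directional light
        // looking along dir into a small shadow map.
        DepthMap shadow = RenderDepthFromDirection(obj, dir);

        // Accumulation pass: rasterize the object using its lightmap UVs as
        // output positions, additively blending 1/numLights into every texel
        // whose shadow-map test says it can see this light.
        AccumulateLightmapVisibility(obj, aoMap, shadow, dir, 1.0f / numLights);
    }
}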

MJP - It sounds kind of lame if I can't instance geometry and give it a matrix. I.e., if I have instanced geometry I have to transform all the verts into world space for each instanced mesh and load those into Embree. Even for non-instanced geometry, when I export it, it still has a transform applied so that it's positioned in the world properly.

Hodgman - Yeah, I can't do it that way because of the cavities/indoor areas. Plus it doesn't really work well in outdoor forests either: if you have a few boxes close together, but they're under a ton of tall trees, they'll get no direct lighting at all, because leaves will cover every one of those depth buffers. So I have to start at the object and trace outward, just like SSAO.

BTW, I did get it down to 1.8 minutes just by getting rid of my "preview", which showed the output on screen for each camera so I could watch it working in real time.


Another idea -- at the moment you're doing 'for each position, trace 16 rays'; you could instead use the algorithm from this ray-tracer, which is 'for each ray direction, trace rays through a million positions'. That maps better to GPU rasterization hardware and might result in fewer overall passes over the scene.


My very cheap hack to deal with the canopy problem is to mostly place lights in the upper hemisphere, but also place some in the lower hemisphere, and to not render the ground into the depth buffer, so it doesn't cast shadows upwards onto objects. This allows objects that are covered by a "canopy" to still receive some level of "AO" gradient (a sketch of this direction placement follows below).

Stealing inspiration from the above link, you could also solve this issue completely by rendering the shadow maps with depth peeling, which lets you measure the length of the ray from the surface position to the nearest occluder, rather than to the "most outside" occluder.
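
For what it's worth, the biased light placement described above could be sketched like this (entirely hypothetical; the 15% lower-hemisphere fraction and the -0.3 depth are made-up tuning values):

#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

struct Vec3 { float x, y, z; }; // placeholder type, as in the earlier sketches

// Uniform z with uniform phi is uniform over the sphere's surface
// (Archimedes' hat-box theorem), so clamping z's range is all the
// biasing needed.
std::vector<Vec3> BiasedLightDirections(int count, float lowerFraction = 0.15f)
{
    std::vector<Vec3> dirs;
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);

    for (int i = 0; i < count; ++i)
    {
        const bool below = uni(rng) < lowerFraction;
        const float z = below ? -0.3f * uni(rng) // shallow, below the horizon
                              :  uni(rng);       // anywhere above it
        const float phi = 6.2831853f * uni(rng);
        const float r = std::sqrt(std::max(0.0f, 1.0f - z * z));
        dirs.push_back({ r * std::cos(phi), r * std::sin(phi), z });
    }
    return dirs;
}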

