Madoc

Fast ray casting for AO


I want to optimise ray casting for ambient occlusion purposes. For a high-quality render we cast nearly 4k rays per texel of the destination texture. This gets a bit slow, and even with fewer rays for "preview" purposes it's quite slow. We are currently using a well-optimised octree to accelerate the ray casts. To be honest, I wouldn't be expecting or looking for *much* better performance if it weren't that NVidia's Melody seems to handle large numbers of rays for ambient occlusion considerably faster. Does anyone know what they might be doing? I have considered a number of potential optimisations, but I don't think any would be very general-purpose or effective. I would still stick to ray casting rather than something more approximate, but a limited sacrifice in quality or accuracy is fine for a preview mode.
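For reference, the per-texel estimate being described can be sketched roughly as below. This is a minimal Python sketch, not the poster's code: it assumes cosine-weighted hemisphere sampling (the post does not say how rays are distributed), and `occluded(point, direction)` is a hypothetical callback standing in for the octree query. The cost is linear in the ray count, which is why ~4k rays per texel adds up quickly.

```python
import math
import random

def cosine_sample_hemisphere(normal, rng):
    """Draw one cosine-weighted unit direction on the hemisphere around `normal` (unit length)."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    # Local direction with +z along the normal.
    x, y, z = r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1)
    nx, ny, nz = normal
    # Pick a helper axis that is not parallel to the normal.
    ax, ay, az = (0.0, 1.0, 0.0) if abs(nx) > 0.9 else (1.0, 0.0, 0.0)
    # t = normalize(cross(a, n))
    tx, ty, tz = ay * nz - az * ny, az * nx - ax * nz, ax * ny - ay * nx
    inv = 1.0 / math.sqrt(tx * tx + ty * ty + tz * tz)
    tx, ty, tz = tx * inv, ty * inv, tz * inv
    # b = cross(n, t); (t, b, n) is an orthonormal basis.
    bx, by, bz = ny * tz - nz * ty, nz * tx - nx * tz, nx * ty - ny * tx
    return (x * tx + y * bx + z * nx,
            x * ty + y * by + z * ny,
            x * tz + y * bz + z * nz)

def hemisphere_visibility(point, normal, occluded, num_rays=1024, seed=0):
    """Fraction of hemisphere rays from `point` that escape (1.0 = fully unoccluded).

    `occluded(point, direction)` is a hypothetical scene query (octree, kd-tree, ...).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_rays):
        if occluded(point, cosine_sample_hemisphere(normal, rng)):
            hits += 1
    return 1.0 - hits / num_rays
```

Swapping the octree for a kd-tree only changes what `occluded` does; the per-texel loop stays the same.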

Using a kd-tree is faster than an octree for ray casting.
If written correctly, a ray costs O(log2(N)) ray-box intersections plus, ideally, about one ray-triangle intersection. There are a lot of things that can be optimised: make sure your ray-box intersections are optimised to the max, as well as your sorting (mergesort works best), and when building the kd-tree choose the best split axis on each level; do not just alternate x/y/z. For the ray-box intersections I also precalculate all the divisions, so that when you finally get down to it, it's only a few jump instructions per ray cast.
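"Precalculating the divisions" is usually done by storing the reciprocal of the ray direction once per ray, so every ray-box (slab) test afterwards needs only subtractions, multiplies and compares. A minimal sketch of the standard slab test under that assumption; `Ray` and `ray_box` are illustrative names, and a real traversal would be written in C/C++ or SIMD rather than Python:

```python
import math

class Ray:
    def __init__(self, origin, direction):
        self.origin = origin
        self.direction = direction
        # Precompute 1/d once per ray; a zero component becomes +/-inf so the
        # slab test below still works for axis-parallel rays.
        self.inv_dir = tuple(
            math.copysign(math.inf, d) if d == 0.0 else 1.0 / d for d in direction
        )

def ray_box(ray, box_min, box_max, t_max=math.inf):
    """Slab test: return (t_enter, t_exit) if the ray hits the box within [0, t_max], else None."""
    t0, t1 = 0.0, t_max
    for axis in range(3):
        o, inv = ray.origin[axis], ray.inv_dir[axis]
        # Distances to the two slab planes on this axis (no division here).
        ta = (box_min[axis] - o) * inv
        tb = (box_max[axis] - o) * inv
        if ta > tb:
            ta, tb = tb, ta
        t0 = max(t0, ta)  # latest entry so far
        t1 = min(t1, tb)  # earliest exit so far
        if t0 > t1:
            return None   # slabs do not overlap: miss
    return t0, t1
```

The same `Ray` object, with its reciprocal already computed, is reused for every node the traversal visits.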

I don't know if this will be faster, but there's a paper on the web that shows occlusion being computed using a render-to-texture operation.

Instead of ray casting from each texel, it renders the scene from the point of view of the texel, with the view direction along the normal. The buffer is cleared to white and the scene is rendered in black. The average of the buffer is taken as the texel's occlusion value.

I've been messing about with the above for a while now, and even fairly unoptimised it runs at around 150-200 texels per second.

Hope that helps.

You also don't need to shoot the rays from every texel; you can do it at specific points and then interpolate in between. 4000 rays per sample also sounds exorbitant... I'm making good-quality AO maps with more like 16-30 rays per sample.
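A minimal sketch of the "sample at specific points and interpolate" idea, assuming AO has already been computed on a regular coarse grid every `step` texels (the grid layout and the `upsample_bilinear` name are assumptions for illustration):

```python
def upsample_bilinear(coarse, step):
    """Expand coarse AO samples, taken every `step` texels, to full resolution.

    coarse[j][i] holds the value at texel (i * step, j * step). The output has
    (cols - 1) * step + 1 columns and (rows - 1) * step + 1 rows.
    """
    rows, cols = len(coarse), len(coarse[0])
    height, width = (rows - 1) * step + 1, (cols - 1) * step + 1
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        j, fy = divmod(y, step)
        j1 = min(j + 1, rows - 1)  # clamp at the last sample row
        ty = fy / step
        for x in range(width):
            i, fx = divmod(x, step)
            i1 = min(i + 1, cols - 1)  # clamp at the last sample column
            tx = fx / step
            # Interpolate along x on the two neighbouring sample rows, then along y.
            top = coarse[j][i] * (1 - tx) + coarse[j][i1] * tx
            bottom = coarse[j1][i] * (1 - tx) + coarse[j1][i1] * tx
            out[y][x] = top * (1 - ty) + bottom * ty
    return out
```

Each coarse sample still needs the full ray count, but the number of sampled texels drops by roughly `step * step`. The trade-off is that occlusion detail finer than the sample spacing is smoothed away.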


Thanks for the replies.

KD trees are something I have been considering, but I haven't looked at them in detail; I just know the basic concept. If they really are that much more efficient, I'll definitely give them a shot.

The intersection tests use precomputed data and are heavily optimised.

There are two reasons for the high number of rays: multisampling is used, and the surfaces are extremely complex, with fine details that matter. We don't get visually acceptable results with fewer than about 1k rays per texel. We use 4k for production, but that's fine, as we can just leave a machine grinding away at it once the model is ready.
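On why fine detail demands so many rays: if each ray is treated as an independent hit/miss trial, the noise in a texel's visibility estimate falls only as 1/sqrt(N). A quick sketch under that simplifying assumption (the multisampling over sub-texel positions mentioned above is ignored here):

```python
import math

def ao_std_error(visibility, num_rays):
    """Standard error of a Monte Carlo visibility estimate from `num_rays` independent rays.

    Each ray is a Bernoulli trial (blocked / unblocked), so the estimate's
    standard deviation is sqrt(p * (1 - p) / N).
    """
    return math.sqrt(visibility * (1.0 - visibility) / num_rays)

for n in (16, 256, 1024, 4096):
    print(n, ao_std_error(0.5, n))
```

At a visibility of 0.5 this gives a standard error of 0.125 with 16 rays and 0.0078125 with 4096 rays; halving the noise always costs four times the rays.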

I'm also no longer so sure about Melody being all that quick. I just tried it again with one of our models and 200 rays, and it took well over an hour. I'm sure I've seen it go much faster; it's hard to guess what affects its performance, though (it's also a bit of a pain to use...).

_Lopez, I have heard of such methods, and I suppose rendering depth into half a cube map might work pretty well, but I also see a lot of problems and added complexity that I'm not prepared to deal with.

You can actually use lots of shadow maps to generate AO data effectively, as detailed in GPU Gems 2. I've certainly used that method to good effect, and that may be what NVIDIA's tool is doing.

That was basically what I was getting at in my reply to _Lopez. But for what we do, we'd need to render 75 images per texel and without some expensive additional work the results would be incorrectly biased. You also have the problem of the near clipping plane (which you can eliminate but not without causing more problems). This has to be precision work.

Edit:

I estimate the GPU rendering path as above would take about a week as opposed to a couple of hours. Also, NVidia's Melody allows you to choose a *specific number* of "rays", which suggests rays are indeed what they are using.

kd trees look nice. I haven't found any decent literature, but playing with the idea I can see some really neat tricks to speed things up. I'll definitely try it tomorrow.

[Edited by - Madoc on November 1, 2007 7:53:42 PM]

Quote:
Original post by Madoc
But for what we do, we'd need to render 75 images per texel and without some expensive additional work the results would be incorrectly biased. You also have the problem of the near clipping plane (which you can eliminate but not without causing more problems). This has to be precision work.

No, a single shadow map (from a single directional light) can be used to accumulate light onto *all* of your vertices/texels, assuming "good enough" resolution (which isn't a problem for this type of task... you can easily use super-high-resolution shadow maps if necessary).

Quote:
Original post by Madoc
I estimate the GPU rendering path as above would take about a week as opposed to a couple of hours.

Wow, that's pretty intense ;) I compute AO for an almost-certainly-smaller model (couple million vertices) using 256 "rays" (shadow maps), each 2048^2 and it takes about 3 seconds on an 8800GTX. Granted you probably want "more precise results", but I don't see how it could be several orders of magnitude slower!

Quote:
Original post by Madoc
Also, NVidia's Melody allows you to choose a *specific number* of "rays", which suggests rays are indeed what they are using.

As I explained above, you get one "ray" per texel for each shadow map, so quoting a number of "rays" doesn't actually imply ray tracing. However, several-hour compute times do :)

Honestly I'd read over the GPU Gems 2 chapter. Generating AO data using the GPU/rasterization is quite straightforward, efficient and accurate.


What we're doing is quite different. Strictly speaking, it's not even AO and certainly not the kind of AO you see so much of these days.

We need to sample a complete hemisphere and the only way (I can think of) to do that is with half a cube map as I mentioned above. Also we need very high levels of multisampling. The occlusion is calculated (with MS) for every pixel of very large maps and models are several million polygons. If you stick to the requirements I gave, some of our maps would take over 300 billion renders of several million polys.

It's the specific number of "rays" in Melody that leads me to believe it uses actual rays; you can't get an arbitrary number of well-distributed pixels out of an image. But of course they could just be lying about the number...

Hope that clarifies things a little.

Maybe I'm not being clear...

Quote:
Original post by Madoc
We need to sample a complete hemisphere and the only way (I can think of) to do that is with half a cube map as I mentioned above.

No, the shadow maps method does exactly that. Indeed you can take as many samples as you want from the whole hemisphere/sphere just by rendering more shadow maps and accumulating the visibility results. It's the same thing as you're doing with shooting rays *from* the surface, except in the opposite direction. Indeed it's exactly the same as how shadow mapping vs. shadow rays works normally. Seriously, check out the GPU Gems 2 chapter as it makes this all a lot more clear with diagrams and the like.
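On getting an arbitrary number of well-distributed directions: one common technique (not taken from the GPU Gems 2 chapter or from Melody) is a spherical Fibonacci lattice, which gives roughly uniform directions for any N. Under this sketch, each direction would drive one orthographic shadow-map pass:

```python
import math

def fibonacci_sphere(n):
    """Return n roughly uniformly distributed unit directions (spherical Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for k in range(n):
        # Evenly spaced heights in (-1, 1) give equal-area bands on the sphere.
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))  # radius of the circle at height z
        phi = k * golden_angle                # rotate by the golden angle each step
        dirs.append((r * math.cos(phi), r * math.sin(phi), z))
    return dirs
```

For the hemisphere around +z, keep only the directions with z > 0; for even `n`, exactly `n / 2` lattice points qualify.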

Quote:
Original post by Madoc
Also we need very high levels of multisampling. The occlusion is calculated (with MS) for every pixel of very large maps and models are several million polygons.

So render tons of shadow maps with very high resolutions... it will still be fast.

Quote:
Original post by Madoc
If you stick to the requirements I gave, some of our maps would take over 300 billion renders of several million polys.

I think you're missing something here: a single shadow map effectively computes one ray for *every* texel in your scene. Thus if you're shooting 75 rays per texel, you only need 75 shadow map passes, each of which may use a very large shadow map (or split it into tiles if necessary). You seem to think that you need one shadow map per texel per ray, which is totally untrue and indeed just a really inefficient way of having the rasterizer compute a single ray intersection. The key point is that the shadow rays are *coherent*, and thus the rasterizer can compute many of them in parallel.
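To make the coherence argument concrete, here is a toy CPU sketch of the accumulation loop: one depth map per direction, and every receiver is tested against it in that same pass, so the number of passes equals the number of directions rather than texels × directions. The assumptions are loud ones: occluders are represented as dense point samples splatted into a grid (a stand-in for GPU rasterization of triangles), each direction points towards a directional light, and `cell` and `bias` are hypothetical tuning parameters. Only directions in the upper hemisphere of a receiver's normal count towards its result.

```python
import math

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _basis(d):
    """Two unit vectors perpendicular to the unit direction d (the shadow map's image plane)."""
    a = (0.0, 1.0, 0.0) if abs(d[0]) > 0.9 else (1.0, 0.0, 0.0)
    u = _cross(a, d)
    inv = 1.0 / math.sqrt(_dot(u, u))
    u = (u[0] * inv, u[1] * inv, u[2] * inv)
    return u, _cross(d, u)

def accumulate_visibility(receivers, occluders, directions, cell=0.05, bias=0.02):
    """Per-receiver visibility from one orthographic 'shadow map' pass per direction.

    receivers: list of (position, unit normal); occluders: list of positions.
    """
    lit = [0] * len(receivers)
    total = [0] * len(receivers)
    for d in directions:
        u, v = _basis(d)
        # "Rasterize": keep the occluder depth nearest the light in each cell.
        depth_map = {}
        for p in occluders:
            key = (math.floor(_dot(p, u) / cell), math.floor(_dot(p, v) / cell))
            depth = _dot(p, d)  # larger = closer to the light
            if depth > depth_map.get(key, -math.inf):
                depth_map[key] = depth
        # Shadow test: one "ray" for every receiver, all from this single pass.
        for idx, (p, n) in enumerate(receivers):
            if _dot(n, d) <= 0.0:
                continue  # direction lies below this receiver's hemisphere
            total[idx] += 1
            key = (math.floor(_dot(p, u) / cell), math.floor(_dot(p, v) / cell))
            if _dot(p, d) + bias >= depth_map.get(key, -math.inf):
                lit[idx] += 1
    return [l / t if t else 0.0 for l, t in zip(lit, total)]
```

On a GPU the inner loops become a depth render and a shadow-map lookup per texel, which is where the coherence pays off.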
