
Yann's ABT/HOM, still using software queries or hardware now?

This topic is 3310 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Hi,

In this thread http://www.gamedev.net/community/forums/topic.asp?topic_id=448288&whichpage=1 Yann talked about how he designed the ABT tree to be used in conjunction with an occlusion culling system, and then added "(software at the time, hardware nowadays)". I remember reading older posts by Yann explaining that he ended up not using hardware queries because they were slower or introduced problems, so he wrote a software rasterizer instead.

Given that thread, I'm wondering whether hardware queries are the way to go today, or whether software HOMs (hierarchical occlusion maps) are still a good option. I was looking at writing a software rasterizer, but if Yann isn't doing that anymore I'd really be interested in why. Thanks for any info!

Well, I guess I'll answer this one ;)

Actually, I moved from software to hardware some time ago, and back to software rather recently. While hardware queries have become much faster, and conditional rendering is now possible (i.e. no readback of the results), they still come with a number of significant problems.

In a nutshell, my reasons to go back to software were mainly the following:

* The GPU is already highly stressed in my engine, especially on fillrate, and rendering the queries takes additional fillrate. This is especially true on today's HDTV-resolution screens, since the queries are rendered at screen resolution.

* Hardware queries can give strange and unexpected results when performed on a multisampled depth target.

* My engine is usually very VRAM-limited, so I stream geometry and textures on the fly. Conditional rendering still requires the data for the entire scene to be resident in VRAM at all times, even if much of it is never rendered in the end. That's not acceptable for me.

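* Hardware occlusion queries without conditional rendering still require a readback, which brings us back to the bubble problem (the CPU stalls waiting for query results, leaving a gap in the GPU's command stream). Deferring the queries to a later frame is not always possible or practical.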
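The deferral trade-off in the last bullet can be sketched in a few lines of Python: results of a query issued in frame N are only consumed in frame N + latency, so the CPU never blocks, but every culling decision is made on stale visibility. `DeferredQueryQueue` and its methods are hypothetical names, and the `visible` argument simulates what the GPU would eventually report; this is not any real graphics API.

```python
from collections import deque


class DeferredQueryQueue:
    """Hypothetical helper: occlusion-query results issued in frame N are read
    back only in frame N + latency, so the CPU never waits on the GPU."""

    def __init__(self, latency):
        self.latency = latency
        self.pending = deque()  # (frame the result becomes available, object id, result)
        self.last_known = {}    # object id -> newest visibility result received

    def issue(self, frame, obj, visible):
        # `visible` simulates what the GPU will eventually report for this query.
        self.pending.append((frame + self.latency, obj, visible))

    def collect(self, frame):
        # Consume every result that is ready by this frame; never block on the rest.
        while self.pending and self.pending[0][0] <= frame:
            _, obj, visible = self.pending.popleft()
            self.last_known[obj] = visible

    def should_draw(self, obj):
        # Objects with no result yet are drawn (conservative default).
        return self.last_known.get(obj, True)


# Object 7 becomes hidden in frame 0; with a 2-frame latency it is still
# drawn in frames 0 and 1 and only culled from frame 2 on.
q = DeferredQueryQueue(latency=2)
q.issue(0, 7, visible=False)
for frame in range(3):
    q.collect(frame)
    print(frame, q.should_draw(7))
# 0 True
# 1 True
# 2 False
```

The same latency works against you in the other direction: an object that becomes visible stays culled until its result arrives, which shows up as popping. That is one reason deferring is not always practical.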

On the other hand, CPUs have gone multicore, and a software renderer is highly threadable. I have written a highly optimized SIMD renderer that runs on multiple cores with lockless synchronization. On a single-core CPU, the GPU hardware queries are faster. On a dual-core, performance is about the same if the GPU is not fillrate-limited, and goes in favour of the CPU version once the GPU hits its fillrate limit. And on a quad-core, the CPU version is light years ahead of the GPU queries.
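A heavily reduced Python sketch of the software side, under stated assumptions: occluders are rasterized into a small, low-resolution depth buffer split into tiles that worker threads fill independently (each tile owns disjoint pixels, so no locks are needed), and an object is culled only if every pixel under its screen-space bounds already holds a nearer depth. Occluders and occludees are assumed to be pre-projected axis-aligned rectangles with a single depth each (a real renderer rasterizes triangles with interpolated depth), the buffer and tile sizes are arbitrary, and all names are hypothetical. Python threads only illustrate the work partitioning; CPython's GIL means they won't give the SIMD/multicore speedup described above.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed sizes: a small occlusion buffer, not the full screen resolution.
W, H, TILE = 64, 32, 16
FAR = float("inf")


def rasterize_tile(depth, occluders, tx, ty):
    """Write the nearest occluder depth into one TILE x TILE block.

    Occluders are screen-space rectangles (x0, y0, x1, y1, z) with x1/y1
    exclusive and a constant depth z, a stand-in for real triangle setup.
    """
    for y in range(ty, min(ty + TILE, H)):
        row = depth[y]
        for x in range(tx, min(tx + TILE, W)):
            for x0, y0, x1, y1, z in occluders:
                if x0 <= x < x1 and y0 <= y < y1 and z < row[x]:
                    row[x] = z


def build_depth(occluders, workers=4):
    depth = [[FAR] * W for _ in range(H)]
    tiles = [(tx, ty) for ty in range(0, H, TILE) for tx in range(0, W, TILE)]
    # Each tile owns a disjoint block of pixels, so the workers need no locks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda t: rasterize_tile(depth, occluders, *t), tiles))
    return depth


def is_occluded(depth, bounds):
    """bounds = (x0, y0, x1, y1, zmin) of an object's screen-space box.

    The object is hidden only if every covered pixel already holds a depth
    nearer than the object's nearest point (conservative test). A box that
    covers no on-screen pixels also returns True (nothing visible).
    """
    x0, y0, x1, y1, zmin = bounds
    return all(
        depth[y][x] < zmin
        for y in range(max(y0, 0), min(y1, H))
        for x in range(max(x0, 0), min(x1, W))
    )


# One occluder covering the left half of the buffer at depth 5.
depth = build_depth([(0, 0, 32, 32, 5.0)])
print(is_occluded(depth, (4, 4, 10, 10, 8.0)))    # True: behind the occluder
print(is_occluded(depth, (28, 4, 36, 10, 8.0)))   # False: pokes out on the right
print(is_occluded(depth, (4, 4, 10, 10, 2.0)))    # False: in front of the occluder
```

Because the occlusion buffer is much smaller than the screen, its fill cost stays fixed regardless of display resolution, which is the trade-off against the fillrate point in the list above.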

YMMV, because it depends on your engine architecture, the type of scenes you use and how/where you stress the GPU the most. So the best way is to test it for yourself.

Heh, yeah, I guess the question was pretty specific to you :) Thanks so much for the response; the points you made make a lot of sense. I'm curious: you mentioned that queries with readback introduce bubbles, so what do you think about algorithms like Coherent Hierarchical Culling (CHC), or CHC++, that aim to reduce stalling?

I only just stumbled across CHC++, and the authors claim an almost optimal solution, i.e. pretty close to the "perfect" case where nothing that shouldn't be rendered gets rendered. Readback-based algorithms seem to rely on a lot of prediction, and I'm not sure I'm a fan of that, but CHC++ seems to do pretty well in general, except, if I recall correctly, during transitions between very different parts of the scene. I guess even the readback algorithms would still suffer from some of the problems on your list, like fillrate and GPU utilization.
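The "prediction" mentioned above is, at its core, temporal coherence: assume each object's visibility matches the previous frame. Below is a minimal Python sketch of that idea; it is a simplification for illustration, not the CHC or CHC++ algorithm itself, and the function name and dictionaries are hypothetical. It shows why the guess works well for small camera moves and poorly across a cut to a very different part of the scene.

```python
def plan_frame(last_visible, actual_visible):
    """Simplified temporal-coherence prediction (illustration only).

    last_visible:   object -> visibility result from the previous frame
    actual_visible: object -> true visibility this frame (what the queries
                    will eventually report)
    Objects with no history are treated as visible (conservative).
    """
    draw_now, wait_on_query = [], []
    for obj in actual_visible:
        if last_visible.get(obj, True):
            draw_now.append(obj)       # drawn immediately; query only updates history
        else:
            wait_on_query.append(obj)  # must wait for the query result: potential stall
    # Wasted work: drawn without waiting, but actually hidden this frame.
    wasted = [o for o in draw_now if not actual_visible[o]]
    # Late: visible objects whose drawing was delayed until the query came back.
    late = [o for o in wait_on_query if actual_visible[o]]
    return draw_now, wait_on_query, wasted, late


prev = {"a": True, "b": True, "c": False}
# Small camera move: visibility is unchanged, so the prediction is perfect.
print(plan_frame(prev, {"a": True, "b": True, "c": False}))
# (['a', 'b'], ['c'], [], [])
# Camera cut: visibility flips, so every prediction is wrong.
print(plan_frame(prev, {"a": False, "b": False, "c": True}))
# (['a', 'b'], ['c'], ['a', 'b'], ['c'])
```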

A multithreaded CPU renderer sounds really cool! I'd love to take a crack at a software renderer, and eventually make it SIMD and multithreaded. Profiling our app with NVPerfHUD shows heavy fillrate usage, so perhaps the software approach would work well for us.

Thanks trs79. I hadn't noticed they had improved on their CHC method. If others want to read it as well, I found it here: http://www.cg.tuwien.ac.at/research/publications/2008/MATTAUSCH-2008-CHC/

In that paper, the main point seems to be batching queries together. This is something I considered in my implementation of CHC (which I abandoned because it was way slower than my basic CPU approach).

What I don't think they mention is how you can actually send a batch of hardware occlusion queries to the card. If object A occludes object B and object B occludes object C, sending a query batch of A, B, C will surely not work. The query for A might return VisPixels > 0, but the query for B will have been executed on a buffer without A rendered into it, meaning B will show as visible when it isn't.

This is something I couldn't get my head around with CHC, and it was the other reason I binned it. Feel free to correct me if I'm wrong about the above.
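The A/B/C dependency can be reproduced with a tiny 1D depth-buffer simulation in Python. Here `visible_pixels` plays the role of an occlusion query's visible-pixel count (VisPixels above), objects are hypothetical `(x0, x1, depth)` spans sorted front to back, and none of the names correspond to a real API. Querying one object at a time lets A's depth cull B and C; issuing all three queries before drawing any of them makes every query run against the same buffer, so B and C come back visible. The batched answer is conservative (nothing visible gets culled), just wasteful. This sketch only illustrates the concern in the post; it does not model how CHC++ actually schedules its batches.

```python
def visible_pixels(depth, obj):
    # Stand-in for an occlusion query: count pixels where obj would pass the depth test.
    x0, x1, z = obj
    return sum(1 for x in range(x0, x1) if z < depth[x])


def draw(depth, obj):
    # Depth-only render of obj into the buffer.
    x0, x1, z = obj
    for x in range(x0, x1):
        depth[x] = min(depth[x], z)


def sequential(objects, width):
    depth = [float("inf")] * width
    results = {}
    for name, obj in objects:  # query, then draw if visible, one object at a time
        results[name] = visible_pixels(depth, obj)
        if results[name] > 0:
            draw(depth, obj)
    return results


def batched(objects, width):
    depth = [float("inf")] * width
    # All queries are issued before any object in the batch is drawn,
    # so every query sees the same (here empty) depth buffer.
    results = {name: visible_pixels(depth, obj) for name, obj in objects}
    for name, obj in objects:
        if results[name] > 0:
            draw(depth, obj)
    return results


# A covers B entirely, and B covers C entirely.
scene = [("A", (0, 8, 1.0)), ("B", (2, 6, 2.0)), ("C", (3, 5, 3.0))]
print(sequential(scene, 8))  # {'A': 8, 'B': 0, 'C': 0}
print(batched(scene, 8))     # {'A': 8, 'B': 4, 'C': 2}
```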
