BSP Occlusion Tree. Good idea?

6 comments, last by jswill100 16 years, 10 months ago
Hello everyone! This is my first post here on gamedev.net. I had a general query that I was hoping you fine people with far more expertise might be able to assist with. I'm currently working with a large scene I have created as part of my uni course: an old medieval church/almshouse and its grounds. It consists of roughly 50/50 indoor and outdoor environments, and there is pretty dense geometry in both parts. It is also an entirely static scene. My task is basically to apply as many acceleration techniques as I deem necessary to get a good and constant FPS out of it for a real-time user walkthrough. I am generally planning to have frustum culling, general occlusion culling, possibly cells and portals for the interior, and LOD work, as well as scene fog towards the end.

Right now I have gone for a polygon-aligned BSP tree which is working correctly, and I am using it to do proper back-to-front rendering (eliminating the z-buffer) and view frustum culling based on bounding spheres generated at nodes (although I am thinking of using bounding boxes for a tighter fit, but I haven't measured the trade-off in using intersection tests with boxes - feedback welcome).

Now I am moving onto a general occlusion culling algorithm. I considered cells and portals for the interior (and may eventually do this), but frankly there aren't that many rooms, doors, or corridors in each separate building, and the rooms are large and can often see most of the other rooms at one time. Hence I am thinking I will invest quite heavily in a good general occlusion culling method. After a bit of research, and taking the makeup of my scene into account (lots of big occluders are available, with plenty of walls going on), I have come to the conclusion that occlusion trees with merged occluder volumes (a la Bittner et al., "Hierarchical Visibility Culling with Occlusion Trees", similar to shadow volume BSP trees) would be a good fit.

I suppose my general question is: do you guys think this is a good fit for what I am doing, and do you think it will give me a generous speed-up? I suppose my research has not been quite as extensive as it should be just yet, and I appreciate that this method was outlined quite some years ago. Do you think there is anything that better fits my situation (and can work off the back of my already completed BSP work)?

Thanks

John Williams

[Edited by - jswill100 on June 8, 2007 12:23:01 PM]
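(For reference, a minimal sketch of the two node-level frustum tests being weighed above - bounding spheres versus bounding boxes. The plane representation and struct names are assumptions, not from the original post.)

```cpp
// Node-level frustum tests. Assumes the frustum is stored as six inward-facing
// planes (ax + by + cz + d >= 0 means "inside"), with unit-length normals.
// Plane, Sphere and AABB are illustrative names.
#include <array>

struct Plane  { float a, b, c, d; };
struct Sphere { float x, y, z, radius; };
struct AABB   { float minX, minY, minZ, maxX, maxY, maxZ; };

bool sphereInFrustum(const std::array<Plane, 6>& frustum, const Sphere& s)
{
    for (const Plane& p : frustum)
    {
        float dist = p.a * s.x + p.b * s.y + p.c * s.z + p.d;
        if (dist < -s.radius)
            return false;              // completely behind one plane -> cull the node
    }
    return true;                       // intersecting or fully inside
}

bool aabbInFrustum(const std::array<Plane, 6>& frustum, const AABB& b)
{
    for (const Plane& p : frustum)
    {
        // Test the box corner furthest along the plane normal ("positive vertex").
        float px = (p.a >= 0.0f) ? b.maxX : b.minX;
        float py = (p.b >= 0.0f) ? b.maxY : b.minY;
        float pz = (p.c >= 0.0f) ? b.maxZ : b.minZ;
        if (p.a * px + p.b * py + p.c * pz + p.d < 0.0f)
            return false;              // even the closest corner is outside this plane
    }
    return true;
}
```

The box test costs a few extra comparisons per plane, but it hugs long, thin wall geometry far more tightly than a sphere, which is usually where the win comes from.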
Quote:Original post by jswill100
Right now I have gone for a polygon-aligned BSP tree which is working correctly, and I am using it to do proper back-to-front rendering (eliminating the z-buffer) and view frustum culling based on bounding spheres generated at nodes (although I am thinking of using bounding boxes for a tighter fit, but I haven't measured the trade-off in using intersection tests with boxes - feedback welcome).

That's not a good idea (the part about eliminating the z-buffer). If anything, you want to be rendering the scene from front to back for optimal performance. The Z-buffer isn't something you should be avoiding. :)

As far as your question is concerned, there's no telling what kind of performance gains (or losses) you'll experience without carefully profiling and analyzing the application. It might be the case that you don't even need any fancy occlusion algorithms beyond frustum culling with bounding spheres if you already get a good framerate. Chances are you'll be CPU-bound anyway, so the first thing I would do to improve performance is batch as much as possible into as few draw calls and state changes as is reasonable. If that isn't fast enough, then it would be a good time to look into more complex occlusion algorithms.
Quote:Right now I have gone for a polygon-aligned BSP tree which is working correctly, and I am using it to do proper back-to-front rendering (eliminating the z-buffer)

That is impossible in the general case, as you can have polygons that are partly behind some polygons yet in front of others (put another way, a single polygon can span a huge region of the z-buffer).
In fact, nowadays it's recommended for speed to draw from front to back.
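(For what it's worth, front-to-back drawing falls straight out of the BSP tree the OP already has: visit the child on the camera's side of the splitting plane first. A hedged sketch, with an assumed node layout and a hypothetical draw helper:)

```cpp
// Front-to-back BSP traversal: at each node, recurse into the subtree on the
// same side of the splitting plane as the eye before the far subtree.
// The node layout is an assumption, not taken from the original post.
struct BSPNode
{
    float a, b, c, d;                  // splitting plane: ax + by + cz + d = 0
    BSPNode* front = nullptr;          // subtree on the positive side of the plane
    BSPNode* back  = nullptr;          // subtree on the negative side
    // ... polygons stored at this node ...
};

void renderFrontToBack(const BSPNode* node, const float eye[3])
{
    if (!node) return;

    // Signed distance of the eye from the splitting plane.
    float side = node->a * eye[0] + node->b * eye[1] + node->c * eye[2] + node->d;

    const BSPNode* nearChild = (side >= 0.0f) ? node->front : node->back;
    const BSPNode* farChild  = (side >= 0.0f) ? node->back  : node->front;

    renderFrontToBack(nearChild, eye);
    // drawPolygonsAt(node);           // hypothetical: draw this node's polygons
    renderFrontToBack(farChild, eye);
}
```

Back-to-front is exactly the same walk with the two recursive calls swapped, so switching the traversal order costs essentially nothing.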

Quote:and do you think it will give me a generous speed-up?
No.
But more importantly for you, it's the wrong field to study.
If you want a field to study, there are more worthwhile ones than occlusion stuff (the problem is that today's cards can push hundreds of millions of triangles per second, so there is little need for reduction; OK, there is the possible benefit of fewer state changes, but that's not guaranteed).
Welcome John,

Quote:Original post by zedz

Quote:and do you think it will give me a generous speed-up?
No.
But more importantly for you, it's the wrong field to study.
If you want a field to study, there are more worthwhile ones than occlusion stuff (the problem is that today's cards can push hundreds of millions of triangles per second, so there is little need for reduction; OK, there is the possible benefit of fewer state changes, but that's not guaranteed).


This is not correct. Although the processing power of today's graphics cards is simply amazing, the need for optimization will not go away anytime soon. We feel a strong urge to push everything (including visuals) to the edge, and with it comes the need for optimization and more powerful hardware.

As Zipster already pointed out, profiling is the key. In a multiprocessor, pipelined architecture, as is the trend in current hardware, optimization is an iterative process: find the bottleneck first, then move it elsewhere by applying optimizations to the specific area where the bottleneck resides. Each iteration gains you some speed-up. Optimizing the wrong place will, at the very best, not gain you any performance; in the worst case, it might actually decrease overall performance. As pointed out, if your application is already CPU-bound, chances are you will actually lose performance by implementing an occlusion culling algorithm and thus increasing the pressure on the CPU. My advice is to first locate the bottleneck exactly, and then look for different ways of optimizing it.
Thanks for the replies, guys.

How does one tell if they are CPU bound?

To give you an idea of my current situation, my entire scene has about 5 million triangles, and with frustum culling only, in any one particular spot, unless I am looking directly away from any walls, we are talking about 2 FPS. There can easily be 100k triangles wanting to be drawn at one time with only frustum culling; look at the scene the wrong way and you could easily get up to 3 or 4 million in your view (it's kind of long and thin, so you could look at it from one end). I am also using an extremely large frustum to simulate an infinite view, but eventually I will use a smaller one plus fog, so that should help some.

This is the reason I am kind of assuming that some kind of occlusion culling is going to give me huge improvements: the density of geometry behind my walls (both exterior and interior). If I picked the walls as occluders, I can see instantly that at any one time 60-95%+ of the most dense geometry might be hidden (the 95% figure is from standing at one end looking down the length).


So I guess it just follows in my head that there isn't much question about where the bottleneck is: I'm simply displaying far, far too many polygons at any one time in the frustum, and if I could throw away anything from 50k to 3 million triangles easily by using a couple of trees, that would seem sensible. Apart from this, I am not overly experienced in tracking down specific bottlenecks in applications.

I don't really accept the point that there isn't any need for occlusion culling, or any future in it, because of the speed of graphics cards. Data sets are getting increasingly large and the need for culling and related acceleration techniques is hardly going away; we're still a long way off from having everything we need in hardware, and I would have thought a certain amount of bespoke coding will always be needed.

Please correct me if I'm well off base on any of the above. As you can probably tell, I'm quite new to all this. I have studied it quite a bit, but in terms of practical experience I come up very short.

John
NVPerfHUD from NVIDIA is a very powerful tool if you want to analyze your application and find its bottlenecks. You may download it from here. You need a GeForce 6 series card or above to fully utilize its power.

But apart from that, rendering that many faces at an acceptable frame rate requires a great deal of care. Aside from whatever culling algorithm you implement, you need to be extra careful with batching. Rendering even 100k polygons with poor batching can easily bring the most powerful graphics cards to their knees; not because they cannot process and render that many polygons per se, but because there is no time left for the CPU to submit those batches - the graphics card would starve to death. "Batch, Batch, Batch: What Does It Really Mean?" by Matthias Wloka is a great article to look into if you want more theoretical insight.
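(A hedged illustration of the batching idea: sort the visible draw items by render state so each state change is paid once per group rather than once per object. All names here are illustrative, not from any particular engine or the article above.)

```cpp
// Sort draw items by the state they need (a texture id stands in for "render
// state" here), then walk the sorted list and only change state on boundaries.
#include <algorithm>
#include <vector>

struct DrawItem
{
    unsigned int texture;      // state this item needs (could also be shader, blend mode, ...)
    unsigned int firstIndex;   // range of indices in a shared index buffer
    unsigned int indexCount;
};

void drawBatched(std::vector<DrawItem>& items)
{
    // Group items that share state so each state change is paid only once.
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.texture < b.texture; });

    unsigned int boundTexture = 0xFFFFFFFF;        // sentinel: nothing bound yet
    for (const DrawItem& item : items)
    {
        if (item.texture != boundTexture)
        {
            // bindTexture(item.texture);          // hypothetical state change
            boundTexture = item.texture;
        }
        // drawIndexedRange(item.firstIndex, item.indexCount);   // hypothetical draw call
    }
}
```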

Another way to decrease the amount of geometry in a scene is to break the scene into non-overlapping sectors which can be loaded into memory upon entry. This is an easy way of separating outdoor from indoor environments. When looked at from outside, a building is nothing more than an empty box (with whatever detail you want to add onto the surface), but as soon as the camera is about to enter the building through a door, the indoor contents and art assets can be loaded into memory behind a loading screen. You have surely seen this in games. This both decreases memory usage and the processing power needed to cull that extra geometry, and it also makes it easier to implement different indoor/outdoor culling algorithms, as some are better suited to one or the other type of environment. Some flexibility is lost, but that is the standard trade-off: flexibility vs. processing power.
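(A rough sketch of that sector idea under the assumptions above; the Mesh type, the loader, and the trigger-radius scheme are all hypothetical stand-ins.)

```cpp
// Keep indoor geometry out of memory until the camera approaches the entrance.
#include <memory>
#include <string>
#include <vector>

struct Mesh { std::vector<float> vertices; };        // stand-in for the engine's geometry type

std::unique_ptr<Mesh> loadMeshFromDisk(const std::string& /*path*/)
{
    // Hypothetical loader: real code would parse the sector file here,
    // typically behind a loading screen.
    return std::make_unique<Mesh>();
}

struct Sector
{
    std::string           assetPath;                 // file holding this sector's geometry
    float                 triggerX, triggerY, triggerZ, triggerRadius;
    std::unique_ptr<Mesh> geometry;                  // null until loaded
};

void updateSectors(std::vector<Sector>& sectors, const float eye[3])
{
    for (Sector& s : sectors)
    {
        float dx = eye[0] - s.triggerX;
        float dy = eye[1] - s.triggerY;
        float dz = eye[2] - s.triggerZ;
        bool isNear = dx * dx + dy * dy + dz * dz < s.triggerRadius * s.triggerRadius;

        if (isNear && !s.geometry)
            s.geometry = loadMeshFromDisk(s.assetPath);   // entering: stream the sector in
        else if (!isNear && s.geometry)
            s.geometry.reset();                           // leaving: release it again
    }
}
```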

One thing to check is that you're making correct use of the 3D API. Poor API usage is a good way to make yourself CPU-limited. For example, instead of using immediate mode rendering, you should make sure that you're packing your geometry into vertex and index buffers (or whatever the OpenGL term is; I'm a D3D user myself :) ). You should also batch geometry together by render state to avoid redundant state changes.
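(Since the OP is on OpenGL: a minimal sketch of packing static geometry into vertex/index buffer objects, the GL analogue of the D3D buffers mentioned above. It assumes GL 1.5-level buffer object entry points are available through your headers or an extension loader; function and variable names other than the GL calls are illustrative.)

```cpp
// Upload the static, already-split BSP geometry once, then draw it each frame
// without pushing vertices across the bus in immediate mode.
#include <GL/gl.h>     // assumes glGenBuffers etc. are declared (GL 1.5+ or extension loader)
#include <vector>

GLuint vbo = 0, ibo = 0;

void uploadStaticGeometry(const std::vector<float>& vertices,        // x,y,z triples
                          const std::vector<unsigned int>& indices)
{
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(float),
                 vertices.data(), GL_STATIC_DRAW);                   // static scene -> STATIC_DRAW

    glGenBuffers(1, &ibo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, indices.size() * sizeof(unsigned int),
                 indices.data(), GL_STATIC_DRAW);
}

void drawStaticGeometry(GLsizei indexCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, nullptr);                        // positions read from the VBO
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr);
    glDisableClientState(GL_VERTEX_ARRAY);
}
```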

The job of the CPU is to send the geometry to the GPU in batches, using as few batches as possible. A "batch" means one or more rendering calls with the same render state (with no state changes in between). If you aren't minimizing batches, then you are not using the graphics hardware to its full potential.

Using a tree structure to cull batches is probably a good idea, but to avoid excessive CPU overhead, you should only cull at a very coarse level (in batches of a thousand or so triangles), and you should make sure that this culling does not increase the number of render state changes.

As others have said, sorting geometry back to front is a very bad idea, and unless you've got a ton of overdraw or expensive pixel shading, sorting front to back may not be worth the effort either (it all depends where your bottlenecks are). However, if you can get it for free out of your tree structure then there's no reason not to try it.

100K polygons at a time is not a whole lot, by current standards. What kind of graphics hardware do you need to run on? And what kind of pixel processing is happening?
Joshua Barczak, 3D Application Research Group, AMD
Thanks guys, this has definitely given me some things to think about.

I am using OpenGL, and I wasn't even aware of the benefits of using vertex/index buffers (I knew about display lists, but you can't alter anything in a display list), and having had a brief look they seem like something that could help a lot, so given time I will definitely incorporate them. Batching again was something I didn't know about, but I will certainly do some reading and restructure my code to fit with its principles.

For the moment I am going to work on some functionality of my occlusion tree (I was feeling a bit confused after reading up on them, but they are actually really simple! I almost have it fully up and running in a very coarse manner) and some LOD work (I'm on a course and my eventual marks are partly based on good selection and implementation of these kinds of techniques, as well as the overall resulting FPS, etc.). It's great to know, though, that there is plenty of optimizing I can do to my program at large after this.

NVPerfHUD - I have had a quick look and it looks interesting, and I would certainly love to get to grips with a program that could help me identify bottlenecks, since I'll definitely be needing that in the future. I'm running off my laptop with an ATI Mobility X1600. If you're not on NVIDIA, is it still worth using, or is there some alternative I might consider?

I don't suppose anyone knows of some decent resources for learning to implement good batching? I haven't yet checked out the "Batch, Batch, Batch: What Does It Really Mean?" article but will do; I'm just a bit snowed under with work for a few days.

Thanks guys. Any more general feedback is welcome :) This forum is awesome! Glad I have found it.

EDIT: I realise I was being dumb - all of my alterations (polygon splitting) take place when I build my BSP tree, so I can shove everything into a display list after that because it doesn't change. Huge, huge, huge difference in my FPS just from that.
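(A minimal sketch of that display-list setup, assuming the BSP splitting is finished before the list is compiled; the render helper is hypothetical. These are standard legacy GL 1.x calls.)

```cpp
// Compile the post-split, static BSP geometry into a display list once,
// then replay it each frame instead of re-submitting every polygon.
#include <GL/gl.h>

GLuint sceneList = 0;

void buildSceneDisplayList()
{
    sceneList = glGenLists(1);
    glNewList(sceneList, GL_COMPILE);      // record the commands, don't draw yet
    // renderBSPGeometry();                // hypothetical: emit the already-split polygons
    glEndList();
}

void drawScene()
{
    glCallList(sceneList);                 // replay the pre-compiled geometry each frame
}
```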

John

[Edited by - jswill100 on June 10, 2007 9:31:42 AM]

