[quote]@ Krypt0n - i liked the second idea, can i ask you to help out with the details of this approach?[/quote]
you create a stack of frustums, e.g.
std::vector<frustum>
where a frustum is the usual 4 planes for left/right/top/bottom plus a near and a far plane.
the first entry is the frustum based on your camera. when you 'see' a portal, you push a new frustum onto the stack/vector, using the 4 portal sides to determine left/right/top/bottom and the portal plane as the near plane (this can be an advantage in more complex rooms). when you use anti-portals, you can handle them just like portals, but in the end you flip the plane normals of the frustum (left/right/top/bottom and near).
when you check an object, you check it against every frustum in the vector. entering a new room, you add an additional entry to the vector; returning from the room, you remove that entry (quite the usual recursion).
if no frustum culls your object, you can add it to your rendering.
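a minimal sketch of that stack in C++ (the struct names and the sphere-vs-plane test are illustrative, not from any particular engine):

```cpp
#include <vector>

// plane stored as nx*x + ny*y + nz*z + d = 0, normal pointing into the frustum
struct plane { float nx, ny, nz, d; };

// left/right/top/bottom + near (and optionally far)
struct frustum { std::vector<plane> planes; };

// for anti-portals: same construction as a portal, then flip the normals
void flip(frustum& f) {
    for (plane& p : f.planes) { p.nx = -p.nx; p.ny = -p.ny; p.nz = -p.nz; p.d = -p.d; }
}

// sphere vs. one frustum: culled as soon as one plane rejects it
bool sphere_in_frustum(const frustum& f, float x, float y, float z, float r) {
    for (const plane& p : f.planes)
        if (p.nx * x + p.ny * y + p.nz * z + p.d < -r)
            return false;
    return true;
}

// an object survives only if *no* frustum on the stack culls it
bool visible(const std::vector<frustum>& stack, float x, float y, float z, float r) {
    for (const frustum& f : stack)
        if (!sphere_in_frustum(f, x, y, z, r))
            return false;
    return true;
}
```

entering a room pushes a frustum built from the portal, leaving it pops that frustum again, so the recursion maps directly onto push_back/pop_back.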
1- batch as much as you can in order to limit the amount of draw calls
2- limit GPU cycles spent on a per pixel basis
for outdoor, you are usually limited by 1., as you have a long view range and see tons of objects; it's not really important whether they have perfect shading. in addition, most parts are lit just by the sun, and usually a big part of the frame is covered by 'sky'.
for indoor rendering, you will rather be limited by 2. you cannot stick 5000 individual drawcalls into every room of the level, as that would mean about 200 pixels per object, roughly 16x12; it would be like building your level out of tiny bricks.
so, while you are right about those two limitations, it's very context dependent what you need to optimize for.
The theory is pretty obvious: by using an acceleration structure you limit your draw calls by rendering only the potentially visible (polygon) set. At the same time, when using occlusion culling, you indirectly limit the GPU cycles by reducing overdraw.[/quote]
that's how it was before the year 2000. since about the geforce256 (geforce 1), we stopped touching individual polygons; it's just way faster to push a whole 'maybe hidden' object than to update thousands of polys on the cpu side. per-pixel graphics was anyway too slow at that time to do anything fancy (even one fullscreen bumpmapping effect dropped your framerate below 30).
[quote]Problem: you have 100 objects to render, not cloned/instanced, each made up of 2 materials: one shared and one chosen from a set of 8 different materials.
Worst case scenario: select material 1, render, select material 2, render, switch to next object.
200 draw calls and 200 material (texture/shader) switches.
Second solution: select material 1, render each object with it; select material 2, render each object using that material; then switch to the next material.
200 draw calls and 9 material (texture/shader) switches.
Now, if we use an acceleration structure like a BSP or an octree, we are actually splitting objects, introducing more polygons. If an object gets split, that implies we have two different objects, thus increasing the total object count (and draw calls). On the other hand, some acceleration structures can reduce overdraw, so this might still be a winner.
What I ask is: if I merged all the polygons sharing the same material and took advantage of a z-pass (or used a deferred renderer), what kind of performance would I get?
Even if I didn't create a supermerged object for the z-pass I would be able to issue:
9 draw calls to get the z-pass.
9 draw calls for the actual rendering, with 0 overdrawing (guaranteed by the z-pass).
I can submit 18 draw calls compared to 200+, and I'm 100% sure there's no overdraw at all... and as for rendering more polygons, polygon count usually isn't a big problem in 2011, or at least it's not as limiting as shading.
In which way can any acceleration structure render something faster than that?
[/quote]
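to make the drawcall/switch counting in the quoted scenario concrete, here's a toy cpu-side model (the material ids and counts are made up to match the example, and one 'switch' is counted per material change, including the first bind):

```cpp
#include <algorithm>
#include <vector>

// one submitted draw: which object, with which material bound
struct draw { int object; int material; };

// count how often the bound material changes across the submission order
int count_switches(const std::vector<draw>& calls) {
    int switches = 0, current = -1;
    for (const draw& d : calls)
        if (d.material != current) { ++switches; current = d.material; }
    return switches;
}

// 100 objects, each drawn with the shared material (0) and one of 8 extras (1..8)
std::vector<draw> make_calls() {
    std::vector<draw> calls;
    for (int obj = 0; obj < 100; ++obj) {
        calls.push_back({obj, 0});            // shared material
        calls.push_back({obj, 1 + obj % 8});  // one of the 8 extra materials
    }
    return calls;
}
```

submitting object-by-object switches material on every draw; stable-sorting the same 200 draws by material drops that to one switch per material (9 total) without changing the draw count, which is the whole point of the "second solution".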
the problem with optimizations for a theoretical situation is that you cannot know what would help or make it worse. 'optimizing' is just exploiting a special context; it's trying to find cheats that nobody will notice, at least not as visual artifacts. while your ideas are valid, they might not change the framerate at all in the real world; they might make it faster, just as it all might become slower.
I think, if I had 200+ drawcalls for an indoor scene, I'd probably not care about drawcalls at all. if I'm drawcall-bound with just 200, something must be seriously wrong.
considering this, the situation is way simpler: you might observe that the geometry is not your problem, and neither is the actual surface shading; you will probably be limited by lighting and by other deferred passes (e.g. fog, decals, postprocessing like motion blur).
so, it might be smart to go deferred like you said. you don't need a z-pass for that; you'll probably do best with simple portal culling in combination with scissor rects and depth-bounds checks.
now all you want to optimize is finding the smallest area your lights have to touch, to modify as few pixels as possible, and that is a problem (nearly) completely unrelated to BSP/portals/PVS/etc.
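as a sketch of "smallest area a light touches": a crude, conservative screen-space scissor rect for a point light's sphere of influence, in view space with the camera looking down -z. the names and the single `focal` scale are illustrative; a production version would use a tight per-axis sphere projection instead of projecting all extents at the nearest depth.

```cpp
#include <algorithm>
#include <cmath>

struct rect { int x0, y0, x1, y1; };

// conservative pixel bounds of a sphere light at view-space (x,y,z), radius r.
// 'focal' is the projection scale (cot(fov/2)-style), width/height in pixels.
rect light_scissor(float x, float y, float z, float radius,
                   float focal, int width, int height) {
    // nearest depth of the sphere, clamped so it stays in front of the camera
    float zn  = std::min(z + radius, -0.001f);
    float inv = 1.0f / -zn;                 // project every extent at the nearest depth
    float sx  = focal * 0.5f * width;
    float sy  = focal * 0.5f * height;
    rect r;
    r.x0 = std::max(0,      (int)std::floor((x - radius) * inv * sx + 0.5f * width));
    r.x1 = std::min(width,  (int)std::ceil ((x + radius) * inv * sx + 0.5f * width));
    r.y0 = std::max(0,      (int)std::floor((y - radius) * inv * sy + 0.5f * height));
    r.y1 = std::min(height, (int)std::ceil ((y + radius) * inv * sy + 0.5f * height));
    return r;
}
```

the resulting rect is what you'd feed to the scissor test (plus the sphere's depth range for a depth-bounds check), so a distant light only ever pays for the few pixels it can actually influence.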
you might want to:
- portal-cull lights and/or use occlusion culling
- do depth carving using light volumes (similar to doom 3's shadow volumes)
- fuse deferred + light-indexed rendering, similar to frostbite 2
- reduce resolution like most console games do, maybe just for special cases, e.g. distant lights, particles, motion blur (check out the UDK wiki), with some smart upsampling
- try scissor and depth-bounds culling per light
- decrease quality based on frame time (e.g. fewer samples for your SSAO)
- add a special 'classification' for lights, to decide how to render which type of light, under what conditions, with which optimizations. e.g. it might make sense to batch a lot of tiny lights into one drawcall, handling them like a particle system, and to do depth carving just on near-by lights, as distant lights might be fully limited by the carving itself and a simple light object would do the job already.
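the classification idea from the last point can be as simple as bucketing lights by their projected size; the thresholds and bucket names below are made up for illustration:

```cpp
// render paths in the spirit of the list above: tiny lights get batched
// like a particle system, mid-sized lights get a cheap light volume,
// only big near-by lights pay for depth carving.
enum class light_path { batched, simple_volume, depth_carved };

// classify by projected radius on screen, in pixels (thresholds are made up)
light_path classify(float projected_radius_px) {
    if (projected_radius_px < 8.0f)   return light_path::batched;
    if (projected_radius_px < 100.0f) return light_path::simple_volume;
    return light_path::depth_carved;
}
```

the classification runs per frame after culling, so a light naturally moves between paths as the camera approaches or backs away from it.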
what I basically want to show is that your scene handling is nowadays not as big a deal as it was 10 years ago. you still don't want to waste processing power, of course, but you won't implement a bsp system to get a perfect geometry set; I've even stopped using portal culling nowadays. it's rather important to have a very stable system that is flexible and doesn't need much maintenance, while giving good 'pre-processing' for the actually expensive stage nowadays, which is the rendering. (as an example, "resistance" used just some kind of grid with a PVS, no portals, bsp etc.)
and like you said, there are two points, drawcalls and fillrate; you deal with them mostly after the pre-processing (culling). you have a bunch of drawcalls and you need to organize them in the most optimal way you can, which is more than just sorting or batching. for indoor you'll probably spend 10% of your time generating the g-buffer and 10%-30% creating shadow maps; the majority of the frame time will be spent on lighting and post-processing effects.