Indoor rendering


@Krypt0n - I liked the second idea; can I ask you to help out with the details of this approach?
you create a stack of frustums, e.g.

std::vector<frustum>

where a frustum is the usual 4 planes for left/right/top/bottom, plus a near and far plane.

the first entry is the frustum based on your camera. when you 'see' a portal, you push a new frustum onto the stack/vector, using the 4 portal sides to determine left/right/top/bottom and the portal plane as the near plane (this can be an advantage in more complex rooms). when you use anti-portals, you can handle them just like portals, but in the end you flip the plane normals of the frustum (left/right/top/bottom and near).

when you check an object, you check it against every frustum in the vector. entering a new room, you add an additional entry to the vector; returning from the room, you remove that entry (quite the usual recursion).

if no frustum culls your object, you can add it to your rendering.
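A minimal sketch of that idea (the plane layout, aabb type and traversal helpers here are illustrative assumptions, not Krypt0n's actual code):

#include <vector>

struct plane   { float n[3]; float d; };        // dot(n, x) + d = 0, normal pointing inward
struct frustum { std::vector<plane> planes; };  // 4-6 planes per stack entry
struct aabb    { float mn[3], mx[3]; };

// the box is fully outside if even its farthest corner along the normal is behind the plane
static bool outside(const plane& p, const aabb& b)
{
    float s = p.d;
    for (int i = 0; i < 3; ++i)
        s += p.n[i] * (p.n[i] >= 0.0f ? b.mx[i] : b.mn[i]);
    return s < 0.0f;
}

// an object passes only if it survives *every* frustum on the stack
bool visible(const std::vector<frustum>& stack, const aabb& box)
{
    for (const frustum& f : stack)
        for (const plane& p : f.planes)
            if (outside(p, box))
                return false;
    return true;
}

// traversal sketch: push the portal frustum, recurse into the next room, pop on return
// void traverse(room& r, std::vector<frustum>& stack) {
//     for (object& o : r.objects) if (visible(stack, o.box)) draw(o);
//     for (portal& p : r.portals) {
//         stack.push_back(frustum_from_portal(p));  // assumed helper
//         traverse(p.target, stack);
//         stack.pop_back();
//     }
// }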

[quote]
1- batch as much as you can in order to limit the amount of draw calls
2- limit GPU cycles spent on a per-pixel basis
[/quote]


for outdoor rendering, you are usually limited by 1, as you have a big view range and see tons of objects; it's not really important whether they have perfect shading. in addition, most parts are lit just by the sun, and usually a big part of the rendering is covered by 'sky'.

for indoor rendering, you will rather be limited by 2. you cannot stuff 5000 individual drawcalls into every room of the level, as that would mean about 200 pixels per object, or 16x12; it would be like building your level out of tiny bricks.


so, while you are right about those two limitations, what you need to optimize for is very context dependent.


[quote]The theory is pretty obvious: by using an acceleration structure you limit your draw calls by rendering only the potentially visible (polygon) set. At the same time, when using occlusion culling, you indirectly limit the GPU cycles by reducing overdraw.[/quote]
that's how it was before the year 2000. since about the geforce256 (geforce 1), we stopped touching individual polygons; it's just way faster to push a whole 'maybe hidden' object than to update thousands of polys on the cpu side. per-pixel graphics was anyway too slow at that time to do anything fancy (even one fullscreen bumpmapping effect would drop your framerate below 30).




[quote]
Problem: you have 100 objects to render, not cloned/instanced, each made up of 2 materials: one shared and one chosen from a set of 8 different materials.

Worst-case scenario: select material 1, render, select material 2, render, switch to the next object.
200 draw calls and 400 material (texture/shader) switches.

Second solution: select material 1, render each object using it, select material 2, render each object using that material, then switch to the next material.
200 draw calls and 9 material (texture/shader) switches.

Now, if we use an acceleration structure like a BSP or an octree, we are actually splitting objects, introducing more polygons. If an object gets split, that implies we have two different objects, thus increasing the total object count (and draw calls). On the other hand, some acceleration structures can reduce overdraw, so this might still be a winner.

What I ask is: if I merged all the polygons sharing the same material and took advantage of a z-pass (or used a deferred renderer), what kind of performance would I get?

Even if I didn't create a super-merged object for the z-pass, I would be able to issue:
9 draw calls to get the z-pass.
9 draw calls for the actual rendering, with 0 overdraw (guaranteed by the z-pass).

I can submit 18 draw calls compared to 200+, and I'm 100% sure there's no overdraw at all... and as for rendering more polygons, polygon count usually isn't a big problem in 2011... or at least it's not as limiting as shading.

In what way can any acceleration structure render something faster than that?
[/quote]

the problem with optimizations for a theoretical situation is that you cannot know what would help or make it worse. 'optimizing' is just exploiting a special context; it's trying to find cheats that nobody will notice, at least not as visual artifacts. while your ideas are valid, they might not change the framerate at all in the real world; they might make it faster, just as it all might become slower.

I think if I had 200+ drawcalls for an indoor scene, I'd probably not care about drawcalls at all. if I'm drawcall-bound with just 200, something must be seriously wrong.

considering this, the situation is way simpler. you might observe that the geometry is not your problem, and neither is the actual surface shading; you will probably be limited by lighting and by other deferred passes (e.g. fog, decals, postprocessing like motion blur).

so, it might be smart to go deferred like you said. you don't need a z-pass for that; you probably do best with simple portal culling in combination with scissor rects and depth-bounds checks.

now all you want to optimize is finding the tightest area your lights have to touch, so you modify as few pixels as possible, and you need to solve a problem that is (nearly) completely unrelated to BSP/portals/PVS/etc.

you might want to:

- portal cull lights and/or use occlusion culling
- depth carving using light volumes (similar to doom 3's shadow volumes)
- fusing deferred + light-indexed rendering, similar to frostbite 2
- reducing resolution like most console games do, maybe just for special cases, e.g. distant lights, particles, motion blur (check out the UDK wiki), with some smart upsampling
- scissor and depth-bounds culling per light (see the sketch after this list)
- decreasing quality based on frame time (e.g. fewer samples for your SSAO)
- adding a special 'classification' for lights, to decide how to render each type of light, under what conditions, with which optimizations. e.g. it might make sense to batch a lot of tiny lights into one drawcall, handling them like a particle system; and it might make sense to do depth carving just on nearby lights, as distant lights might be fully limited by that carving and a simple light object would already do the job.
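As an illustration of the per-light scissor idea from the list above, here is a rough sketch that projects the corners of a point light's bounding cube to get a conservative scissor rect (GLM is assumed for the math types, and the function name is made up; a real version would also derive a depth-bounds range):

#include <algorithm>
#include <glm/glm.hpp>

struct ScissorRect { int x, y, w, h; };

ScissorRect lightScissor(const glm::mat4& viewProj, const glm::vec3& center,
                         float radius, int screenW, int screenH)
{
    glm::vec2 mn(1.0f), mx(-1.0f);                      // NDC bounds
    for (int i = 0; i < 8; ++i) {
        glm::vec3 corner = center + radius * glm::vec3(i & 1 ? 1.0f : -1.0f,
                                                       i & 2 ? 1.0f : -1.0f,
                                                       i & 4 ? 1.0f : -1.0f);
        glm::vec4 clip = viewProj * glm::vec4(corner, 1.0f);
        if (clip.w <= 0.0f)                              // corner behind the camera:
            return { 0, 0, screenW, screenH };           // fall back to full screen
        glm::vec2 ndc = glm::vec2(clip) / clip.w;
        mn = glm::min(mn, ndc);
        mx = glm::max(mx, ndc);
    }
    mn = glm::clamp(mn, glm::vec2(-1.0f), glm::vec2(1.0f));
    mx = glm::clamp(mx, glm::vec2(-1.0f), glm::vec2(1.0f));
    int x0 = int((mn.x * 0.5f + 0.5f) * screenW);
    int y0 = int((mn.y * 0.5f + 0.5f) * screenH);
    int x1 = int((mx.x * 0.5f + 0.5f) * screenW);
    int y1 = int((mx.y * 0.5f + 0.5f) * screenH);
    return { x0, y0, std::max(0, x1 - x0), std::max(0, y1 - y0) };
}

// usage before drawing the light volume:
//   glEnable(GL_SCISSOR_TEST);
//   glScissor(r.x, r.y, r.w, r.h);
//   // plus glDepthBoundsEXT(zmin, zmax) where EXT_depth_bounds_test is available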




what I basically want to show is that your scene handling is nowadays not as big a deal as it was 10 years ago. you still don't want to waste processing power, of course, but you won't implement a bsp system to get a perfect geometry set; I've even stopped using portal culling nowadays. it's rather important to have a very stable system that is flexible and doesn't need much maintenance, while giving good 'pre-processing' for the actually expensive stage, which nowadays is the rendering. (as an example, "Resistance" used just some kind of grid with PVS; no portals, bsp, etc.)

and like you said, there are two points, drawcalls and fillrate, and you deal with them mostly after the pre-processing (culling). you have a bunch of drawcalls and you need to organize them in the most optimal way you can, not just by sorting or batching. for indoor you'll probably spend 10% of your time generating the g-buffer and 10%-30% creating shadow maps; the majority of the frame time will be spent on lighting and post-processing effects.







@Krypt0n - great explanation, ++!




Had a small break today, now going back to work - looking at the 'frustum stack' solution now. Will update as usual.

perfection.is.the.key
@xynapse I think your video looks great! Having the room behind the portal pop on/off when the portal is onscreen/offscreen is a great example of portals working properly.

From your previous posts, it seems like there won't be many rooms visible at once; you can see through a portal or two, but the 'room depth' won't be very high, right? If that's the case, you can probably brute-force draw each room that's visible. State-sorting helps, but you might not need it. You should keep things flexible so you can add it if needed, but any complexity you can avoid (especially at the prototype stage) is great!

If there's a possibility of deep traversal into many rooms, distant rooms should be drawn at a lower LOD, if you have different quality levels available. I think you said you're exporting the meshes from some 'real' (by that I mean not homemade) tool, so it should be easy to have 2 or 3 different-quality meshes for each room.


@DracoLacertae - You're right, mate, that we won't have more than 2-3 rooms visible at once - that's a rare case, especially since the camera is placed statically (as in the old-style Alone in the Dark).

As you said, no need to overcomplicate things - given the above, we are not going to LOD the sectors, so we're basically done with what we have in terms of PVS and scene management.
The last cool thing to add is the frustum stack as per Krypt0n's description - and that's pretty much all for now.

Thanks to all of you guys for letting me drop in sometimes and share news for discussion - great stuff. Also, I did not go the BSP way, as that is a real headache with little payoff for a project that does indoor rendering but doesn't go 'massive'.


Thanks again and stay in touch! I'll be updating with news and other things to discuss.







perfection.is.the.key

[quote]
for outdoor rendering, you are usually limited by 1, as you have a big view range and see tons of objects; it's not really important whether they have perfect shading. in addition, most parts are lit just by the sun, and usually a big part of the rendering is covered by 'sky'.

for indoor rendering, you will rather be limited by 2. you cannot stuff 5000 individual drawcalls into every room of the level, as that would mean about 200 pixels per object, or 16x12; it would be like building your level out of tiny bricks.

so, while you are right about those two limitations, what you need to optimize for is very context dependent.
[/quote]

I agree - I was posting those as general optimization rules.


[quote]
that's how it was before the year 2000. since about the geforce256 (geforce 1), we stopped touching individual polygons; it's just way faster to push a whole 'maybe hidden' object than to update thousands of polys on the cpu side. per-pixel graphics was anyway too slow at that time to do anything fancy (even one fullscreen bumpmapping effect would drop your framerate below 30).
[/quote]
Exactly, rendering potentially hidden geometry is faster on modern GPUs.


[quote]
the problem with optimizations for a theoretical situation is that you cannot know what would help or make it worse. 'optimizing' is just exploiting a special context; it's trying to find cheats that nobody will notice, at least not as visual artifacts. while your ideas are valid, they might not change the framerate at all in the real world; they might make it faster, just as it all might become slower.

I think if I had 200+ drawcalls for an indoor scene, I'd probably not care about drawcalls at all. if I'm drawcall-bound with just 200, something must be seriously wrong.
[/quote]
Well, the point isn't that 200 drawcalls are limiting; my point is that if, in a simple scenario like that, there's a way to submit 10% of the drawcalls, it just looks like a good solution.


[quote]
considering this, the situation is way simpler. you might observe that the geometry is not your problem, and neither is the actual surface shading; you will probably be limited by lighting and by other deferred passes (e.g. fog, decals, postprocessing like motion blur).

so, it might be smart to go deferred like you said. you don't need a z-pass for that; you probably do best with simple portal culling in combination with scissor rects and depth-bounds checks.

now all you want to optimize is finding the tightest area your lights have to touch, so you modify as few pixels as possible, and you need to solve a problem that is (nearly) completely unrelated to BSP/portals/PVS/etc.

you might want to:

- portal cull lights and/or use occlusion culling
- depth carving using light volumes (similar to doom 3's shadow volumes)
- fusing deferred + light-indexed rendering, similar to frostbite 2
- reducing resolution like most console games do, maybe just for special cases, e.g. distant lights, particles, motion blur (check out the UDK wiki), with some smart upsampling
- scissor and depth-bounds culling per light
- decreasing quality based on frame time (e.g. fewer samples for your SSAO)
- adding a special 'classification' for lights, to decide how to render each type of light, under what conditions, with which optimizations. e.g. it might make sense to batch a lot of tiny lights into one drawcall, handling them like a particle system; and it might make sense to do depth carving just on nearby lights, as distant lights might be fully limited by that carving and a simple light object would already do the job.

what I basically want to show is that your scene handling is nowadays not as big a deal as it was 10 years ago. you still don't want to waste processing power, of course, but you won't implement a bsp system to get a perfect geometry set; I've even stopped using portal culling nowadays. it's rather important to have a very stable system that is flexible and doesn't need much maintenance, while giving good 'pre-processing' for the actually expensive stage, which nowadays is the rendering. (as an example, "Resistance" used just some kind of grid with PVS; no portals, bsp, etc.)

and like you said, there are two points, drawcalls and fillrate, and you deal with them mostly after the pre-processing (culling). you have a bunch of drawcalls and you need to organize them in the most optimal way you can, not just by sorting or batching. for indoor you'll probably spend 10% of your time generating the g-buffer and 10%-30% creating shadow maps; the majority of the frame time will be spent on lighting and post-processing effects.
[/quote]
Yes, that was exactly my point. In my experience, static (and opaque) geometry is just submitted to the g-buffer (I go deferred) with no spatial structure traversal (I can generate an octree if needed, but just submitting the geometry usually turns out to be faster). Then all optimizations are about dynamic objects and lights/shadows. I also spent some time optimizing shadows with regard to static vs. dynamic geometry, shadow map resolution, distance, etc.

And since all my shaders are assembled and generated on the fly according to the effects each material requires, I can also generate simpler shaders if there's not enough horsepower available.

The only reason I still use a spatial structure is for scenes that make heavy use of transparent static objects. I use a BSP, but that was a very specific scenario in which I had to use the engine to render a real-world building that was 70% glass, with different colors and opacity levels. In that case I needed a perfect geometry set and perfect sorting, so I went for a BSP. Of course it's a performance killer, but I couldn't come up with a better solution at the time.

Back with the updates, as promised.


So far I've been testing how to render one sector with a single draw call, without forcing the engine to sort textures by type and render faces per texture.

Finally I decided to go ahead and build a 2D texture array that holds all the textures within the level; each texture can be accessed in the shader via the .z value of the texcoord.


So instead of passing a Vec2 to the shader as a texture coordinate, I pass a Vec3, with .z being the id of the texture the vertex is using.




Works like a charm.
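(A sketch of what the vertex layout could look like on the engine side - the struct and attribute locations are illustrative assumptions, not xynapse's actual code:)

#include <cstddef>  // offsetof
// assumes a GL loader header (e.g. GLEW) is included and initialized

struct Vertex
{
    float position[3];
    float texcoord[3];  // u, v, plus the texture-array layer index in z
};

void setupVertexAttribs()
{
    // assumes the VBO is already bound; locations 0/1 must match the shader
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                          (const void*)offsetof(Vertex, position));
    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                          (const void*)offsetof(Vertex, texcoord));
    glEnableVertexAttribArray(0);
    glEnableVertexAttribArray(1);
}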

indoor9.jpg

indoor10.jpg




I know some people may be interested in how things work here, so here are some things to remember with this approach:

- Check the maximum number of layers (GL_MAX_ARRAY_TEXTURE_LAYERS) for GL_TEXTURE_2D_ARRAY on the GPU you're working with (I've got a 512-layer limit on an SLI 9800GTX)

- Remember all your textures have to be exactly the same size

- Assign a texture id per vertex and store it in the texcoord: instead of a Vec2, use a Vec3

- Pick the right texture in the fragment shader like this:



#version 130
#extension GL_EXT_gpu_shader4 : enable

uniform sampler2DArray textureName;

in vec3 vTexCoord;
out vec4 FragColor;

void main()
{
    FragColor = texture2DArray(textureName, vTexCoord.xyz);
}
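For completeness, a sketch of the matching GL-side setup for such an array texture (not from the original post - it assumes GL 3.0+ or EXT_texture_array, RGBA8 images of identical size, and pixels[i] standing in for your own loaded image data):

// assumes a GL loader header (e.g. GLEW) is included
GLuint createTextureArray(int width, int height, int layerCount,
                          unsigned char* const* pixels /* RGBA8, one image per layer */)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D_ARRAY, tex);

    // reserve storage for all layers at once...
    glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8,
                 width, height, layerCount, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, 0);

    // ...then upload each level texture into its own layer
    for (int i = 0; i < layerCount; ++i)
        glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0,
                        0, 0, i,                 // x offset, y offset, layer
                        width, height, 1,
                        GL_RGBA, GL_UNSIGNED_BYTE, pixels[i]);

    glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}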






Here, you can see it in action

[media]
[/media]




The idea works fast as hell, and designers have the ability to texture per face (as requested).

I can't think of any problems as long as you:

- control the real size of your textures
- don't forget that you're limited in the number of 2D array layers.




Stay in touch!




perfection.is.the.key

[quote name='xynapse' timestamp='1320743913' post='4881679']
So far I've been testing how to render one sector with a single draw call, without forcing the engine to sort textures by type and render faces per texture.

Finally I decided to go ahead and build a 2D texture array that holds all the textures within the level; each texture can be accessed in the shader via the .z value of the texcoord.

So instead of passing a Vec2 to the shader as a texture coordinate, I pass a Vec3, with .z being the id of the texture the vertex is using.

Works like a charm.
[/quote]


you know, when we see an optimization, we wonder: how much faster is it now?

[quote name='Krypt0n' timestamp='1320769533' post='4881797']
[quote name='xynapse' timestamp='1320743913' post='4881679']
So far I've been testing how to render one sector with a single draw call, without forcing the engine to sort textures by type and render faces per texture.

Finally I decided to go ahead and build a 2D texture array that holds all the textures within the level; each texture can be accessed in the shader via the .z value of the texcoord.

So instead of passing a Vec2 to the shader as a texture coordinate, I pass a Vec3, with .z being the id of the texture the vertex is using.

Works like a charm.
[/quote]
you know, when we see an optimization, we wonder: how much faster is it now?
[/quote]

Krypt0n, sorry for the late reply - I was away for a few days on business travel.


The thing is that this approach does not require us to bind textures per face or sort by texture.

It is enough to have an array of textures consisting of:

0 [colormap]
1 [normalmap]
2 [heightmap]
3 [colormap]
4 [normalmap]
5 [heightmap]
Every face consists of 3 vertices, so each vertex has a vector3 texcoord (x, y, z) instead of a vector2 (u, v).

The z component of this texcoord is an index into the texture array - in other words, this index selects the texture we want this face to be textured with.

Well, I can't be wrong in saying this works ultra-fast: we now have more than 32 textures in one sector and there is NO frame drop at all.




Additionally, going further, it is very easy to pick the right texture within the shader and do bump, normal and other types of mapping - and all of that comes with a single BindTexture call instead of switching/sorting/etc... it probably depends on the engine requirements, but this time it fitted sooooo well :)
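If the layers really are laid out as colormap/normalmap/heightmap triples like in the list above, the per-pixel lookups could go along these lines - a sketch in the style of the shader posted earlier, not the actual engine shader:

#version 130
#extension GL_EXT_gpu_shader4 : enable

uniform sampler2DArray textureName;

in vec3 vTexCoord;   // z = layer index of the material's colormap
out vec4 FragColor;

void main()
{
    vec4  color  = texture2DArray(textureName, vTexCoord.xyz);
    vec3  normal = texture2DArray(textureName,
                       vec3(vTexCoord.xy, vTexCoord.z + 1.0)).xyz * 2.0 - 1.0;
    float height = texture2DArray(textureName,
                       vec3(vTexCoord.xy, vTexCoord.z + 2.0)).r;

    // normal/height would feed lighting or parallax; plain color keeps the sketch short
    FragColor = color;
}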
perfection.is.the.key
The time has come to implement collision in the engine (I know it has nothing to do with rendering, but it is just another episode that I need to finish before moving on to lighting).




I need to make it general, so that any other object around can do collision tests easily.

If I have this implemented, it should be easy to apply the same method to the player moving around.




I am thinking about the theory that stands behind this.

Knowing that:

- a sector (the room) is a regular object
- each object is made of faces

I can do a Ray->Triangle (face) hit test - that works (see the ray/triangle sketch below), but would that be enough?
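For reference, a common way to implement such a Ray->Triangle hit test is the Möller-Trumbore algorithm - sketched here with GLM types, not the engine's actual code:

#include <glm/glm.hpp>

// returns true and the distance t along the ray on a hit
bool rayTriangle(const glm::vec3& orig, const glm::vec3& dir,
                 const glm::vec3& v0, const glm::vec3& v1, const glm::vec3& v2,
                 float& t)
{
    const float EPS = 1e-6f;
    glm::vec3 e1 = v1 - v0, e2 = v2 - v0;
    glm::vec3 p  = glm::cross(dir, e2);
    float det = glm::dot(e1, p);
    if (det > -EPS && det < EPS) return false;   // ray parallel to the triangle
    float inv = 1.0f / det;
    glm::vec3 s = orig - v0;
    float u = glm::dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;      // misses past edge v0-v1
    glm::vec3 q = glm::cross(s, e1);
    float v = glm::dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;  // misses the other edges
    t = glm::dot(e2, q) * inv;
    return t > EPS;                              // hit must be in front of the origin
}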






Let's see - this is the actual scene view. It shows wireframe bounding boxes for the sector, the object inside, and a light source (the light doesn't matter in this case).

indoor11.jpg




Now I've shot a ray into the 'wall' (2 faces) on the right side - a collision was found, and just for debug purposes I made this wall move a bit.




indoor12.jpg




Looks good for a Ray->Triangle collision, but how should I proceed when it comes to colliding the camera, the player, and objects with each other?


As everything in this engine is considered an object - how do I approach this collision model?

perfection.is.the.key

[quote name='xynapse']
[quote name='Krypt0n' timestamp='1320769533' post='4881797']
[quote name='xynapse' timestamp='1320743913' post='4881679']
So far I've been testing how to render one sector with a single draw call, without forcing the engine to sort textures by type and render faces per texture.

Finally I decided to go ahead and build a 2D texture array that holds all the textures within the level; each texture can be accessed in the shader via the .z value of the texcoord.

So instead of passing a Vec2 to the shader as a texture coordinate, I pass a Vec3, with .z being the id of the texture the vertex is using.

Works like a charm.
[/quote]


you know, when we see an optimization, we wonder: how much faster is it now?
[/quote]

Krypt0n, sorry for the late reply - I was away for a few days on business travel.

The thing is that this approach does not require us to bind textures per face or sort by texture.

It is enough to have an array of textures consisting of:

0 [colormap]
1 [normalmap]
2 [heightmap]
3 [colormap]
4 [normalmap]
5 [heightmap]

Every face consists of 3 vertices, so each vertex has a vector3 texcoord (x, y, z) instead of a vector2 (u, v).

The z component of this texcoord is an index into the texture array - in other words, this index selects the texture we want this face to be textured with.

Additionally, going further, it is very easy to pick the right texture within the shader and do bump, normal and other types of mapping - and all of that comes with a single BindTexture call instead of switching/sorting/etc... it probably depends on the engine requirements, but this time it fitted sooooo well :)
[/quote]
I rather wanted to hint that you're doing a premature optimization: you traded some texture setting _per_drawcall_ for _per_pixel_ texture selection, but you don't know whether it's faster or slower because you didn't even measure.

[quote]Well, I can't be wrong in saying this works ultra-fast: we now have more than 32 textures in one sector and there is NO frame drop at all.[/quote]
you know, if you one day figure out you have framerate problems, people will probably suggest you optimize by moving pixel cost to the vertex shader or the CPU - the reverse of your optimization :wink:




that's no offense; it's for sure an elegant and interesting way to handle textures, but it looks like a random (and for now unneeded) optimization.

This topic is closed to new replies.
