
Instancing render strategy: altering the render buffer or throwing everything at the card?


I am currently working on instancing and I have a problem with sorting the rendering.

If I choose to render only the visible instances, I have to lock the instancing render buffer and upload the instance data for the visible instances. The problem is that this has to be done again for each shadow map, then for reflections, etc. That is a hell of a lot of locking the render buffer, uploading data and rendering, every single frame.

I thought about it and wondered whether it is even worth it. Can I not just throw everything at the card? What tradeoffs would each strategy entail?
I reckon that it also depends on how many instances are not visible, but how many are we talking about before skipping the optimization is no longer viable?

I have an object-oriented system that the rendering runs in, and if I have to sort, that also means a number of virtual calls before the hardware-specific call (DirectX 9 in my case) is reached, since the platform-specific parts have been abstracted away. These calls (into 3 virtual hierarchies each time a render buffer is set, so not too many) may also be a factor in evaluating the feasibility, though I'm unsure whether their cost is small enough to ignore; it is every single frame, after all.

You could use multiple instancing buffers: one for each light for the shadow map generation, and one for the scene rendering. Then you can lock them ALL, iterate through the scene once and add each instance to the buffers it belongs in, then unlock them all and start using them to generate the shadow maps and render the scene. By doing the locking/unlocking together (well, interleaved), you reduce the amount of time the CPU has to wait for the buffers to be free.

Compare the following two strategies, given that you have 2 shadow-mapped lights (3 instance buffers in total):


lock | iterate scene/copy instance data | unlock | render shadow map 1 | lock | iterate scene | unlock | render shadow map 2 | lock | iterate scene | unlock | render scene

lock x3 | iterate scene/copy instance data | unlock x3 | render shadow maps 1-2 | render scene
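To make the second ordering concrete, here is a minimal C++ sketch under stated assumptions: `InstanceData`, `View` and `fillAllViews` are hypothetical names, and a `std::vector` stands in for the memory you would get back from locking a Direct3D 9 vertex buffer (e.g. `IDirect3DVertexBuffer9::Lock` with `D3DLOCK_DISCARD`). The visibility callback is only a placeholder for real culling. What matters is the shape of the loop: every buffer is opened once, the scene is walked once, and each instance is appended to every view that can see it.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical per-instance payload.
struct InstanceData {
    float position[3];
    float scale;
};

// One view = the main camera or one shadow-casting light.
struct View {
    std::function<bool(const InstanceData&)> isVisible; // placeholder for frustum culling
    std::vector<InstanceData> staging;                  // stands in for locked VB memory
};

// "lock x3 | iterate scene/copy instance data | unlock x3":
// open every view's buffer, walk the scene once, then close them all.
void fillAllViews(const std::vector<InstanceData>& scene, std::vector<View>& views) {
    for (View& v : views) v.staging.clear();   // "lock" each buffer, discarding last frame's data
    for (const InstanceData& inst : scene)     // single traversal of the scene
        for (View& v : views)
            if (v.isVisible(inst)) v.staging.push_back(inst);
    // "unlock" all buffers here; each view can now be rendered from its own buffer
}
```

With N views this costs one scene traversal and N visibility tests per instance, instead of N separate traversals each bracketed by its own lock/unlock.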


(I assume that by the instance buffer you mean the vertex buffer that is bound to stream 1 and contains a matrix per instance, or float3 (position) + float (uniform scale) + float4 (quaternion rotation) if you're really fancy.)
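For comparison, the compact "fancy" layout can be sketched in C++ as follows. `CompactInstance`, `MatrixInstance` and `transformPoint` are hypothetical names, and the rotation is assumed to be a unit quaternion, which takes four floats (x, y, z vector part plus w). The compact record is 32 bytes against 64 for a full 4x4 matrix; `transformPoint` mirrors on the CPU what a vertex shader would do with it: rotate, apply the uniform scale, then translate.

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

inline Float3 cross(Float3 a, Float3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Compact record: float3 position + float uniform scale + float4 unit quaternion.
struct CompactInstance {
    Float3 position;
    float  scale;
    float  qx, qy, qz, qw; // vector part (qx, qy, qz), scalar part qw
};

// Full 4x4 world matrix per instance, for comparison.
struct MatrixInstance {
    float m[16];
};

static_assert(sizeof(CompactInstance) == 8 * sizeof(float), "32 bytes");
static_assert(sizeof(MatrixInstance) == 16 * sizeof(float), "64 bytes");

// CPU mirror of the vertex-shader work: rotate by the quaternion, scale, translate.
// Rotation uses v' = v + w*t + cross(q, t) with t = 2*cross(q, v) (q must be unit length).
inline Float3 transformPoint(const CompactInstance& inst, Float3 v) {
    const Float3 q{ inst.qx, inst.qy, inst.qz };
    Float3 t = cross(q, v);
    t = { 2.0f * t.x, 2.0f * t.y, 2.0f * t.z };
    const Float3 c = cross(q, t);
    const Float3 r{ v.x + inst.qw * t.x + c.x,
                    v.y + inst.qw * t.y + c.y,
                    v.z + inst.qw * t.z + c.z };
    return { inst.position.x + inst.scale * r.x,
             inst.position.y + inst.scale * r.y,
             inst.position.z + inst.scale * r.z };
}
```

The trade-off is half the bytes uploaded per instance in exchange for a few extra instructions per vertex in the shader.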

That is a very nice idea, Husilardee.
The biggest problem I see is that in an open-ended game you will not know ahead of time how many shadows/reflections/cameras/etc. are in the vicinity of the rendered area. I guess you could create a number of instance buffers (yes, I mean the vertex buffer with instance data) equal to a maximum that you specify...
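A minimal sketch of that "maximum you specify" idea, assuming the buffers are allocated once at load time and handed out per view each frame. `InstanceBufferPool`, `kMaxViews` and `kMaxInstances` are hypothetical names, and a `std::vector` with reserved capacity stands in for a fixed-size vertex buffer. When `acquire()` returns `nullptr` the cap has been reached, and the caller has to decide what to do with that view (for example, render that light without a shadow map).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed pool: kMaxViews per-view instance buffers, each sized for
// kMaxInstances records, allocated once up front.
template <typename Instance, std::size_t kMaxViews, std::size_t kMaxInstances>
class InstanceBufferPool {
public:
    InstanceBufferPool() : buffers_(kMaxViews), used_(0) {
        for (auto& b : buffers_) b.reserve(kMaxInstances);
    }

    // Call once per frame: all buffers become free again (capacity is kept).
    void beginFrame() {
        used_ = 0;
        for (auto& b : buffers_) b.clear();
    }

    // Hands out the next free buffer, or nullptr once kMaxViews are in use.
    // Callers should not push more than kMaxInstances records into it,
    // mirroring a vertex buffer created with a fixed size.
    std::vector<Instance>* acquire() {
        if (used_ == kMaxViews) return nullptr;
        return &buffers_[used_++];
    }

    std::size_t used() const { return used_; }

private:
    std::vector<std::vector<Instance>> buffers_;
    std::size_t used_;
};
```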

Another problem is that I would then have to tell my object how many cameras it is observed by, whereas at the moment it is completely oblivious of this (and logically should be, IMO).

I like the idea though, and I will see if I can incorporate it in some way - I will have to do some thinking to get that into my system :)

Do you have any estimate of the cost of just brute-forcing it, throwing all instances at the card, versus sorting? How many wasted instances (or triangles) can there be before skipping the optimization is no longer viable?

I really, really don't recommend throwing everything at the card. I might point out that in any game scenario there's a limit to how many shadow maps you're going to use in one frame anyway, because the expense of generating more than 4 shadow maps per frame, and then doing more than 4 shadow-map comparisons per pixel when shading the scene, becomes prohibitive.

I would create as many instance buffers as you need. Let's say you want 30 different meshes to be instanceable (5 tree variations, 10 rock variations, 5 grass variations, 10 floor-clutter variations, random props such as crates, barrels, etc.).
Each instance requires one float3x3 for rotation*scale, a float3 for position, a float3 for diffuse color, and 1 more arbitrary float parameter that could be used for different things. That makes 16 floats, which fits nicely: 64 bytes per instance.
You want up to 1000 instances of each mesh in the level, and you limit the number of active shadow-mapped lights to 4. This means each mesh needs 5 instance buffers (one per light plus one for the scene). So that's
30 x 5 x 1000 x 64 bytes = 9.6 MB (about 9.2 MiB) of VRAM. Not a huge amount, and it will be eclipsed by the shadow maps themselves (4 x 1024 x 1024 x 32 bpp = 16 MiB).
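The sizes in that estimate can be checked at compile time. This C++ sketch is only an illustration: `InstanceRecord` and the constant names are hypothetical, and since float3x3 + float3 + float3 + two extra floats would add up to 17 floats, it uses a single spare float to land on 16 floats (64 bytes). The other numbers are the ones from the post: 30 meshes, 4 shadow-mapped lights plus the main scene, 1000 instances per buffer, 1024x1024 shadow maps at 32 bpp.

```cpp
#include <cassert>
#include <cstddef>

// Per-instance record: 9 + 3 + 3 + 1 = 16 floats.
struct InstanceRecord {
    float rotationScale[9]; // float3x3 rotation * scale
    float position[3];
    float diffuse[3];
    float param;            // one spare parameter
};
static_assert(sizeof(InstanceRecord) == 64, "16 floats = 64 bytes");

constexpr std::size_t kMeshes         = 30;
constexpr std::size_t kShadowLights   = 4;
constexpr std::size_t kBuffersPerMesh = kShadowLights + 1; // + main scene pass
constexpr std::size_t kMaxInstances   = 1000;

// Total instance-buffer memory across all meshes and views.
constexpr std::size_t kInstanceBytes =
    kMeshes * kBuffersPerMesh * kMaxInstances * sizeof(InstanceRecord);

// Four 1024x1024 shadow maps at 4 bytes (32 bpp) per texel.
constexpr std::size_t kShadowMapBytes = kShadowLights * 1024 * 1024 * 4;

static_assert(kInstanceBytes == 9600000, "9.6 MB, about 9.2 MiB");
static_assert(kShadowMapBytes == 16777216, "16 MiB");
```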

Objects don't need to know how many cameras they are visible to. The scene can do frustum culling for all objects and lights at the same time, and write the instance buffers as necessary.
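One way to do that culling for all views in a single pass, sketched in C++ under stated assumptions: each instance has a bounding sphere, each view (the camera or a light's shadow frustum) is given as six inward-facing planes with unit normals, and `Plane`, `Frustum`, `Sphere`, `sphereInFrustum` and `cullAllViews` are hypothetical names. The sphere test is conservative: it only rejects an instance when its sphere lies entirely behind one plane, so a few instances near the frustum corners may still be drawn. The per-view index lists are what would then be copied into each view's instance buffer, and the objects themselves never learn how many views exist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Plane   { float nx, ny, nz, d; }; // inside when nx*x + ny*y + nz*z + d >= 0; unit normal
struct Frustum { Plane planes[6]; };
struct Sphere  { float x, y, z, r; };    // bounding sphere of one instance

// Conservative test: reject only if the sphere is fully behind some plane.
bool sphereInFrustum(const Sphere& s, const Frustum& f) {
    for (const Plane& p : f.planes) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.r) return false;
    }
    return true;
}

// One pass over the scene for every view (camera + shadow-casting lights).
// out[v] receives the indices of the instances visible from view v.
void cullAllViews(const std::vector<Sphere>& bounds,
                  const std::vector<Frustum>& views,
                  std::vector<std::vector<std::size_t>>& out) {
    out.assign(views.size(), std::vector<std::size_t>());
    for (std::size_t i = 0; i < bounds.size(); ++i)
        for (std::size_t v = 0; v < views.size(); ++v)
            if (sphereInFrustum(bounds[i], views[v])) out[v].push_back(i);
}
```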

Let's take a good example from the game du jour: Slender. If you haven't played it, it's a horror game where the player walks around a forest at night in first-person view, carrying a torch. Obviously the torch is represented as a spot light, which has an associated shadow map (I think it's the only light source in the game, apart from a very small amount of ambient light). Now let's say there are 100 trees in the level. About 20 are visible to the player at any time, and 5 fall within the torch's beam. You tell me how much of a waste it is to throw everything at the card.
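Putting rough numbers on that example, assuming brute force means every tree goes through both the camera pass and the torch's shadow pass, and that the torch is the only shadow-casting light: `Submissions` and `slenderSubmissions` are hypothetical names for a helper that just does the counting.

```cpp
#include <cassert>

struct Submissions {
    int bruteForce; // every tree submitted to both passes
    int culled;     // only what each pass can actually see
    constexpr int wasted() const { return bruteForce - culled; }
};

// Camera pass + one spot-light shadow pass.
constexpr Submissions slenderSubmissions(int totalTrees, int visibleToCamera, int insideTorchBeam) {
    return Submissions{2 * totalTrees, visibleToCamera + insideTorchBeam};
}
```

With 100 trees, 20 visible and 5 in the beam, that is 200 instances submitted against 25, so 175 of them (seven in eight) are wasted work.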

This is the approach I'm using anyway.
