instanced drawing and frustum cull

Started by
9 comments, last by Norman Barrows 7 years, 2 months ago

Instanced drawing requires a VB of worldmats, one worldmat per instance.

If I want to draw a whole bunch of grass plants (10K to 100K instances), how do I deal with frustum culling?

Let's say I want to draw one plant every foot in a bounding box 300 feet across, centered on the camera. That's 300 x 300 = 90,000 instances.

Do I frustum cull 90,000 plants and, for each one that passes, copy its worldmat into the VB? Would that be a static VB created and discarded each frame, or a dynamic one that gets reused?

Or do I simply make a terrain chunk of grass plants and, if the chunk is visible, draw them all with no per-plant frustum cull and let viewport clipping sort it out?

I currently use terrain chunks 300 feet across (game scale: 1 foot = 1 D3D unit). One plant per foot will probably be the desired vegetation density. That's 90,000 instances per chunk. I draw 4 or 6 chunks each frame, depending on the view direction, so that's 360,000 or 540,000 plants I'm talking about drawing.

And what about a user-defined clip range (fade-out distance)?

Instanced drawing might be all fine and good, but the plants still need to be culled individually, or at least in groups or chunks.

This is the old "efficient database of renderables" problem.

from: https://msdn.microsoft.com/en-us/library/windows/desktop/bb147263(v=vs.85).aspx#Using_Dynamic_Vertex_and_Index_Buffers

Databases and Culling

Building a reliable database of the objects in your world is key to excellent performance in Direct3D. It is more important than improvements to rasterization or hardware.

You should maintain the lowest polygon count you can possibly manage. Design for a low polygon count by building low-polygon models from the start. Add polygons if you can do so without sacrificing performance later in the development process. Remember, the fastest polygons are the ones you don't draw.

Any suggestions?

Each frame, for each plant: range clip, frustum cull, calc the worldmat, add it to the VB, then draw instanced? Sounds slow, and it means dynamic buffers. Will the speed of instanced drawing more than make up for that? Try it and find out?
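For reference, the per-frame loop described above (range clip, frustum cull, calc worldmat, append) might look roughly like this on the CPU side. This is a minimal sketch, not a tested renderer: Plant, Frustum, buildInstances and the plane convention (unit normals pointing into the frustum) are assumptions for illustration, and the resulting array is what would then be copied into a dynamic instance VB.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed conventions for this sketch: a plane is (n, d) with a unit normal n
// pointing INTO the frustum, so point p is inside when dot(n, p) + d >= 0.
struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };
using Frustum = std::array<Plane, 6>;

// Row-major 4x4 in the D3D9 row-vector layout (translation in the last row).
struct Mat4 { float m[4][4]; };

// Hypothetical per-plant record.
struct Plant { Vec3 pos; float yaw; float radius; };

static float signedDistance(const Plane& p, const Vec3& v) {
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d;
}

// Bounding sphere vs frustum: rejected only if fully behind some plane.
static bool sphereInFrustum(const Frustum& f, const Vec3& c, float r) {
    for (const Plane& p : f)
        if (signedDistance(p, c) < -r) return false;
    return true;
}

// Yaw about Y, then translate: the same layout that D3DXMatrixRotationY
// followed by D3DXMatrixTranslation would produce.
static Mat4 worldMatrix(const Plant& pl) {
    const float c = std::cos(pl.yaw), s = std::sin(pl.yaw);
    return Mat4{{{c, 0, -s, 0},
                 {0, 1, 0, 0},
                 {s, 0, c, 0},
                 {pl.pos.x, pl.pos.y, pl.pos.z, 1}}};
}

// Per frame: range clip, frustum cull, build the worldmat, and append it to the
// array that would then be copied into a dynamic instance VB.
// Returns the number of instances written.
std::size_t buildInstances(const std::vector<Plant>& plants, const Frustum& f,
                           const Vec3& cam, float clipRange,
                           std::vector<Mat4>& out) {
    out.clear();
    const float range2 = clipRange * clipRange;   // compare squared distances
    for (const Plant& pl : plants) {
        const float dx = pl.pos.x - cam.x, dy = pl.pos.y - cam.y, dz = pl.pos.z - cam.z;
        if (dx * dx + dy * dy + dz * dz > range2) continue;      // range clip
        if (!sphereInFrustum(f, pl.pos, pl.radius)) continue;    // frustum cull
        out.push_back(worldMatrix(pl));
    }
    return out.size();
}
```

The loop is O(plants) every frame no matter how many survive, which is the cost the timing test would need to weigh against the savings in vertices drawn.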

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

You can split chunks into batches smaller than 90k. But start with a brute-force frustum check per plant and see how slow it is. Frustum culling saves not only GPU time but everything related to rendering those plants: animation, matrix concatenation, bandwidth.

You can also use some sort of quad tree culling for fast and accurate frustum culling.
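A minimal sketch of quadtree culling for one chunk, assuming the plants sit on a regular grid (one per cell) inside a known height band. Under that assumption the quadtree is implicit: a node is just a square range of cells, so nothing has to be built when a chunk is generated. GridChunk, cullChunk and the inward-normal plane convention are hypothetical; the box test is the usual "p-vertex / n-vertex" AABB vs plane check.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };   // inside when dot(n, p) + d >= 0
using Frustum = std::array<Plane, 6>;

enum class Cull { Outside, Inside, Intersect };

// AABB vs frustum: outside if the corner furthest along some plane normal is
// behind it; fully inside if even the nearest corner is in front of every plane.
static Cull classifyBox(const Frustum& f, const Vec3& lo, const Vec3& hi) {
    Cull result = Cull::Inside;
    for (const Plane& p : f) {
        const Vec3 pv{p.n.x >= 0 ? hi.x : lo.x, p.n.y >= 0 ? hi.y : lo.y, p.n.z >= 0 ? hi.z : lo.z};
        const Vec3 nv{p.n.x >= 0 ? lo.x : hi.x, p.n.y >= 0 ? lo.y : hi.y, p.n.z >= 0 ? lo.z : hi.z};
        auto dist = [&](const Vec3& v) { return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d; };
        if (dist(pv) < 0) return Cull::Outside;
        if (dist(nv) < 0) result = Cull::Intersect;
    }
    return result;
}

// Hypothetical chunk layout: n x n plants, one per square cell of size
// `spacing`, cell (i, j) covering x in [x0 + i*spacing, x0 + (i+1)*spacing]
// and z likewise, with every plant between heights yLo and yHi.
struct GridChunk { float x0, z0, spacing, yLo, yHi; int n; };

// A node is the square of cells [i0, i0+size) x [j0, j0+size).
static void cullNode(const GridChunk& g, const Frustum& f, int i0, int j0, int size,
                     std::vector<int>& visible) {
    if (i0 >= g.n || j0 >= g.n) return;                 // node lies past the chunk edge
    const Vec3 lo{g.x0 + i0 * g.spacing, g.yLo, g.z0 + j0 * g.spacing};
    const Vec3 hi{lo.x + size * g.spacing, g.yHi, lo.z + size * g.spacing};
    const Cull c = classifyBox(f, lo, hi);
    if (c == Cull::Outside) return;                     // whole subtree rejected
    if (c == Cull::Inside || size == 1) {               // accept without more tests
        for (int i = i0; i < i0 + size && i < g.n; ++i)
            for (int j = j0; j < j0 + size && j < g.n; ++j)
                visible.push_back(i * g.n + j);
        return;
    }
    const int h = size / 2;                             // recurse into 4 children
    cullNode(g, f, i0, j0, h, visible);
    cullNode(g, f, i0 + h, j0, h, visible);
    cullNode(g, f, i0, j0 + h, h, visible);
    cullNode(g, f, i0 + h, j0 + h, h, visible);
}

// Returns the indices (i * n + j) of plants whose cells touch the frustum.
std::vector<int> cullChunk(const GridChunk& g, const Frustum& f) {
    int size = 1;
    while (size < g.n) size *= 2;                       // root covers the chunk
    std::vector<int> visible;
    cullNode(g, f, 0, 0, size, visible);
    return visible;
}
```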

Quadtree culling is just another form of "group culling", but it might be a possibility. The big problem is that terrain chunks must be generated on the fly, in real time, from the underlying map data. Generating quadtrees in real time doesn't sound very doable.

I'm thinking: write a test routine where I can vary the size of a "group" from one plant all the way up to an entire terrain chunk, or perhaps all 9 chunks around the camera (810,000 plants in one group!).

Then test two ways with various group sizes, including groups of one plant:

1. Just cull groups. If a group is not culled, draw all the plants in it. No need to build a list of locations each frame, but some plants may end up offscreen.

2. Cull groups, then cull the individual plants in groups that pass, and build a VB of locations each frame.
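The two test modes above could be driven by a small harness like this one. It is a sketch under stated assumptions: plants on an n x n grid one unit apart, square groups of g x g plants, and a caller-supplied conservative sphere test (SphereTest, Counts, Mode and runExperiment are hypothetical names). It only counts tests and instances; the timing would wrap real draw calls around it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>

struct Vec3 { float x, y, z; };
// Any conservative sphere test works here, e.g. sphere vs the six frustum planes.
using SphereTest = std::function<bool(const Vec3& center, float radius)>;

struct Counts { std::size_t groupTests, plantTests, instancesDrawn; };
enum class Mode { WholeGroups = 1, PerPlant = 2 };   // the two test modes

// n x n plants, plant (i, j) centred at (i + 0.5, 0, j + 0.5); groups are
// g x g squares (smaller at the far edges when g does not divide n).
Counts runExperiment(int n, int g, Mode mode, float plantRadius, const SphereTest& visible) {
    Counts c{0, 0, 0};
    for (int gi = 0; gi < n; gi += g)
        for (int gj = 0; gj < n; gj += g) {
            const int wi = std::min(g, n - gi), wj = std::min(g, n - gj);
            // Group bounding sphere: half the square's diagonal plus a plant radius.
            const Vec3 center{gi + wi * 0.5f, 0.f, gj + wj * 0.5f};
            const float r = 0.5f * std::sqrt(float(wi * wi + wj * wj)) + plantRadius;
            ++c.groupTests;
            if (!visible(center, r)) continue;
            if (mode == Mode::WholeGroups) {             // mode 1: draw the whole group
                c.instancesDrawn += std::size_t(wi) * wj;
                continue;
            }
            for (int i = gi; i < gi + wi; ++i)           // mode 2: cull each plant too
                for (int j = gj; j < gj + wj; ++j) {
                    ++c.plantTests;
                    if (visible(Vec3{i + 0.5f, 0.f, j + 0.5f}, plantRadius)) ++c.instancesDrawn;
                }
        }
    return c;
}
```

With a single plane at x = 10 and a 300 x 300 grid, one group of 90,000 draws everything in mode 1 after a single test, while mode 2 or groups of one draw only the 3,300 plants near the plane at the price of tens of thousands of tests.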

I use static buffers for the generated ground meshes. I may be able to use static buffers for the plant worldmats too if I just draw entire groups and don't build a list.

(a) Use chunks and LoDs aggressively. Cull out whole batches of grass. DO NOT try to cull individual pieces of foliage - that way lies madness.

(b) Generate the foliage on the GPU; see https://github.com/mreinfurt/Grass.DirectX for an example.
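Option (a), culling and drawing whole chunks with distance-based LoD, can be sketched as a per-chunk decision with no per-plant work. The thresholds and the "keep every 2nd / 4th plant along each axis" rule are hypothetical choices for illustration, not something prescribed in the thread; lodStride, chunkDistance and chunkInstances are made-up helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Distance in the ground plane from the camera to the closest point of the
// square chunk [x0, x0+size] x [z0, z0+size]; 0 when the camera is inside.
float chunkDistance(float camX, float camZ, float x0, float z0, float size) {
    const float dx = std::fmax(std::fmax(x0 - camX, 0.f), camX - (x0 + size));
    const float dz = std::fmax(std::fmax(z0 - camZ, 0.f), camZ - (z0 + size));
    return std::sqrt(dx * dx + dz * dz);
}

// Hypothetical LoD rule: full density near the camera, then every 2nd or 4th
// plant along each axis, and nothing past the fade distance.
// Returns the grid stride (1, 2 or 4), or 0 when the chunk is not drawn.
int lodStride(float distance, float lod1, float lod2, float fade) {
    if (distance >= fade) return 0;
    if (distance >= lod2) return 4;
    if (distance >= lod1) return 2;
    return 1;
}

// Instances drawn for an n x n chunk at a given stride: the chunk is drawn
// (or skipped) as one batch, never plant by plant.
std::size_t chunkInstances(int n, int stride) {
    if (stride == 0) return 0;
    const std::size_t perAxis = (n + stride - 1) / stride;   // ceil(n / stride)
    return perAxis * perAxis;
}
```

At stride 2 a 300-foot chunk drops from 90,000 to 22,500 instances, and at stride 4 to 5,625, so distant chunks get much cheaper without any per-plant test.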

Sean Middleditch – Game Systems Engineer – Join my team!

(a) using chunks and LoDs aggressively. Cull out whole batches of grass. DO NOT try to cull individual pieces of foliage - that way lies madness.

Culling single pieces of foliage against a frustum expanded by the maximum radius of a foliage piece is quite cheap (point vs. frustum). With some hierarchy, so that only pieces belonging to chunks that intersect the frustum are tested rather than every piece of foliage, it might be fast enough.
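A minimal sketch of the expanded-frustum idea, assuming planes stored as (n, d) with unit normals pointing into the frustum: each plane is pushed outward by the largest foliage radius once per frame, and each plant then needs only a point test. Because dot(n, c) + d >= -r is the same inequality as dot(n, c) + (d + r) >= 0, this gives the same answer as a sphere test with radius maxRadius (up to float rounding). expand, pointInFrustum and cullPlants are hypothetical names; cullPlants would run only on chunks that already passed a chunk-level test.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };   // unit normal pointing into the frustum
using Frustum = std::array<Plane, 6>;

static float dist(const Plane& p, const Vec3& v) {
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d;
}

// Push every plane outward by the largest foliage radius (done once per frame).
Frustum expand(Frustum f, float maxRadius) {
    for (Plane& p : f) p.d += maxRadius;
    return f;
}

// Point test against the expanded frustum.
bool pointInFrustum(const Frustum& f, const Vec3& c) {
    for (const Plane& p : f)
        if (dist(p, c) < 0.f) return false;
    return true;
}

// Sphere test against the original frustum, for comparison.
bool sphereInFrustum(const Frustum& f, const Vec3& c, float r) {
    for (const Plane& p : f)
        if (dist(p, c) < -r) return false;
    return true;
}

// Per-plant pass for one chunk that intersects the frustum: returns the
// indices of plants whose positions pass the expanded-frustum point test.
std::vector<std::size_t> cullPlants(const Frustum& expanded, const std::vector<Vec3>& positions) {
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < positions.size(); ++i)
        if (pointInFrustum(expanded, positions[i])) visible.push_back(i);
    return visible;
}
```

Per test, the sphere version only differs by comparing against -r instead of 0; the expanded version moves that addition out of the loop, which matters mainly when every plant shares one maximum radius.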

GPU culling would be the best choice.

For CPU particles, I have noticed that frustum culling can be beneficial even on a per-particle basis.

Every API supports a way to efficiently stream data from the CPU into a per-frame vertex buffer; in D3D terminology it's often "map discard". Is this for D3D9? There should be an MSDN article with recommendations for dynamic vertex buffers in particular.
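Direct3D 9 has no "map" call; the equivalent is IDirect3DVertexBuffer9::Lock on a buffer created with D3DUSAGE_DYNAMIC in D3DPOOL_DEFAULT, passing D3DLOCK_DISCARD or D3DLOCK_NOOVERWRITE. A common pattern is to treat the buffer as a ring: append with NOOVERWRITE while there is room, and wrap with DISCARD when it fills. The sketch below models only that offset/flag bookkeeping; DynamicRing and LockMode are hypothetical stand-ins, and the actual Lock/Unlock calls are left out.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the D3D9 lock flags: in real code these would be
// D3DLOCK_DISCARD / D3DLOCK_NOOVERWRITE passed to IDirect3DVertexBuffer9::Lock.
enum class LockMode { Discard, NoOverwrite };

struct Allocation { std::size_t offset; LockMode mode; };

// Hands out regions of one dynamic instance buffer.
// NoOverwrite: append after data the GPU may still be reading (we promise not
// to touch it). Discard: the buffer is full (or unused yet), so start again at
// offset 0 and let the driver supply fresh memory.
class DynamicRing {
public:
    explicit DynamicRing(std::size_t capacityBytes) : capacity_(capacityBytes) {}

    Allocation allocate(std::size_t bytes) {
        assert(bytes <= capacity_);
        if (cursor_ == 0 || cursor_ + bytes > capacity_) {
            cursor_ = bytes;
            return {0, LockMode::Discard};
        }
        Allocation a{cursor_, LockMode::NoOverwrite};
        cursor_ += bytes;
        return a;
    }

private:
    std::size_t capacity_;
    std::size_t cursor_ = 0;
};
```

Each batch of instance data would be locked at the returned offset with the returned flag, filled, unlocked, and then bound with SetStreamSource at that offset.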

You can use this to stream the world-matrices / per-instance data, or, you can stream smaller index data (indices of the visible instances), which the VS can use to index into a static array of world-matrices / per-instance data.
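The CPU side of the index-streaming option can be sketched as follows: the per-instance data stays in a static array uploaded once, and each frame only the indices of instances that survived culling are written out. How the vertex shader then fetches from the static array is not shown here and is the part that would need checking under vs_3_0. visibleIndices, bytesPerFrame and the two size constants are hypothetical names; the predicate stands in for whatever cull test is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes streamed per visible instance under the two options.
constexpr std::size_t kMatrixBytes = 16 * sizeof(float);          // full 4x4 worldmat
constexpr std::size_t kIndexBytes  = sizeof(std::uint32_t);       // index into static array

// Collect the indices of instances that pass the (caller-supplied) cull test.
template <class Pred>
std::vector<std::uint32_t> visibleIndices(std::size_t instanceCount, Pred isVisible) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < instanceCount; ++i)
        if (isVisible(i)) out.push_back(static_cast<std::uint32_t>(i));
    return out;
}

std::size_t bytesPerFrame(std::size_t visibleCount, std::size_t bytesPerInstance) {
    return visibleCount * bytesPerInstance;
}
```

For one 90,000-plant chunk with a quarter visible, streaming matrices costs 1,440,000 bytes per frame, while streaming 32-bit indices costs 90,000.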

A state-of-the-art renderer could actually use a compute shader to cull these bounding shapes and generate this instance buffer entirely on the GPU ;)

As an example of the feasibility of streaming CPU->GPU: in a recent game, we actually did a lot of "vertex shading" on the CPU and streamed over 100k 32-byte verts per frame!
(That's around 200 MB/s on a bus whose capacity is measured in GB/s.)

DO NOT try to cull individual pieces of foliage - that way lies madness.

Yes, I've been down that path before with non-instanced drawing. Forget draw call overhead - it's the culling that kills!

generate the foliage on the GPU

Unfortunately, I have yet to see generated grass that looks nearly as good as sprite-textured grass (i.e. textured quads with textures from photos or photorealistic renderings). The shadowing etc. is quite impressive, but the blades themselves are "eh, so-so" at best. So it looks like GPU-generated solutions won't cut it. But I may be trying to achieve the impossible here anyway... So nothing is off the table; it's more a matter of which methods to consider first.

Using frustum that is expanded by max radius of foliage piece to cull single pieces of foliage is quite cheap.(point vs frustum.)

Isn't bsphere vs. plane just as cheap (maybe one more ADD)? I can't recall...

One nice thing: when culling terrain chunks, you really only need to test against the left and right planes.
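The left/right-plane chunk test can be done entirely in the ground plane. This sketch assumes the camera pitch is small enough to ignore, a half horizontal FOV under 90 degrees, and a left-handed D3D view where yaw 0 looks down +Z with +X to the right (matching D3DXMatrixRotationY). Wedge, makeWedge and chunkVisible are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, z; };   // ground-plane coordinates

// Camera position plus inward normals of the left and right frustum planes
// projected onto the ground.
struct Wedge { Vec2 cam, nLeft, nRight; };

Wedge makeWedge(Vec2 cam, float yaw, float halfFov) {
    const Vec2 f{std::sin(yaw), std::cos(yaw)};     // forward
    const Vec2 r{std::cos(yaw), -std::sin(yaw)};    // right
    const float s = std::sin(halfFov), c = std::cos(halfFov);
    // Left plane normal leans right, right plane normal leans left;
    // both have a positive component along the forward vector.
    return Wedge{cam, Vec2{s * f.x + c * r.x, s * f.z + c * r.z},
                      Vec2{s * f.x - c * r.x, s * f.z - c * r.z}};
}

// A chunk [x0, x1] x [z0, z1] is rejected only if all four corners lie behind
// the left plane, or all four lie behind the right plane.
bool chunkVisible(const Wedge& w, float x0, float z0, float x1, float z1) {
    const Vec2 corners[4] = {{x0, z0}, {x1, z0}, {x0, z1}, {x1, z1}};
    const Vec2 normals[2] = {w.nLeft, w.nRight};
    for (const Vec2& n : normals) {
        bool allBehind = true;
        for (const Vec2& p : corners) {
            if (n.x * (p.x - w.cam.x) + n.z * (p.z - w.cam.z) >= 0.f) {
                allBehind = false;
                break;
            }
        }
        if (allBehind) return false;
    }
    return true;
}
```

Like any plane-by-plane test it is conservative: a chunk just outside a corner of the wedge can still be reported visible, which only costs an extra chunk draw.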

GPU culling would be the best choice.

Can DX9.0c and Shader Model 3 do that? That's what I'm limited to unless I want to port to DX10, DX11, or DX12...

I don't think the DX9 "geometry shader / tessellation stage" (?) is programmable.

In a recent game, we actually did a lot of "vertex shading" on the CPU and streamed over 100k 32-byte verts per frame!

Ideally, before culling, I'd have 810,000 potentially visible meshes, perhaps 24 verts each. Typically about 1/4 would be visible; say 200,000 to keep it round. So 200K * 16 floats per worldmat * 4 bytes per float * 60 fps = 768 MB per sec. Hey - this might be doable! <g>
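Spelled out, with 1 MB taken as 10^6 bytes, the estimate above checks out; for comparison, the "around 200 MB/s" figure quoted earlier corresponds to 100k 32-byte vertices per frame if that game ran at 60 fps (an assumption, since the frame rate was not stated):

```latex
\begin{aligned}
200{,}000 \times (16 \times 4\,\text{B}) \times 60\,\text{s}^{-1} &= 768{,}000{,}000\,\text{B/s} \approx 768\,\text{MB/s} \\
100{,}000 \times 32\,\text{B} \times 60\,\text{s}^{-1} &= 192{,}000{,}000\,\text{B/s} \approx 200\,\text{MB/s}
\end{aligned}
```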

And if I can calc a worldmat in a VS_3_0 shader, I can just send x, y, z, xr, yr, zr instead of a whole worldmat.
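Rebuilding the worldmat from x, y, z, xr, yr, zr is cheap math; here it is on the CPU as a reference for what the shader would compute. This is a sketch: the rotation order (X, then Y, then Z, then translate, using row vectors as in D3DX) is an assumption and should match whatever order the existing CPU code uses. InstanceData, worldFromInstance and transformPoint are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Mat4 { float m[4][4]; };

// Compact per-instance record: 6 floats (24 bytes) instead of a 64-byte matrix.
struct InstanceData { float x, y, z, xr, yr, zr; };

static Mat4 identity() {
    Mat4 r{};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.f;
    return r;
}

static Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Row-vector rotation matrices with the same layout as D3DXMatrixRotationX/Y/Z.
static Mat4 rotX(float a) {
    Mat4 r = identity(); const float c = std::cos(a), s = std::sin(a);
    r.m[1][1] = c; r.m[1][2] = s; r.m[2][1] = -s; r.m[2][2] = c; return r;
}
static Mat4 rotY(float a) {
    Mat4 r = identity(); const float c = std::cos(a), s = std::sin(a);
    r.m[0][0] = c; r.m[0][2] = -s; r.m[2][0] = s; r.m[2][2] = c; return r;
}
static Mat4 rotZ(float a) {
    Mat4 r = identity(); const float c = std::cos(a), s = std::sin(a);
    r.m[0][0] = c; r.m[0][1] = s; r.m[1][0] = -s; r.m[1][1] = c; return r;
}

// Assumed order: v * Rx * Ry * Rz * T.
Mat4 worldFromInstance(const InstanceData& d) {
    Mat4 t = identity();
    t.m[3][0] = d.x; t.m[3][1] = d.y; t.m[3][2] = d.z;
    return mul(mul(mul(rotX(d.xr), rotY(d.yr)), rotZ(d.zr)), t);
}

// Transform a point as a row vector: out = [in 1] * M.
void transformPoint(const Mat4& m, const float in[3], float out[3]) {
    for (int j = 0; j < 3; ++j)
        out[j] = in[0] * m.m[0][j] + in[1] * m.m[1][j] + in[2] * m.m[2][j] + m.m[3][j];
}
```

In an actual vertex shader the same sin/cos work would run per vertex (HLSL has a sincos intrinsic), so the bandwidth saved, 24 instead of 64 bytes per instance, is paid for with ALU work on each of the plant's vertices.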

That method of sending the whole array of worldmats plus a custom index buffer sounds interesting. Guess that's yet another thing to try. Using that, the worldmat VB could be static.

Looks like I have some testing and timing work ahead of me...

Guess the first thing is to figure out whether I can cull on the GPU with VS_3_0. A quick glance at Google results seems to indicate "no".

Unfortunately, I have yet to see generated grass that looks nearly as good as sprite-textured grass (i.e. textured quads with textures from photos or photorealistic renderings). The shadowing etc. is quite impressive, but the blades themselves are "eh, so-so" at best. So it looks like GPU-generated solutions won't cut it.


You seem to be super confused about what GPU rendering of grass actually is. Every technique I've seen used recently _is_ just using textured sprites for grass. They're using the GPU to generate the draw calls for those sprites procedurally from the terrain with LoD support, blending, wind animation, etc. The exact same stuff you'd do on the CPU, except much much much much faster.

OK, looks like there's no programmable geometry shader stage in DX9; that needs DX10 or higher.

Is there something I can do, like emit a degenerate triangle from the vertex shader, that will force the pipeline to bail out early (or earlier) when drawing that triangle? I haven't checked for sure, but it looks like VS_3_0 has the functions required to do a frustum cull. The question is what, if anything, I can do with the results to make the DX9 pipeline bail out sooner.

This topic is closed to new replies.
