Fast grass rendering

16 comments, last by Bagel Man 18 years, 8 months ago
Hi,

I'm trying to render grass, so I was wondering how to do it really fast. First of all, the geometry. All grass objects are flat planes in my case, so what's the fastest way to render 2 polygons? A while ago I asked about rendering large objects and the guys here told me about several approaches, like vertex lists with indices and VBOs. I guess a VBO is the fastest way, but does it also work for a very simple object like this grass plane? And what about indices, do they also make rendering faster for such a simple object?

Another pain in the #$ is alpha sorting, I think. I guess the furthest transparent object needs to be rendered first, the closest last. So if I have a list with hundreds of objects, how do I sort them really quickly? Any tips? Or is there another way instead of sorting on distance?

Greetings,
Rick
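For reference, the back-to-front ordering asked about here can be sketched on the CPU as below. This is only a minimal illustration, assuming each grass object is represented by a single position; the names `Vec3`, `distanceSquared` and `sortBackToFront` are made up for this example and do not come from the thread.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

// Squared distance is enough for ordering and avoids a sqrt per object.
inline float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Reorders the positions so that the one farthest from the camera comes
// first and the closest one comes last (the order needed for blending).
inline void sortBackToFront(std::vector<Vec3>& positions, const Vec3& camera) {
    std::sort(positions.begin(), positions.end(),
              [&camera](const Vec3& a, const Vec3& b) {
                  return distanceSquared(a, camera) > distanceSquared(b, camera);
              });
}
```

Sorting a few hundred entries like this per frame is cheap; the cost usually lies in re-uploading the reordered geometry, which is one reason the replies suggest alpha testing instead.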
Something you really want to do if you can is group your grass into as few rendering calls as possible. Your GPU is capable of rendering a huge amount of geometry, but not if you are only rendering two polygons at a time.

So, try to combine a large amount of your grass into a composite object that you can render in one or very few calls.

VBOs and whatnot will speed you up once you have a decent amount of geometry to send the GPU in one batch. :)

If you want to avoid dealing with alpha sorting you can use an alpha test instead. It doesn't look quite as good but it's not bad either.
Orin Tresnjak | Graphics Programmer | Bethesda Game Studios
Standard Disclaimer: My posts represent my opinions and not those of Bethesda/Zenimax, etc.
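A rough sketch of the batching advice above: instead of issuing one draw call per grass plane, append every plane to one shared vertex and index array and submit that once. The layout (`Vertex`, `GrassBatch`) and the helper `appendQuad` are assumptions made for illustration; the quads here are simply upright and axis-aligned.

```cpp
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z, u, v; };

struct GrassBatch {
    std::vector<Vertex> vertices;        // 4 per quad
    std::vector<std::uint32_t> indices;  // 6 per quad (two triangles)
};

// Appends one upright quad standing on the point (cx, cy, cz).
// The quad lies in the XY plane, centred horizontally on cx.
inline void appendQuad(GrassBatch& batch, float cx, float cy, float cz,
                       float width, float height) {
    const std::uint32_t base = static_cast<std::uint32_t>(batch.vertices.size());
    const float hw = width * 0.5f;
    batch.vertices.push_back({cx - hw, cy,          cz, 0.0f, 0.0f});
    batch.vertices.push_back({cx + hw, cy,          cz, 1.0f, 0.0f});
    batch.vertices.push_back({cx + hw, cy + height, cz, 1.0f, 1.0f});
    batch.vertices.push_back({cx - hw, cy + height, cz, 0.0f, 1.0f});
    // Two triangles sharing the diagonal 0-2, offset by the quad's base vertex.
    const std::uint32_t quad[6] = {0, 1, 2, 0, 2, 3};
    for (std::uint32_t i : quad) batch.indices.push_back(base + i);
}
```

The two arrays would then be uploaded to a VBO and index buffer and drawn with a single call, so the per-call overhead is paid once per batch rather than once per plane.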
Hi spek,

you only wanna render 2 triangles??

Don't know what API you're using, but in OpenGL the fastest ways to render 2 (or even more) polys are to use a display list (DL) for static objects, or to use Vertex Buffer Objects (VBO) for static as well as dynamic meshes.

But if really only 2 polys should be rendered, you can render them in immediate mode as well ;)

cheers
tgar

EDIT: DOH! lancekt, you were a little bit faster ;)
Quote:Original post by Thaligar
Hi spek,

you only wanna render 2 triangles??

Don't know what API you're using, but in OpenGL the fastest ways to render 2 (or even more) polys are to use a display list (DL) for static objects, or to use Vertex Buffer Objects (VBO) for static as well as dynamic meshes.

But if really only 2 polys should be rendered, you can render them in immediate mode as well ;)

cheers
tgar

EDIT: DOH! lancekt, you were a little bit faster ;)


I think he means 2 tris per billboard, assuming there will be lots of billboards in the scene. As the first poster mentioned, if all the billboards have static geometry (an animated texture could still be used to simulate wind), it is easy to batch all of them into 1 or 2 vertex buffers.

Thanks guys!

The problem is that the set of grass objects changes a lot. When the camera moves, lots of objects become visible or invisible. However, if I want to do sorting I have a list anyway, so I could put this whole set of planes into one large list.

But display lists or VBOs are pretty static, right? If the list content changes continuously, I would need to rebuild such a list again and again. I've never used VBOs before, so I don't know if it's a real problem, but otherwise I might be better off with a dynamic approach, glDrawElements or something. Right?

Another problem might be the rotations. The grass objects aren't sprites that rotate with you. Each grass object has its own matrix, so before inserting points into such a list, we need to calculate the absolute vertex coordinates first. Maybe it's not a big problem for the CPU, but there should be a lot of grass of course (hundreds, maybe thousands), and that's quite some math if you ask me. I can't store the absolute points in memory either; they would never fit, as the entire terrain has millions of grass objects.

Thanks for helping!
Rick
Use procedural placement, where everything is done on the GPU. Your VBO would only contain a few thousand static zero-area quads, i.e. only the topology, without any real extent or positional information. A vertex shader then generates the actual vertex positions according to either some mathematical equation (fractal, noise), or some predefined placement maps (this requires hardware support for texture access in the vertex shader), or a combination of both.

The CPU should never touch an individual blade of grass, or even an individual grass patch. It should only modify generic parameters, such as grass density or attribute maps. Everything else is done on the graphics card and memory.
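To make the idea concrete, here is a CPU-side sketch of the kind of function a vertex shader could evaluate per quad: a deterministic pseudo-random position derived only from a patch seed and the quad's index, so nothing has to be stored per blade. The hash is an arbitrary integer mixing function picked for this example, and `placeQuad`, `patchSeed` and `patchSize` are hypothetical names; an actual shader would more likely use noise or a placement texture as described above.

```cpp
#include <cstdint>

struct Placement { float x, z; };

// Small integer hash; the constants are an arbitrary choice for this sketch.
inline std::uint32_t hash32(std::uint32_t v) {
    v ^= v >> 16; v *= 0x7feb352dU;
    v ^= v >> 15; v *= 0x846ca68bU;
    v ^= v >> 16;
    return v;
}

// Maps the top 24 bits of a hash to a float in [0, 1).
inline float toUnitFloat(std::uint32_t h) {
    return static_cast<float>(h >> 8) * (1.0f / 16777216.0f);
}

// Same seed and index always give the same position inside the patch,
// so the placement never has to be stored or recomputed on the CPU.
inline Placement placeQuad(std::uint32_t patchSeed, std::uint32_t quadIndex,
                           float patchSize) {
    const std::uint32_t h = hash32(patchSeed * 0x9e3779b9U + quadIndex);
    return { toUnitFloat(h) * patchSize, toUnitFloat(hash32(h)) * patchSize };
}
```

Because the result depends only on the seed and the index, the CPU just supplies one seed per terrain patch and the grass stays in the same place from frame to frame.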
Interesting... If I understand you right (my English is not that good), I should do this:

1- Load a large list of quads into memory (as a VBO); rotations, scales and positions don't matter for now.
2- When this set of quads is rendered, a vertex shader looks into some data or math to retrieve a matrix for the current quad. So the list of quads is some sort of stack of available geometry.

A few questions about that if you don't mind:

If I want multiple kinds of foliage (for example, green grass and rotten brown grass), I guess I need multiple VBOs as well?

How to sort on depth on the GPU to avoid Z-problems (in case I don't want to use alpha-test)?

How to put these matrices so that a shader can read them? I can't just put them somewhere based on noise or something, each plane requires a matrix defined by the 'map-builder'. Is it possible to send an array of matrices to the shader as a parameter? This array would change when the camera moves (or maybe when the depth-sorting changes).

The amount of visible grass can vary a lot. Some parts might have hundreds of grass planes, others just a few or zero. What to do if I only need to render a few quads while this VBO tries to render thousands?

Greetings,
Rick
Quote:Original post by spek
1- Load a large list of quads into memory (as a VBO); rotations, scales and positions don't matter for now.
2- When this set of quads is rendered, a vertex shader looks into some data or math to retrieve a matrix for the current quad. So the list of quads is some sort of stack of available geometry.

Prestoring generic rectangles and generating the matrices on the fly is one way, right. A more flexible, yet more complex one is to actually generate the vertex positions themselves.

Quote:Original post by spek
If I want multiple kinds of foliage (for example, green grass and rotten brown grass), I guess I need multiple VBOs as well?

Nope. Just stream in the grass type as a vertex attribute. The vertex shader will then generate the appropriate texture coordinates into a texture atlas (basically just UV offsets). Alternatively, to reduce vertex shader math, you can directly stream in the UV coordinate offsets per vertex, which select the subtexture to apply.
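As an illustration of the atlas lookup, the offset for a given grass type could be computed as below. This sketch assumes the atlas is a regular grid of equally sized tiles numbered row by row, which the post does not specify; `atlasOffset` and its parameters are hypothetical names.

```cpp
struct UvOffset { float u, v; };

// Returns the UV coordinate of the lower-left corner of the tile that
// holds the given grass type. Tiles are numbered row by row from 0.
inline UvOffset atlasOffset(int grassType, int tilesPerRow, int tilesPerColumn) {
    const int column = grassType % tilesPerRow;
    const int row = grassType / tilesPerRow;
    return { static_cast<float>(column) / tilesPerRow,
             static_cast<float>(row) / tilesPerColumn };
}
```

The final texture coordinate is then this offset plus the quad's own 0..1 UV scaled by the tile size (1 / tilesPerRow, 1 / tilesPerColumn), whether that is evaluated in the vertex shader or precomputed and streamed in per vertex.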

Quote:Original post by spek
How to sort on depth on the GPU to avoid Z-problems (in case I don't want to use alpha-test)?

That's tough. For now, I wouldn't worry about sorting, and use alpha test instead.

Quote:Original post by spek
How to put these matrices so that a shader can read them? I can't just put them somewhere based on noise or something, each plane requires a matrix defined by the 'map-builder'. Is it possible to send an array of matrices to the shader as a parameter? This array would change when the camera moves (or maybe when the depth-sorting changes).

You don't store the matrices, you build them up on the fly. For each patch, you need to generate its reference coordinate system in the shader, defined by its three major axes. Then build the matrix from these 3 vectors, the same principle as building a tangent space matrix from the TBN base vectors.
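The matrix construction described here can be sketched as follows, written in C++ rather than shader code so it can be run on its own. It assumes a column-major 4x4 layout as used by OpenGL and adds a translation column for the patch origin; `basisToMatrix` and `transformPoint` are names invented for this example.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// 4x4 matrix stored column by column, as OpenGL expects.
using Mat4 = std::array<float, 16>;

// The three axes become the first three columns; the origin becomes
// the translation column.
inline Mat4 basisToMatrix(const Vec3& right, const Vec3& up,
                          const Vec3& forward, const Vec3& origin) {
    return Mat4{ right.x,   right.y,   right.z,   0.0f,
                 up.x,      up.y,      up.z,      0.0f,
                 forward.x, forward.y, forward.z, 0.0f,
                 origin.x,  origin.y,  origin.z,  1.0f };
}

// Transforms a local-space point into world space (w is assumed to be 1).
inline Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    return { m[0] * p.x + m[4] * p.y + m[8]  * p.z + m[12],
             m[1] * p.x + m[5] * p.y + m[9]  * p.z + m[13],
             m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14] };
}
```

For example, with right = (0, 0, -1), up = (0, 1, 0), forward = (1, 0, 0) and origin = (10, 0, 5), the local point (1, 2, 0) ends up at (10, 2, 4): one step along the rotated right axis, two steps up, then shifted to the patch origin.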

Quote:Original post by spek
The amount of visible grass can vary a lot. Some parts might have hundreds of grass planes, others just a few or zero. What to do if I only need to render a few quads while this VBO tries to render thousands?

In practice, you'll have one VBO render call per terrain patch. Each VBO renders only as many grass patches as the parent terrain patch was assigned.

Edit, to clarify: of course, all terrain patches share the same common VBO in memory. Each terrain patch issues a VBO draw command that will render a part of the common VBO, just as much as required by the terrain patch. Before doing so, the individual placement parameters for each terrain patch (ie. fractal seeds, placement and distribution maps, etc) are loaded as shader parameters.
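A small sketch of how the per-patch draw commands could be derived from one shared quad buffer. It assumes indexed quads (6 indices each) and that a patch never draws more quads than the shared buffer holds; `TerrainPatch`, `DrawCommand` and `buildDrawCommands` are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct TerrainPatch {
    std::uint32_t seed;        // placement parameter loaded before drawing
    std::uint32_t grassQuads;  // how many quads this patch was assigned
};

struct DrawCommand {
    std::uint32_t seed;        // shader parameter to set for this draw
    std::uint32_t indexCount;  // how much of the shared buffer to draw
};

// Builds one draw command per visible patch that actually has grass.
// Every command draws a prefix of the same shared buffer.
inline std::vector<DrawCommand> buildDrawCommands(
        const std::vector<TerrainPatch>& visiblePatches,
        std::uint32_t sharedQuadCapacity) {
    std::vector<DrawCommand> commands;
    for (const TerrainPatch& patch : visiblePatches) {
        const std::uint32_t quads = std::min(patch.grassQuads, sharedQuadCapacity);
        if (quads == 0) continue;  // nothing to draw, so no draw call at all
        commands.push_back({patch.seed, quads * 6u});
    }
    return commands;
}
```

Each command would then translate into setting the patch's shader parameters followed by one indexed draw, e.g. `glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0)` with the shared buffers bound. A patch with only a few quads simply draws a short prefix of the buffer, and an empty patch issues no draw call.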
> How to sort on depth on the GPU to avoid Z-problems (in case I don't want to use alpha-test)?

You could try using alpha to coverage (part of ARB_multisample in OpenGL), although that will increase fillrate costs.
You're full of tricks! That atlas thing is really useful!

About building those matrices, you mean I should pass arrays with the positions, rotations, and maybe one with scales? I could use some texture channels for that of course. The VBO itself only stores the vertices and texcoords. So I still have to calculate those 2 or 3 arrays on the CPU? Then I could give invisible quads a position outside the view frustum... I'm going to check this out, sounds fast to me!

Thanks for helping!
Rick
