Fast grass rendering

spek    1240
Hi, I'm trying to render grass, so I was wondering how to do it really fast.

First of all, the geometry. All grass objects are flat planes in my case, so what's the fastest way to render 2 polygons? A while ago I asked about rendering large objects and the guys here told me about several approaches, like vertex lists with indices and VBOs. I guess a VBO is the fastest way, but does it also work for a very simple object like this grass plane? And what about indices, do they also make the rendering faster for such a simple object?

Another pain in the #$ is alpha sorting, I think. I guess the furthest transparent object needs to be rendered first, the closest last. So if I have a list with hundreds of objects, how do I sort them really quickly? Any tips? Or is there another way instead of sorting on distance?

Greetings,
Rick

lancekt    348
Something you really want to do if you can is group your grass into as few rendering calls as possible. Your GPU is capable of rendering a huge amount of geometry, but not if you are only rendering two polygons at a time.

So, try to combine a large amount of your grass into a composite object that you can render in one or very few calls.

VBOs and whatnot will speed you up once you have a decent amount of geometry to send the GPU in one batch. :)

If you want to avoid dealing with alpha sorting you can use an alpha test instead. It doesn't look quite as good but it's not bad either.
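
To illustrate why the alpha test sidesteps sorting, here is a minimal CPU-side sketch in Python (a simulation of a single pixel, not OpenGL code). The function names, the 0.5 threshold and the sample fragments are made up for the illustration. With the alpha test a fragment is either kept or discarded, so the depth buffer alone decides which one wins; with blending the result depends on submission order.

```python
def render_alpha_test(fragments, threshold=0.5, background=0.0):
    """Resolve one pixel using an alpha test plus a depth buffer."""
    depth = float("inf")
    color = background
    for frag_depth, frag_color, frag_alpha in fragments:
        if frag_alpha < threshold:
            continue  # discarded: no color write and no depth write
        if frag_depth < depth:  # ordinary "less" depth test
            depth = frag_depth
            color = frag_color
    return color


def render_alpha_blend(fragments, background=0.0):
    """Resolve one pixel with 'over' blending in submission order, no depth test."""
    color = background
    for _depth, frag_color, frag_alpha in fragments:
        color = frag_alpha * frag_color + (1.0 - frag_alpha) * color
    return color


# (depth, grey value, alpha) of three overlapping grass texels on one pixel
fragments = [(5.0, 0.2, 0.9), (2.0, 0.8, 0.7), (9.0, 0.5, 0.1)]
shuffled = [fragments[1], fragments[2], fragments[0]]

same_with_test = render_alpha_test(fragments) == render_alpha_test(shuffled)
same_with_blend = render_alpha_blend(fragments) == render_alpha_blend(shuffled)
print(same_with_test, same_with_blend)
```

The price is the hard edge: partially transparent texels are either fully drawn or dropped, which is why it doesn't look quite as good as sorted blending.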

Thaligar    142
Hi spek,

you only wanna render 2 triangles??

don't know what api you're using, but in opengl the fastest ways to render 2 (or even more) polys are to use a display list (DL) for static objects, or Vertex Buffer Objects (VBO) for static as well as dynamic meshes.

but if really only 2 polys should be rendered you can render them in immediate mode as well ;)

cheers
tgar

EDIT: DOH! lancekt, you were a little bit faster ;)

Khaos Dragon    196
Quote:
Original post by Thaligar
Hi spek,

you only wanna render 2 triangles??

I think he means 2 tris per billboard, with lots of billboards in the scene. As the first poster mentioned, assuming all the billboards have static geometry (an animated texture could still be used to simulate wind), it is easy to just batch all of them into 1 or 2 vertex buffers.
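
A rough sketch of that batching step, in Python for brevity rather than real rendering code. `build_grass_batch` and the `(x, y, z, yaw)` instance layout are invented for the example; the idea is only that every quad is baked into world space once, so the whole set ends up in one vertex list and one index list.

```python
import math


def build_grass_batch(instances, half_width=0.5, height=1.0):
    """Bake upright quads into one vertex list and one index list.

    instances: iterable of (x, y, z, yaw_radians), with y pointing up.
    """
    vertices = []  # (x, y, z, u, v) per vertex
    indices = []   # two triangles per quad
    for x, y, z, yaw in instances:
        # Half of the quad's horizontal edge after rotating around the up axis.
        dx = math.cos(yaw) * half_width
        dz = math.sin(yaw) * half_width
        base = len(vertices)
        vertices += [
            (x - dx, y, z - dz, 0.0, 0.0),           # bottom left
            (x + dx, y, z + dz, 1.0, 0.0),           # bottom right
            (x + dx, y + height, z + dz, 1.0, 1.0),  # top right
            (x - dx, y + height, z - dz, 0.0, 1.0),  # top left
        ]
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, indices


batch_vertices, batch_indices = build_grass_batch(
    [(0.0, 0.0, 0.0, 0.0), (10.0, 0.0, 5.0, 0.0)]
)
```

The two lists would then be uploaded once (for example with glBufferData) and drawn with a single glDrawElements call instead of one call per quad.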

spek    1240
Thanks guys!

The problem is that the grass objects change a lot. When moving the camera, lots of objects will become visible or invisible. However, if I want to do sorting I have a list anyway, I could put this whole set of planes into a large list.

But display lists or VBOs are pretty static, right? If the list content changes continuously, I would need to rebuild such a list again and again. I never used VBOs before so I don't know if it's a real problem, but otherwise I might be better off with a dynamic approach, glDrawElements or something. Right?

Another problem might be the rotations. The grass objects aren't sprites that rotate with you. Each grass object has its own matrix, so before inserting points in such a list, we need to calculate the absolute vertex coordinates first. Maybe it's not a big problem for the CPU, but there should be a lot of grass of course (hundreds, maybe thousands), and that's quite some math if you ask me. I can't store the absolute points in memory either, they would never fit as the entire terrain has millions of grass objects.

Thanks for helping!
Rick

Yann L    1802
Use procedural placement, where everything is done on the GPU. Your VBO would only contain a few thousand static zero-area quads, i.e. only the topology without any real extent or positional information. A vertex shader then generates the actual vertex positions according to either some mathematical equation (fractal, noise), or to some predefined placement maps (requires hardware capable of texture access in the vertex shader), or a combination of both.

The CPU should never touch an individual blade of grass, or even an individual grass patch. It should only modify generic parameters, such as grass density or attribute maps. Everything else is done on the graphics card and memory.

spek    1240
Interesting... If I understand you right (my English is not that good), I should do this:

1- Load a large list of quads into memory (as a VBO); rotations, scales or positions don't matter for now.
2- When this set of quads is rendered, a vertex shader looks into some data or math to retrieve a matrix for the current quad. So the list of quads is some sort of stack with available geometry.

A few questions about that if you don't mind:

If I want multiple kinds of foliage (for example, green grass and rotten brown grass), I guess I need multiple VBO's as well?

How to sort on depth on the GPU to avoid Z-problems (in case I don't want to use alpha-test)?

How to put these matrices so that a shader can read them? I can't just put them somewhere based on noise or something, each plane requires a matrix defined by the 'map-builder'. Is it possible to send an array of matrices to the shader as a parameter? This array would change when the camera moves (or maybe when the depth-sorting changes).

The amount of visible grass can vary a lot. Some parts might have hundreds of grass planes, others just a few or zero. What to do if I only need to render a few quads while this VBO tries to render thousands?

Greetings,
Rick

Yann L    1802
Quote:
Original post by spek
1- Load a large list of quads into memory (as a VBO); rotations, scales or positions don't matter for now.
2- When this set of quads is rendered, a vertex shader looks into some data or math to retrieve a matrix for the current quad. So the list of quads is some sort of stack with available geometry.

Prestoring generic rectangles and generating the matrices on the fly is one way, right. A more flexible, yet more complex one is to actually generate the vertex positions themselves.

Quote:
Original post by spek
If I want multiple kinds of foliage (for example, green grass and rotten brown grass), I guess I need multiple VBO's as well?

Nope. Just stream in the grass type as a vertex attribute. The vertex shader will then generate the appropriate texture coordinates into a texture atlas (basically just UV offsets). Alternatively, to reduce vertex shader math, you can directly stream in the UV coordinate offsets per vertex, which select the subtexture to apply.
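
As a rough illustration of the UV offset math (written in Python here, although in practice it would live in the vertex shader), assuming a regular atlas laid out as a grid of equally sized tiles. The function name and the 4x4 layout are assumptions for the example only.

```python
def atlas_uv(grass_type, corner_uv, tiles_per_row=4, tiles_per_col=4):
    """Map a quad corner's local UV (0..1) into the sub-texture of grass_type."""
    tile_w = 1.0 / tiles_per_row
    tile_h = 1.0 / tiles_per_col
    col = grass_type % tiles_per_row    # tile column inside the atlas
    row = grass_type // tiles_per_row   # tile row inside the atlas
    u, v = corner_uv
    # Offset by the tile origin, then scale down to the size of one tile.
    return (col + u) * tile_w, (row + v) * tile_h


green_corner = atlas_uv(0, (0.0, 0.0))   # first tile, lower-left corner
brown_corner = atlas_uv(5, (1.0, 1.0))   # tile in row 1, column 1, upper-right corner
```

Streaming the precomputed `(col * tile_w, row * tile_h)` offset per vertex instead of the type index is the second variant mentioned above.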

Quote:
Original post by spek
How to sort on depth on the GPU to avoid Z-problems (in case I don't want to use alpha-test)?

That's tough. For now, I wouldn't worry about sorting, and use alpha test instead.

Quote:
Original post by spek
How to put these matrices so that a shader can read them? I can't just put them somewhere based on noise or something, each plane requires a matrix defined by the 'map-builder'. Is it possible to send an array of matrices to the shader as a parameter? This array would change when the camera moves (or maybe when the depth-sorting changes).

You don't store the matrices, you build them up on the fly. You need to generate the reference coordinate system for each patch in the shader, defined by its three major axes. Then, build the matrix from these 3 vectors - same principle as building a tangent space matrix from the TBN base vectors.
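
A small sketch of that construction, in Python instead of shader code. It assumes a column-vector convention where the three axes and the patch origin become the columns of a 4x4 matrix; `basis_matrix` and `transform_point` are names invented for the example.

```python
def basis_matrix(right, up, forward, origin):
    """Row-major 4x4 matrix whose columns are the three axes and the origin."""
    return [
        [right[0], up[0], forward[0], origin[0]],
        [right[1], up[1], forward[1], origin[1]],
        [right[2], up[2], forward[2], origin[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(m, p):
    """Apply the upper three rows of m to the point p (implicit w = 1)."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))


# A patch rotated 90 degrees around the up axis and placed at (10, 2, 5).
patch = basis_matrix(
    right=(0.0, 0.0, -1.0),
    up=(0.0, 1.0, 0.0),
    forward=(1.0, 0.0, 0.0),
    origin=(10.0, 2.0, 5.0),
)
corner = transform_point(patch, (1.0, 0.0, 0.0))  # one unit along the local right axis
```

Scaling the axes before building the matrix gives a per-patch size without storing anything extra.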

Quote:
Original post by spek
The amount of visible grass can vary a lot. Some parts might have hundreds of grass planes, others just a few or zero. What to do if I only need to render a few quads while this VBO tries to render thousands?

In practice, you'll have one VBO render call per terrain patch. Each VBO renders only as many grass patches as the parent terrain patch was assigned.

Edit, to clarify: of course, all terrain patches share the same common VBO in memory. Each terrain patch issues a VBO draw command that will render a part of the common VBO, just as much as required by the terrain patch. Before doing so, the individual placement parameters for each terrain patch (ie. fractal seeds, placement and distribution maps, etc) are loaded as shader parameters.
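
The bookkeeping for this could look roughly like the sketch below (Python, CPU side only). The dictionary keys, `plan_grass_draws` and the `seed` parameter are placeholders for whatever placement parameters a real engine would use; the only point is that every terrain patch draws a prefix of the same shared buffer.

```python
INDICES_PER_QUAD = 6  # two triangles per grass quad


def plan_grass_draws(patches, shared_quad_capacity):
    """Return one draw description per terrain patch that has any grass."""
    draws = []
    for patch in patches:
        # Never ask for more quads than the shared buffer actually holds.
        quads = min(patch["grass_count"], shared_quad_capacity)
        if quads == 0:
            continue  # nothing to draw for this terrain patch
        draws.append({
            "patch": patch["name"],
            "seed": patch["seed"],   # would be set as a shader parameter before the draw
            "first_index": 0,        # every patch starts at the beginning of the shared buffer
            "index_count": quads * INDICES_PER_QUAD,
        })
    return draws


terrain = [
    {"name": "a", "grass_count": 300, "seed": 1},
    {"name": "b", "grass_count": 0, "seed": 2},
    {"name": "c", "grass_count": 5000, "seed": 3},
]
draws = plan_grass_draws(terrain, shared_quad_capacity=2000)
```

Each entry then maps onto one indexed draw call whose count argument is `index_count`.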

l0calh05t    1796
> How to sort on depth on the GPU to avoid Z-problems (in case I don't want to use alpha-test)?

you could try using alpha to coverage (part of ARB_multisample in OpenGL), although that will increase fillrate costs.

spek    1240
You're full of tricks! That atlas thing is really useful!

About building those matrices, you mean I should pass arrays with the positions, rotations, and maybe one with scales? I could use some texture channels for that of course. The VBO itself only stores the vertices and texcoords. So I still have to calculate those 2 or 3 arrays on the CPU? Then I could give invisible quads a position outside the view frustum... I'm going to check this out, sounds fast to me!

Thanks for helping!
Rick

Yann L    1802
Quote:
Original post by spek
About building those matrices, you mean I should pass arrays with the positions, rotations, and maybe one with scales?

That would mean keeping a dedicated VBO per terrain patch. While this will work (and doesn't require CPU interference), it will take more memory and is less flexible.

If you have SM 3.0 capable hardware, then one could build it like this:

* Create a single generic VBO that contains a number of origin-centered squares, all aligned along the z axis. In addition to the vertex positions, each vertex gets a vertex attribute that assigns it to a patch ID. So, all four vertices forming a quad would get the same ID. This is so that the vertex shader can see which patch a vertex belongs to.

* Create a patch attribute texture map: a group of texels represents one patch, and contains data about its position, rotation and scale, as well as type (index into a texture atlas), and several other optional parameters.

* When rendering a terrain patch set, bind the corresponding grass patch attribute map so that you can access it in a vertex shader. Then, render the shared VBO, with as many quads as you need for the current patch.

* The vertex shader will use the patch ID index from the vertex to address the right attribute texels in the attribute map. It will then read back the patch attributes, and use them (combined with the current camera matrix) to form a billboard alignment matrix. Each pregenerated standard quad vertex gets transformed by this matrix. Finally, additional vertex parameters, such as texcoords, colours, etc, are also generated using the patch attributes.

Of course, depending on how smartly you compress the attributes, this can take a few texture reads in the VS. And such reads are not (yet) as fast as one would like them to be. On the other hand, the fact that vertex attributes are stored as textures opens a whole new world for animation: you can use the GPU to modify the map. Using pixel shader tricks, you can animate the grass patches without ever needing the CPU (by rendering the original attribute map to a second one, and swapping them - think double buffering).
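
The addressing step in the last bullet might look like this, sketched in Python rather than shader code. It assumes two texels per patch stored consecutively in row-major order and sampling at texel centers; the layout, `TEXELS_PER_PATCH` and the function name are assumptions for illustration, not a fixed format.

```python
TEXELS_PER_PATCH = 2  # e.g. one texel for position + type, one for rotation + scale


def attribute_texel_coords(patch_id, map_width, map_height):
    """Normalized (u, v) coordinates of the texels that describe one patch."""
    coords = []
    for k in range(TEXELS_PER_PATCH):
        texel = patch_id * TEXELS_PER_PATCH + k
        x, y = texel % map_width, texel // map_width
        if y >= map_height:
            raise IndexError("patch_id does not fit into the attribute map")
        # Sample at the texel center so no filtering between neighbours occurs.
        coords.append(((x + 0.5) / map_width, (y + 0.5) / map_height))
    return coords


first_patch = attribute_texel_coords(0, map_width=4, map_height=4)
fourth_patch = attribute_texel_coords(3, map_width=4, map_height=4)
```

Packing more data per patch only changes `TEXELS_PER_PATCH` and the number of texture reads in the vertex shader.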

spek    1240
I don't know what SM 3 exactly is... I have a GeForce 5700 ultra at the moment.

The attribute map, could that be a 1D or 2D image with for example,
pixel1.xyz=position
pixel2.xyz=rotation
pixel1.w=material ID
<and maybe more stuff like width/height or something>

As far as I know, vertex shaders on my card can't read images, only the pixel shaders (is that what you mean with SM3?). But even if my card could do that, it would need the CPU in my case as well, I think. Not every frame, but once the camera moves and new grass comes in sight (or out of sight), this map has to be recreated and uploaded to the video card, I think. Or is this not what you mean?

Anyway, I think passing data like position with something like glDrawArrays in combination with this VBO would already be a big improvement. By the way, would the usage of indices improve the rendering speed in this case?

Thanks!
Rick

Rattrap    3385
I think it may only be supported on some of the newer graphics cards and I don't know what the technology is called, but I remember watching the premiere of the GeForce 6800s. The speaker talked about some kind of geometry technology that did fast rendering of multiple instances of the same object. I think one of the examples was an asteroid field.

toucel    188

sm3 = shader model 3

it is only supported on newer cards (some functionality needing beta drivers to function appropriately, or close to appropriately)

I don't believe your card (5700) supports sm3 - it looks like your solution will have to differ from some of the suggestions for implementation

Yann L    1802
Quote:
Original post by spek
As far as I know, vertex shaders on my card can't read images, only the pixel shaders (is that what you mean with SM3?). But even if my card could do that, it would need the CPU in my case as well, I think. Not every frame, but once the camera moves and new grass comes in sight (or out of sight), this map has to be recreated and uploaded to the video card, I think. Or is this not what you mean?

You don't need to touch the map while rendering at all (unless you want to animate the grass). Each terrain tile has its own map (or submap) that remains static. The grass system has to be integrated into your terrain engine, as it will share the visibility culling with the terrain tiles. Geomipmapping would be perfect. Visibility is determined hierarchically for each terrain tile, using frustum culling, occlusion culling, whatever. If a tile is visible, all grass on it is also assumed visible. A few patches will obviously be outside the view, but it's going to be faster to let the GPU cull those away than to fiddle around with data in VRAM using the CPU.

But this approach will not work on your GF 5700, since it doesn't support vertex texture access. Still, you can use the "one VBO per tile" approach, streaming in the data over vertex streams. The results will be the same, although it will take more memory. You could even animate it on the GPU, by using the render-to-vertex-array feature (which is supported by your chipset).
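
To make the per-tile visibility idea concrete, here is a minimal flat (non-hierarchical) sketch in Python. It assumes each tile has a bounding sphere and that the frustum is given as planes `(nx, ny, nz, d)` with normals pointing inward; both the data layout and the function names are invented for the example.

```python
def sphere_in_frustum(center, radius, planes):
    """True unless the sphere lies completely behind one of the planes."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False
    return True


def visible_tiles(tiles, planes):
    """Names of the tiles whose grass should be submitted this frame."""
    return [t["name"] for t in tiles if sphere_in_frustum(t["center"], t["radius"], planes)]


# A simple stand-in "frustum": the slab -10 <= x <= 10, -10 <= z <= 10.
planes = [(1.0, 0.0, 0.0, 10.0), (-1.0, 0.0, 0.0, 10.0),
          (0.0, 0.0, 1.0, 10.0), (0.0, 0.0, -1.0, 10.0)]
tiles = [
    {"name": "inside", "center": (0.0, 0.0, 0.0), "radius": 5.0},
    {"name": "straddling", "center": (13.0, 0.0, 0.0), "radius": 5.0},
    {"name": "outside", "center": (30.0, 0.0, 0.0), "radius": 5.0},
]
to_draw = visible_tiles(tiles, planes)
```

The straddling tile is drawn in full, which matches the advice above: the few patches outside the view are simply left for the GPU to clip.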

Quote:
Original post by spek
Anyway, I think passing data like position with something like glDrawArrays in combination with this VBO would already be a big improvement. By the way, would the usage of indices improve the rendering speed in this case?

Most probably. In your specific case the speedup will be less pronounced than in common rendering applications (where glDrawArrays should be avoided like the plague), because you share considerably fewer vertices. But it would still be advisable to switch to indexed VBOs.
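
A quick back-of-the-envelope comparison of the vertex counts, as a Python sketch. It only counts vertices (it says nothing about actual frame times), and the 64x64 terrain grid is just a sample size picked for the comparison.

```python
def isolated_quads(quad_count):
    """(unique vertices with indices, vertices without indices) for separate quads."""
    return 4 * quad_count, 6 * quad_count


def terrain_grid(cells_per_side):
    """Same pair for a regular grid mesh where neighbouring cells share vertices."""
    return (cells_per_side + 1) ** 2, 6 * cells_per_side ** 2


def saving_factor(unique_vertices, non_indexed_vertices):
    """How many times fewer vertices the indexed version has to store."""
    return non_indexed_vertices / unique_vertices


grass = isolated_quads(1000)
terrain = terrain_grid(64)
print(saving_factor(*grass), saving_factor(*terrain))
```

Separate grass quads only share the two vertices on each quad's diagonal, so indexing saves a factor of 1.5, while a connected grid saves several times more.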

Monder    993
Quote:
Original post by Yann L
Quote:
Original post by spek
How to sort on depth on the GPU to avoid Z-problems (in case I don't want to use alpha-test)?

That's tough. For now, I wouldn't worry about sorting, and use alpha test instead.


Out of interest, how would you sort them, or at least have alpha blending work correctly without needing a Z-sort?

There is one way I can think of (well, know of), but to do it you'd need to hold the positions of all the grass blades in a texture which you update over many passes which perform the sort.

Bagel Man    122
I've found that when rendering a large number of semi-transparent overlapping objects (grass in particular) it can help to render them last and just turn off z-buffer writes. Even intersecting planes seem to look acceptable in most cases. Not sure if it would work for your case though.
