Drawing many enemies

6 comments, last by bwhiting 9 years, 4 months ago
Hi,

I am currently coding a 2D game using OpenGL ES 2.0
(I have used OpenGL 1, 3 and 4 before).

In my game the player defends the base they built against a horde of enemies, a genre also known as Tower Defence.

My question is: how can I draw as many textured 2D quads (enemies) on the screen as possible?

When profiling, I see that every OGL call is really expensive, which makes it very difficult to draw a lot of objects.

Are there any tricks for this? Are there any other OGL calls I should know about?

My attempts:
1.) Look for other hardware functions that can draw things without a separate
"draw this VAO with this matrix and these textures" call for every object.
Result: I found glMultiDraw*() (glMultiDrawArrays/glMultiDrawElements), but the Internet says it is completely useless. Is this correct?

2.) Reduce the number of OGL calls by setting the texture, shader, VAO and other uniforms once, and then, for each enemy, only setting the matrix and issuing the draw call.
Result: a huge success, but I am still lacking performance.

I have already searched for OGL tips and tricks, but I found nothing useful.

Any help is much appreciated!
Maybe consider [ this ] gamedev article on instancing.
You'd have to move to at least OpenGL 3.1, though.

Found http://carloscarrasco.com/faking-mesh-instancing-in-opengl-es-20.html thought that might spark something.

I made a demo of up to 100,000 animated sprites that runs in a Flash-enabled browser, provided your machine is decent enough to handle it.

http://blog.bwhiting.co.uk/?p=476

The source can be seen here:

https://github.com/bwhiting/b2dLite

It uses indirect addressing in the shader to batch as many draws into one draw call as possible.

The technique means you don't have to modify vertex buffers every frame, but rather offset everything with shader constants.

You can also do it the other way: modify the vertex buffers on the CPU and re-upload them. That lets you batch far greater numbers together than the method I used, but each approach has its pros and cons.

On many devices I have found that the approach I used performs better anyway. It also scales better with more complex meshes, because you have less data to update on the CPU every frame (the same for 4 verts as for 40 or even 400). But you are limited by how many vertex constants you can upload.
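A rough estimate of the per-frame upload trade-off described above. The layouts are assumptions for illustration, not taken from b2dLite: the CPU-rebuild path re-uploads every vertex as four 32-bit floats (x, y, u, v), while the constants path costs a fixed two vec4 registers per object, whatever its vertex count.

```python
# Rough per-frame upload estimate for the two batching styles.
# Assumed (hypothetical) layouts, not taken from b2dLite:
#   CPU rebuild: every vertex is re-uploaded as x, y, u, v float32 values.
#   Constants:   each object costs a fixed number of vec4 registers.
BYTES_PER_FLOAT = 4
FLOATS_PER_VERTEX = 4       # x, y, u, v
FLOATS_PER_REGISTER = 4     # one vec4 constant register


def cpu_rebuild_bytes(objects: int, verts_per_object: int) -> int:
    """Bytes uploaded per frame when vertex buffers are rebuilt on the CPU."""
    return objects * verts_per_object * FLOATS_PER_VERTEX * BYTES_PER_FLOAT


def constants_bytes(objects: int, registers_per_object: int = 2) -> int:
    """Bytes uploaded per frame when only per-object constants change."""
    return objects * registers_per_object * FLOATS_PER_REGISTER * BYTES_PER_FLOAT
```

For 1,000 objects, the CPU path uploads 64,000 bytes at 4 vertices per object but 6,400,000 bytes at 400, while the constants path stays at 32,000 bytes; the price is the per-draw cap set by the number of constant registers.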

Quote: "I already searched OGL tips and tricks, though I found nothing useful."

Isn't there anything mentioned about "batching"?

Quote: "2. Decrease number of OGL calls by setting the texture, shader, vao and other uniforms once and then call for each enemy the set-matrix and draw-this-call."

From this description it isn't clear exactly what you do. Let us assume that you set a matrix as a uniform variable, hence pushing presumably 3*4*4 = 48 bytes (a 3x4 matrix of 4-byte floats) to the GPU per enemy. Now, using indexed triangles, a quad with texture co-ordinates may consume 4*(2*4 + 2*2) = 48 bytes (4 vertices, each with two 4-byte position floats and two 2-byte texture co-ordinates); the indices can live in a static draw mode buffer. So the "old school" way of batching needs only a single call to push the dynamic VBO (which itself can be done in a way that avoids some synchronization) and a single call to draw, without causing more traffic (in fact, the traffic will be less, because the data is concentrated in one upload). The only catch is that the CPU needs to perform the transformation.
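The arithmetic above, written out. The 3x4 float matrix and the 16-bit texture co-ordinates are the layouts assumed in the post, not requirements, and the call count is a simplified model that ignores shared state setup.

```python
FLOAT32_BYTES = 4
INT16_BYTES = 2

# Per enemy, attempt 2: a 3x4 float matrix sent as a uniform.
matrix_uniform_bytes = 3 * 4 * FLOAT32_BYTES          # 48

# Per enemy, batched: 4 vertices of (float x, y) + (16-bit u, v).
vertex_bytes = 2 * FLOAT32_BYTES + 2 * INT16_BYTES    # 12
quad_bytes = 4 * vertex_bytes                         # 48


def gl_calls_per_frame(enemies: int, batched: bool) -> int:
    """Calls needed to push per-enemy data and draw (simplified model)."""
    # Per-object: one uniform upload plus one draw call per enemy.
    # Batched: one buffer upload plus one draw call for all enemies.
    return 2 if batched else 2 * enemies
```

The bytes per enemy are the same either way; what changes is that 100 enemies cost 200 calls per frame in the per-object model and 2 in the batched one.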

This is one approach that works on ES 2.0 hardware, too. However, there may be further optimizations.
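A minimal sketch of this CPU-side batching, assuming each enemy is a hypothetical `Enemy(x, y, angle, half_size)` record and every vertex is stored as interleaved x, y, u, v floats (floats instead of 16-bit texture co-ordinates, for simplicity). The resulting list would be uploaded once per frame (e.g. with glBufferSubData into a GL_DYNAMIC_DRAW buffer) and drawn with one glDrawElements call over a static index buffer; those GL calls are not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Enemy:
    # Hypothetical enemy record: centre position, rotation (radians), half extent.
    x: float
    y: float
    angle: float
    half_size: float


# Unit-quad corners and their texture coordinates, in vertex order 0..3.
CORNERS = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
UVS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def build_batch(enemies: list[Enemy]) -> list[float]:
    """Transform every quad on the CPU and pack all of them into one
    interleaved x, y, u, v float list, ready for a single buffer upload."""
    out: list[float] = []
    for e in enemies:
        c, s = math.cos(e.angle), math.sin(e.angle)
        for (cx, cy), (u, v) in zip(CORNERS, UVS):
            # Scale, rotate, then translate the corner: the work the
            # per-enemy matrix uniform used to do on the GPU.
            lx, ly = cx * e.half_size, cy * e.half_size
            out.extend((e.x + c * lx - s * ly, e.y + s * lx + c * ly, u, v))
    return out
```

Drawing 100 enemies this way is one upload of 100 * 4 * 4 floats plus one draw call, instead of 100 matrix uploads and 100 draws.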

I recently posted an article here: http://www.gamedev.net/page/resources/_/technical/opengl/opengl-batch-rendering-r3900 that should help you out. If you batch your quads the way I show in the article, you'll minimize the number of draw calls.

Re: the instancing article (OpenGL 3.1 minimum) and the carloscarrasco.com link on faking mesh instancing in OpenGL ES 2.0:

Thanks. Okay, so OGL 3 is the minimum required. That is sad, as my device doesn't really support OGL 3 "fully".

The article you linked for overcoming this limitation is not very satisfying; what those guys did seems a little rough and wasteful. Ironically, I had 'accidentally' used this very method, without knowing it, to draw my tiled environment with 16,386,304 tiles (64x64 pixels each) on the screen (sadly limited by CPU RAM).

However, an environment is static and has a fixed size, which allows a much easier and far more effective optimization. I can't imagine this technique working as well for objects that are dynamic in both count and position.

I guess I am going to try it out and see for myself how well it works :)

Thank you :)

Re: bwhiting's demo of up to 100,000 animated sprites and the b2dLite source:

I have trouble reading the code because I am not used to Flash (though it is very beautiful and well commented!), and sadly I don't understand exactly how it works.

I see the renderQuad function (which, for some reason, isn't called anywhere in the code), and I guess it is the main function for drawing a quad.

First you set the texture.

Then you do manual frustum culling.

And at the end you create a quad and send it to your shader...

To be honest, I don't see where the drawing of 100,000 objects happens in your code.

Your demo is really "wow": I got 60,000 running people before the frame rate started to drop.

Maybe I am just overlooking the obvious, but as I said, I am not used to Flash, and I can't see where your program starts or ends :D

Thank you :)

Re: haegarr's question about batching and what exactly attempt 2 does:

Nope, nothing mentioned batching, though I have read that article, which I found by accident while searching for something else.

The tips I found were mostly "try to reduce the number of draw calls" and "first compute your data, then issue your draw calls": basic stuff like that.

Before, I did this:

Each frame:
- Each object:
- - set Shader // I only use one small shader, so I didn't actually do this
- - set Uniforms
- - set Matrix
- - set Texture
- - set Vertices
- - glDrawElements

Now I am doing this:

Each frame:
- Each draw group:
- - set Shader
- - set Uniforms
- - set Textures
- - set Vertices
- - Each object:
- - - set Matrix
- - - glDrawElements

This gave me a really huge performance boost.
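A minimal sketch of how the "draw groups" above could be formed: sort objects by their state (shader, texture) so each group's state is bound once. The `Sprite` record and its fields are hypothetical stand-ins for real GL handles, and the sketch only builds the ordered groups; it issues no GL calls.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Sprite:
    # Hypothetical per-object record; shader/texture stand in for GL handles.
    shader: int
    texture: int
    x: float
    y: float


def draw_groups(sprites: list[Sprite]) -> list[tuple[tuple[int, int], list[Sprite]]]:
    """Group sprites by (shader, texture) so state is bound once per group."""
    def state_key(s: Sprite) -> tuple[int, int]:
        return (s.shader, s.texture)

    ordered = sorted(sprites, key=state_key)  # groupby needs sorted input
    return [(k, list(g)) for k, g in groupby(ordered, key=state_key)]
```

For each group you would bind the shader and texture once, then either loop over the group's sprites (the current approach) or pack the whole group into one batch.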

So now for your batching tip: I think this is the same tip Goliath Forge already gave (first reply). If I understood correctly, I should pack all (or at least "a lot") of my VBOs into one single VBO and then tell OpenGL to draw it, yes?

So if I want to draw 100 textured quads, I pack the quad vertices and texture coordinates into one buffer 100 times and then draw that single VBO...?

Please correct me if I misunderstood.

Thank you :)

Re: the OpenGL batch rendering article:

Your batching seems to differ from the batching of haegarr above you.

I had already read your article with interest; however, OpenGL ES 2.0 doesn't support VAOs

(I accidentally mixed up the terms VBO and VAO in my first post; I tend to forget which is which)

so I can't use your article.

Thank you :)

The batch manager presented in the article supports OpenGL v2 and up. If your hardware doesn't support v3.x, that is fine; the batch manager skips the VAO stuff and still works seamlessly.

The batching that is done in the manager is the same that is described by haegarr.

@AppropriateUserName
Sorry, yes: that source is only for the renderer library, not the code for the demo that uses it.

The demo essentially just calls the renderQuad function many times with different parameters.

I will try and explain how it works to hopefully clear things up.

Step one is to create your buffers; you only have to do this once.

To do this you must first work out the max number of quads you can draw in one batch.

This will be the number of vertex shader registers divided by the number of registers used per quad,

i.e. 128 / 2 = 64

(128 registers available, and 2 required per quad to store x, y, width, height, texture_x, texture_y, texture_scaleX and texture_scaleY),

which gives us 64 quads per batch (draw call).

Shader Model 5 has 4096 available registers (per constant buffer), so that would give you 2048 quads per batch as long as you only use 2 registers per quad, but you get the idea.

You could use a third register to store an alpha, a rotation or a tint, for example.
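The batch-size arithmetic as a small helper. The `reserved` parameter is a hypothetical addition for registers the shader needs for other uniforms (such as a projection matrix). In OpenGL ES 2.0 the register budget can be queried as GL_MAX_VERTEX_UNIFORM_VECTORS, which the spec guarantees to be at least 128.

```python
def quads_per_batch(registers: int, registers_per_quad: int, reserved: int = 0) -> int:
    """Max quads per draw call when each quad's data lives in vec4 constant
    registers. `reserved` (hypothetical) covers registers used by other
    uniforms, e.g. a projection matrix."""
    return (registers - reserved) // registers_per_quad
```

With 128 registers this gives 64 quads at 2 registers per quad, 42 if a third register is added for alpha, rotation or tint, and 62 if 4 registers are reserved for a matrix.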

Once we have that number, we can prepare the index list, 0,1,2,3,2,1..... until we have described that many quads.
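A sketch of the index list, following the 0,1,2, 3,2,1 pattern from the post and offsetting by 4 vertices per quad (this assumes corners 0 and 3 sit diagonally opposite each other). The list never changes, so it can be uploaded once. With GL_UNSIGNED_SHORT indices, which core ES 2.0 supports, the largest index must stay at or below 65535, i.e. at most 16384 quads per index buffer.

```python
def build_indices(quads: int) -> list[int]:
    """Index list using the 0,1,2, 3,2,1 pattern per quad, offset by 4 per quad."""
    indices: list[int] = []
    for q in range(quads):
        base = 4 * q  # first vertex of this quad
        indices.extend((base, base + 1, base + 2, base + 3, base + 2, base + 1))
    return indices
```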

Then we prepare the UVs and vertices in the same fashion, but with all vertex positions normalized (-1 to 1) and the UVs in 0 to 1, so that they can easily be scaled in the shader based on the values fed into the registers.
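A sketch of those static buffers: the same normalized quad repeated once per quad slot. The corner order is an assumption chosen so that corners 0 and 3 are opposite, matching the 0,1,2, 3,2,1 index pattern; nothing here changes per frame.

```python
# Unit quad: positions in -1..1, UVs in 0..1, corner order matching vertices 0..3.
QUAD_POS = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
QUAD_UV = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def build_static_vertices(quads: int) -> tuple[list[float], list[float]]:
    """Repeat the same normalized quad `quads` times; both buffers are static."""
    positions: list[float] = []
    uvs: list[float] = []
    for _ in range(quads):
        for (px, py), (u, v) in zip(QUAD_POS, QUAD_UV):
            positions.extend((px, py))
            uvs.extend((u, v))
    return positions, uvs
```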

The final piece is the id buffer, where all the vertices of a quad share a unique id. In our example the id of the first quad will be 0, the next one 2 and so on, incrementing by the number of registers used per quad.
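A sketch of the id buffer, where each id steps by `registers_per_quad` so it points straight at the quad's first register. The ids are stored as floats because GLSL ES 1.00 (used with OpenGL ES 2.0) only allows float-based vertex attributes; the shader would convert the id back to an integer index.

```python
def build_ids(quads: int, registers_per_quad: int = 2) -> list[float]:
    """One id per vertex: all four vertices of quad q share the register
    offset q * registers_per_quad, used later for the indirect lookup."""
    ids: list[float] = []
    for q in range(quads):
        ids.extend([float(q * registers_per_quad)] * 4)
    return ids
```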

This id is then used as the lookup to get the data out of the registers via indirect addressing: once you have the id, you look up the scale and offsets for the position and UVs, apply them, and boom, you are ready to go.
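The lookup can be emulated on the CPU to check the math. In GLSL ES 1.00 the equivalent would index a `uniform vec4` array with the id attribute, a form of indexing that vertex shaders are required to support. The register layout (first register: x, y, width, height; second: texture_x, texture_y, texture_scaleX, texture_scaleY) follows the example above; the exact formula, treating x, y as the quad centre and width, height as full extents, is an assumption and not taken from b2dLite.

```python
Vec4 = tuple[float, float, float, float]


def shade_vertex(constants: list[Vec4], vid: float,
                 pos: tuple[float, float],
                 uv: tuple[float, float]) -> tuple[tuple[float, float], tuple[float, float]]:
    """CPU emulation of the vertex shader's indirect lookup for one vertex."""
    i = int(vid)
    x, y, w, h = constants[i]            # register i: placement
    tx, ty, sx, sy = constants[i + 1]    # register i + 1: texture window
    # pos is in -1..1, so half the width/height scales it to full size.
    out_pos = (x + pos[0] * w * 0.5, y + pos[1] * h * 0.5)
    # uv is in 0..1, so scale it into the sub-rectangle, then offset.
    out_uv = (tx + uv[0] * sx, ty + uv[1] * sy)
    return out_pos, out_uv
```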

So all the library does is read in values from the user and put them into registers, which get sent to the GPU once they are full or the texture changes. It can then draw a much higher number of quads in one draw call with very little overhead.
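A minimal sketch of that accumulate-and-flush loop. `QuadBatcher` and `render_quad` are hypothetical names (loosely echoing renderQuad, not its real signature), and `flushes` records (texture, quad count) pairs in place of the real uniform upload and draw call.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QuadBatcher:
    """Collects per-quad constants and flushes when the register budget is
    full or the texture changes."""
    max_quads: int
    texture: Optional[int] = None
    pending: list[tuple[float, float, float, float]] = field(default_factory=list)
    flushes: list[tuple[int, int]] = field(default_factory=list)

    def render_quad(self, texture: int, x: float, y: float, w: float, h: float,
                    tx: float, ty: float, sx: float, sy: float) -> None:
        # Two registers per quad, so len(pending) // 2 is the quad count.
        if texture != self.texture or len(self.pending) // 2 >= self.max_quads:
            self.flush()
            self.texture = texture
        self.pending.append((x, y, w, h))
        self.pending.append((tx, ty, sx, sy))

    def flush(self) -> None:
        if self.pending:
            # A real renderer would upload `pending` as vec4 uniforms here
            # and issue one draw call covering len(pending) // 2 quads.
            self.flushes.append((self.texture, len(self.pending) // 2))
            self.pending.clear()
```

With `max_quads=2`, five quads on texture 1 followed by one on texture 2 produce four draws: (1, 2), (1, 2), (1, 1) and (2, 1), once a final `flush()` is called at the end of the frame.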

Hope that makes some sense. Sorry if not, as I wrote it in a rush, but it might give you an idea anyway! Best of luck!

Also, glad you enjoyed the Flash demo... I'm still never sure why people are so confused by what it is capable of. It's pretty cool tech if you know how to wield it.

